Re: tlug: Re: pine, mutt, Chinese, Japanese
- To: tlug@example.com
- Subject: Re: tlug: Re: pine, mutt, Chinese, Japanese
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Wed, 4 Aug 1999 14:26:13 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <Pine.LNX.3.95.990725134553.263B-100000@example.com>
- References: <Pine.LNX.4.05.9907250938270.4720-100000@example.com><Pine.LNX.3.95.990725134553.263B-100000@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "jdb" == J David Beutel <jdb@example.com> writes:

jdb> On Sun, 25 Jul 1999, Tony Laszlo wrote:

>> Setting Unicode aside for the moment, is there any _single_
>> Japanese encoding that has been suggested to take the place of
>> euc, jis and sjis? Seems like a needless hassle having to
>> convert between two or three and make sure that software can
>> display all of the three.

Talk to Microsoft and Apple about SJIS. Don't bet on getting a sensible reply; Microsoft uses Unicode internally but doesn't provide any software (well, Word 2000 is supposed to) to handle that format, instead making SJIS the default.

JIS is used almost exclusively in messaging applications (mail and netnews), in a rather usable variant of ISO-2022. Due to the rules in RFC 822, it is unlikely that 7-bit encodings in mail headers will go away soon. So you'll keep seeing `=?iso-2022-jp?B?...' in raw mail headers for a while. Then you'll start seeing `=?utf-7?B?' or so....

EUC-JP is a rather efficient and simple encoding for Japanese only, so it makes sense to use it for file systems. (Although if you compress the files, the advantage over ISO-2022-JP will almost completely go away for most files. Almost all the kanji-in/kanji-out escape sequences sit next to line breaks, so the compressor will treat them as part of the newline sequence, and everything else is a 1-1 map.)

The 7/8-bit (JIS/EUC) split affects Chinese and Korean, too, I believe.

jdb> Unicode is exactly it. Why set it aside? I doubt there is
jdb> any other.

Well, no, Unicode is not exactly it. UCS (ISO 10646) is. Unicode is just a 99.44% accurate approximation. ;-)

Unicode is going to require a certain amount of infrastructure to be implemented. The problem is that Unicode's code point order does not preserve collating orders and the like for anything except American English (and maybe British English). So sorts are going to have to be table-driven. This is actually a good thing; JIS order isn't really all that interesting.
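A table-driven sort can be sketched in Python. This is a minimal illustration under stated assumptions, not a real collation scheme: `TOY_ORDER`, `TOY_RANK` and `collation_key` are hypothetical names, and the table's order is invented for the example rather than taken from any kyouiku or jouyou kanji list. Each string is mapped to a key once, before sorting (Python's `sorted` calls its `key` function exactly once per element), so the comparisons themselves never consult the table. NFKC normalization is used here as one possible way to fold zenkaku romaji onto hankaku.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical toy table: the order is invented for illustration and
# deliberately differs from UCS code point order.
TOY_ORDER = "日本語山川"
TOY_RANK = {ch: i for i, ch in enumerate(TOY_ORDER)}


def collation_key(text: str) -> List[Tuple[int, int]]:
    """Map a string to a sort key, computed once per string before sorting."""
    # NFKC folds full-width (zenkaku) Latin letters onto ASCII, so "Ａ" == "A" here.
    folded = unicodedata.normalize("NFKC", text)
    key = []
    for ch in folded:
        if ch in TOY_RANK:
            key.append((0, TOY_RANK[ch]))  # table characters first, in table order
        else:
            key.append((1, ord(ch)))       # everything else: code point order
    return key


words = ["川", "Ａ", "語", "日", "a"]
print(sorted(words))                     # ['a', '川', '日', '語', 'Ａ']  (UCS order)
print(sorted(words, key=collation_key))  # ['日', '語', '川', 'Ａ', 'a']  (table order)
```

Replacing `TOY_ORDER` with real tables (grade lists, then the rest) changes the ordering without touching the sorting code, which is the point of making sorts table-driven.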
Tables would make it very easy to specify a sort like "kyouiku kanji by school year first, then jouyou kanji, then other Japanese kanji, then non-Japanese kanji, then other characters". (Not to mention "unifying" zenkaku and hankaku romaji, etc.) But looking characters up in a table at every comparison is very inefficient. So a good general-purpose UCS text sorter is going to need to preprocess the text to be sorted so that characters are in collation order, not in UCS order. That's going to take a while to shake out: there will be lots of reimplementations due to NIH-itis, most of them buggy; many developers will be too lazy; etc.

And there are going to be lots of gotchas. For example, what does `[a-z]' mean in a regexp? Presumably it changes according to the language; normally I can't see it including `1', but surely in es_ES locales it will include enye. But everybody has their own favorite flavor of regexp; I bet hardly anybody uses the standard C library versions for languages like Perl, and so on. More reimplementations....

As for "no other", the answer is (according to rumor), unfortunately, "not yet". Evidently JIS is working on a unification of JIS X 0208, JIS X 0212, and JIS X 0213. Presumably it's mostly going to be a regularization and slight tweaking of the familiar sets, but no, they're not planning on adopting UCS in any form as a Japanese national standard any time soon. Only the US can really do this, since all pure ASCII documents are already encoded in UTF-8. ;-)

-- 
University of Tsukuba                 Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences      Tel/fax: +81 (298) 53-5091