Mailing List Archive
Re: [tlug] Re: Updating iconv tables
- Date: Wed, 18 Jun 2008 14:15:15 +1000
- From: "Jim Breen" <jimbreen@example.com>
- Subject: Re: [tlug] Re: Updating iconv tables
- References: <5634e9210806102023p448a36bcw2d90f138cebb5597@mail.gmail.com> <5634e9210806121550p31912022u6c5b4d9ef73c61be@mail.gmail.com> <5634e9210806121716j746cbedaod245b80871298245@mail.gmail.com> <87y753reot.fsf@uwakimon.sk.tsukuba.ac.jp>
2008/6/18 Stephen J. Turnbull <stephen@example.com>:
> Jim Breen writes:
>
> > OK, Bruno has replied pointing out "EUC-JISX0213" is what should be used
> > as the iconv indicator for JIS X 0213 codepoints.
>
> Gag me. Is 0213 a superset of both 0208 and 0212, so that just
> specifying EUC-JISX0213 captures everything?

Sadly, no. It has all of JIS X 0208, and adds 3,625 extra kanji and
heaps of special characters. Most, but not all, of the additions are in
JIS X 0212; for example, all but 952 of the additional kanji are/were
in JIS X 0212. (And of those 952, 303 were new to Unicode and went into
Extension B.)

> And if it's upward
> compatible, you'd think that EUC-JP would be aliased to
> EUC-JISX-whatever-is-most-comprehensive, no?

Round-trip problems. You can devise a hybrid EUC-JP coding which
supports all 3 JIS Xs, but there'll be multiple representations of many
characters. It could be used for going EUC->Unicode, but not back the
other way. For example, for 丂 you would not know whether it would be
0-16-01 (JIS X 0212) or 2-01-02 (JIS X 0213).

> Why the Japanese insisted on created X0212 and X0213, instead of
> amending X0208, is beyond me.

Well, JIS X 0212 was compiled in the late 1980s under the assumption
that the IT industry would simply run with it in some suitable
encapsulation. In fact the only encapsulation that really supports it
is EUC-JP. It turned out to be a dodo for two reasons:

- Shift_JIS had used up almost all the available code-points by keeping
  compatibility with the old single-byte 半角カナ. There was simply no
  room for anything like the 6,000+ extra characters. And the main
  people using Shift_JIS (i.e. Microsoft) were certainly not going to
  invest in a 3-byte version.
- no-one really wanted to put investment into handling an additional
  heap of characters which were only of interest to a few people.

JIS X 0213 was effectively a rework, but cunningly designed so the most
important of the extra characters were shoe-horned into the crevices of
JIS X 0208 not already purloined by Shift_JIS for other things. It can
be validly regarded as an extended JIS X 0208.

IMNSHO JIS X 0213 won't see much light of day anyway. Its main role
(I believe) will be to get some extra kanji, etc. into Unicode.

Here endeth the lesson.....

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
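
A footnote on the round-trip point in the message: the ambiguity can be
sketched in a few lines. This is only an illustration, not part of the
original post. It assumes Python's bundled "euc_jp" and "euc_jisx0213"
codecs behave like the corresponding iconv converters, and the helper
euc_code_set_3 is a hypothetical name introduced here. It builds the two
three-byte EUC sequences from the code points quoted in the message and
decodes each with the codec that understands it.

```python
def euc_code_set_3(row: int, cell: int) -> bytes:
    """Three-byte EUC form: SS3 (0x8F), then row and cell, each offset by 0xA0."""
    return bytes([0x8F, 0xA0 + row, 0xA0 + cell])


# Code points quoted in the message (assumed here to name the same kanji).
as_jisx0212 = euc_code_set_3(16, 1)  # 0-16-01 in JIS X 0212
as_jisx0213 = euc_code_set_3(1, 2)   # 2-01-02 in JIS X 0213 (plane 2)

# EUC -> Unicode: each byte sequence decodes without ambiguity.
char_a = as_jisx0212.decode("euc_jp")
char_b = as_jisx0213.decode("euc_jisx0213")

print(as_jisx0212.hex(), "->", f"U+{ord(char_a):04X}")
print(as_jisx0213.hex(), "->", f"U+{ord(char_b):04X}")
print("same character:", char_a == char_b)
print("same bytes:    ", as_jisx0212 == as_jisx0213)

# Unicode -> EUC: each codec has to pick its own representation.
back_via_euc_jp = char_a.encode("euc_jp")
back_via_jisx0213 = char_a.encode("euc_jisx0213")
print("euc_jp gives      ", back_via_euc_jp.hex())
print("euc_jisx0213 gives", back_via_jisx0213.hex())
```

Keeping the two codecs separate is what makes the reverse direction
well defined: a single hybrid table would have to choose one of the two
byte sequences when encoding, so text that arrived in the other form
would not survive a round trip unchanged.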