Mailing List Archive


Re: [tlug] Re: Updating iconv tables



2008/6/18 Stephen J. Turnbull <stephen@example.com>:
> Jim Breen writes:
>
>  > OK, Bruno has replied pointing out "EUC-JISX0213" is what should be used
>  > as the iconv indicator for JIS X 0213 codepoints.
>
> Gag me.  Is 0213 a superset of both 0208 and 0212, so that just
> specifying EUC-JISX0213 captures everything?

Sadly, no. It has all of JIS X 0208, and adds 3,625 extra kanji and
heaps of special characters. Most but not all of the additions are in
JIS X 0212; for example, all but 952 of the additional kanji are/were
in JIS X 0212. (And of those 952, 303 were new to Unicode and went into
Extension B.)

> And if it's upward
> compatible, you'd think that EUC-JP would be aliased to
> EUC-JISX-whatever-is-most-comprehensive, no?

Round-trip problems. You can devise a hybrid EUC-JP coding which
supports all 3 JIS Xs, but there'll be multiple representations
of many characters. It could be used for going EUC->Unicode, but
not back the other way. For example, for 丂 you would not know
whether it should be 0-16-01 (JIS X 0212) or 2-01-02 (JIS X 0213).
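
To make the round-trip problem concrete, here is a minimal sketch in
Python. It assumes the "euc_jp" and "euc_jisx0213" codecs bundled with
CPython, and that U+4E02 is the kanji sitting at the two positions
mentioned above; the variable names are only for illustration.

```python
# Sketch: one character, two different EUC byte sequences.
# Assumes Python's bundled "euc_jp" and "euc_jisx0213" codecs.
ch = "\u4e02"  # assumed to be the kanji at JIS X 0212 16-01 / JIS X 0213 2-01-02

as_0212 = ch.encode("euc_jp")        # not in JIS X 0208, so it goes via JIS X 0212
as_0213 = ch.encode("euc_jisx0213")  # goes via JIS X 0213 plane 2

# Show both byte sequences side by side.
print(as_0212.hex(), as_0213.hex())

# Going EUC->Unicode is fine: each sequence decodes to the same character.
same_char = as_0212.decode("euc_jp") == as_0213.decode("euc_jisx0213") == ch

# Going Unicode->EUC is the problem: a hybrid encoder would have to
# choose between two distinct byte sequences for this one character.
ambiguous = as_0212 != as_0213
```

Both sequences use the same 0x8F three-byte form, so nothing in the
byte stream itself tells you which standard the writer had in mind.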

> Why the Japanese insisted on creating X0212 and X0213, instead of
> amending X0208, is beyond me.

Well, JIS X 0212 was compiled in the late 1980s under the
assumption that the IT industry would simply run with it in
some suitable encapsulation. In fact the only encapsulation that
really supports it is EUC-JP. It turned out to be a dodo for
two reasons:

- Shift_JIS had used up almost all the available code-points by
keeping compatibility with the old single-byte 半角カナ. There was
simply no room for anything like the 6,000+ extra characters. And
the main people using Shift_JIS (i.e. Microsoft) were certainly not
going to invest in a 3-byte version.

- no-one really wanted to put investment into handling an additional
heap of characters which were only of interest to a few people.
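
The "no room" point in the first reason can be checked with a little
arithmetic. The sketch below assumes the usual Shift_JIS layout (lead
bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC,
single-byte katakana at 0xA1-0xDF); the "freed" figure is purely
hypothetical, just to show what the katakana range costs.

```python
# Sketch: how many two-byte cells does Shift_JIS have left after JIS X 0208?
# Byte ranges below are the commonly documented Shift_JIS layout (assumed).
lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
trail_bytes = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))

total_cells = len(lead_bytes) * len(trail_bytes)  # all two-byte positions
jisx0208_cells = 94 * 94                          # rows x cells taken by JIS X 0208
spare_cells = total_cells - jisx0208_cells        # what is left over

needed = 6000                                     # "6,000+" extra characters
fits = spare_cells >= needed

# Hypothetical: if the single-byte katakana range could serve as lead bytes.
kana_bytes = range(0xA1, 0xE0)
freed_cells = len(kana_bytes) * len(trail_bytes)

print(total_cells, spare_cells, fits, freed_cells)
```

So under these assumptions the spare two-byte space is well under half
of what JIS X 0212 would need, while the katakana range alone would
have covered it.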

JIS X 0213 was effectively a rework, but cunningly designed so the
most important of the extra characters were shoe-horned into the
crevices of JIS X 0208 not already purloined by Shift_JIS for other
things. It can be validly regarded as an extended JIS X 0208.

IMNSHO JIS X 0213 won't see much light of day anyway. Its main role
(I believe) will be to get some extra kanji, etc. into Unicode.

Here endeth the lesson.....

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
