Mailing List Archive


[tlug] Why is Shift_JIS bad?



I agree with all the comments made so far about Shift_JIS. Some, e.g.
sign-extension in char, apply equally well to EUC and UTF-8. (FWIW I always
handle EUC/Shift_JIS/etc. as "unsigned char".)
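
To illustrate the sign-extension point, here is a minimal sketch in C. The
helper names are made up for this example, and it uses "signed char"
explicitly because the signedness of plain char is implementation-defined.
It also assumes a two's-complement machine.

```c
#include <string.h>

/* Reinterpret a raw byte as signed char, which is what effectively
 * happens on platforms where plain char is signed. */
static signed char as_signed_char(unsigned char byte)
{
    signed char sc;
    memcpy(&sc, &byte, 1);
    return sc;
}

/* Naive widening: the high bit is sign-extended, so 0xA4 becomes -92. */
static int widen_naive(signed char c)
{
    return c;
}

/* Safe widening: convert to unsigned char first, so 0xA4 stays 0xA4. */
static int widen_safe(signed char c)
{
    return (unsigned char)c;
}

/* Range test for a high byte of an EUC-JP two-byte character,
 * applied to an already widened value. */
static int is_euc_high_byte(int v)
{
    return v >= 0xA1 && v <= 0xFE;
}
```

With the byte 0xA4 (the first byte of hiragana in EUC), the range test fails
on the naively widened value and succeeds on the safely widened one. The same
trap applies to any table lookup indexed by a char.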

My two gripes about Shift_JIS are:

(a) wasted code space. By making room for the JIS X 0201 hankaku kana, a large
proportion of the 2**14 code space is effectively wasted. That's why usage of
JIS X 0212 never got anywhere, and why JIS X 0213 has been designed to squeeze
in.

(Having said that, this is a fairly subtle point as it only concerns a few
people who want arcane kanji. Also Unicode is blowing this problem away.)
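
One rough way to put numbers on point (a), as a sketch only: the byte ranges
below are the commonly cited Shift_JIS ones (lead bytes 0x81-0x9F and
0xE0-0xEF, trail bytes 0x40-0x7E and 0x80-0xFC, hankaku kana at 0xA1-0xDF),
which is an assumption on my part as vendor variants differ. The function
names are hypothetical.

```c
/* Number of byte values in the inclusive range [lo, hi]. */
static int span(int lo, int hi)
{
    return hi - lo + 1;
}

/* Lead bytes available for two-byte characters. */
static int sjis_lead_bytes(void)
{
    return span(0x81, 0x9F) + span(0xE0, 0xEF);
}

/* Trail bytes available for two-byte characters. */
static int sjis_trail_bytes(void)
{
    return span(0x40, 0x7E) + span(0x80, 0xFC);
}

/* Total two-byte code points under the assumed ranges. */
static int sjis_two_byte_cells(void)
{
    return sjis_lead_bytes() * sjis_trail_bytes();
}

/* Size of one 94 x 94 JIS plane such as JIS X 0208 or JIS X 0212. */
static int jis_plane_cells(void)
{
    return 94 * 94;
}

/* Two-byte code points that would exist if the hankaku kana byte
 * values could have been used as lead bytes instead. */
static int cells_taken_by_kana(void)
{
    return span(0xA1, 0xDF) * sjis_trail_bytes();
}
```

Under these assumptions the two-byte space holds exactly one 94 x 94 plane,
so a second plane such as JIS X 0212 has nowhere to go, while the byte values
given to the kana would have covered more than one further plane.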

(b) it is trickier to handle internally. With EUC you can do all sorts of
single-character activities (index, rindex, strchr, etc.) with total safety.
Also you can scan backwards through a string and still detect reliably that
you are in a Japanese character. With Shift_JIS both are much trickier.
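
A small sketch of the strchr problem in C. The katakana "so" is 0x83 0x5C in
Shift_JIS, and 0x5C is also the ASCII backslash; in EUC the same character is
0xA5 0xBD, with both bytes above 0x80. The function names and the sample
strings are invented for illustration, and the lead-byte ranges are the
commonly cited ones.

```c
#include <string.h>

/* Katakana "so" followed by ".txt" in each encoding. */
static const char sjis_name[] = "\x83\x5C.txt";
static const char euc_name[]  = "\xA5\xBD.txt";

/* Plain byte search: offset of the first matching byte, or -1. */
static long find_ascii_byte(const char *s, char ascii)
{
    const char *p = strchr(s, ascii);
    return p ? (long)(p - s) : -1L;
}

/* Is this byte the first byte of a two-byte Shift_JIS character? */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Shift_JIS-aware search: walk forward from the start and skip the
 * trail byte of every two-byte character. */
static long sjis_find_ascii(const char *s, char ascii)
{
    const unsigned char *p = (const unsigned char *)s;
    long i = 0;

    while (p[i] != '\0') {
        if (sjis_is_lead(p[i]) && p[i + 1] != '\0') {
            i += 2;               /* skip lead and trail byte together */
            continue;
        }
        if (p[i] == (unsigned char)ascii)
            return i;
        i++;
    }
    return -1L;
}
```

Searching sjis_name for a backslash with find_ascii_byte reports a hit inside
the kana, whereas the same search on euc_name finds nothing. The aware version
has to start from the beginning of the string, which is also why scanning
backwards is awkward in Shift_JIS.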

The ONLY advantage of Shift_JIS is that hankaku kana is carried as a single
byte rather than two bytes as in EUC. Conversion between EUC and Shift_JIS at
the string-level is close to trivial.
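
As a sketch of how simple the string-level conversion can be, here is one
direction (EUC to Shift_JIS) in C. It uses the widely published row/cell
arithmetic for JIS X 0208; the function names and the choice to return -1 on
anything unconvertible (including the three-byte JIS X 0212 form, which has no
Shift_JIS equivalent) are my own assumptions.

```c
#include <stddef.h>

/* Convert a JIS X 0208 row/cell pair (each 0x21..0x7E) in place
 * to the corresponding Shift_JIS byte pair. */
static void jis_to_sjis(unsigned char *b1, unsigned char *b2)
{
    unsigned char row = *b1, cell = *b2;

    *b1 = (unsigned char)(((row + 1) >> 1) + (row < 0x5F ? 0x70 : 0xB0));
    if (row & 1)
        *b2 = (unsigned char)(cell + (cell > 0x5F ? 0x20 : 0x1F));
    else
        *b2 = (unsigned char)(cell + 0x7E);
}

/* Convert a NUL-terminated EUC string to Shift_JIS.
 * Returns the output length in bytes, or -1 if the input cannot be
 * converted or the output buffer is too small. */
static long euc_to_sjis(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != '\0') {
        unsigned char b1 = p[0], b2 = p[1];

        if (n + 2 >= outsz)       /* need up to 2 bytes plus the NUL */
            return -1;
        if (b1 < 0x80) {          /* ASCII: copy as-is */
            out[n++] = (char)b1;
            p += 1;
        } else if (b1 == 0x8E && b2 >= 0xA1 && b2 <= 0xDF) {
            out[n++] = (char)b2;  /* hankaku kana: drop the 0x8E prefix */
            p += 2;
        } else if (b1 >= 0xA1 && b1 <= 0xFE && b2 >= 0xA1 && b2 <= 0xFE) {
            b1 &= 0x7F;           /* strip high bits to get JIS row/cell */
            b2 &= 0x7F;
            jis_to_sjis(&b1, &b2);
            out[n++] = (char)b1;
            out[n++] = (char)b2;
            p += 2;
        } else {
            return -1;            /* 0x8F (JIS X 0212) or malformed input */
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

For example, the EUC bytes 0xA5 0xBD come out as 0x83 0x5C, and the two-byte
EUC hankaku kana 0x8E 0xB1 comes out as the single byte 0xB1. The reverse
direction follows the same pattern with the arithmetic inverted.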

Jim


-- 
Jim Breen  (j.breen@example.com  http://www.csse.monash.edu.au/~jwb/)
Computer Science & Software Engineering,                Tel: +61 3 9905 3298
P.O Box 26, Monash University,                          Fax: +61 3 9905 5146
Clayton VIC 3800, Australia      ジム・ブリーン@モナシュ大学
