Mailing List Archive


tlug: EUC-JP, iso-2022-jp, Shift-Jis



Hi,
I've done a few searches and found some information from 
 http://www.ntt.co.jp/japan/note-on-JP/encoding.html

I've also looked at Ken Lunde's book and confirmed that
JIS is a series of 7-bit 2-byte sequences.  I also
found a reference saying that iso-2022-jp is sometimes
referred to as old-jis.

This is a bit confusing, because I think that iso-2022 (without the
jp suffix) is referred to as EUC.

From the NTT page on the constitution of Japan:
http://www.ntt.co.jp/japan/constitution/index.html

   Japanese version (coded in ISO-2022-JP known as "Old-JIS", see RFC-1468) 
   Japanese version (coded in EUC) 


I've also just read RFC-1468, co-authored by Jun Murai.

As far as I can tell, iso-2022-jp is JIS.  Could some knowledgeable
person please send me comments?

Regards,
Craig


-----------

Japanese Encoding Methods

First of all, documents in Japanese need at least two character sets, ASCII
and JIS X 0208. The latter is a 2-byte character set including Kanji,
Hiragana, Katakana and some other symbols and characters.

We use two Japanese encoding methods on this server. One is EUC-JP (Extended
Unix Code) and the other is ISO-2022-JP.

EUC-JP

EUC-JP is an ISO-2022 compliant 8-bit encoding in which ASCII is initially
designated to G0 and JIS X 0208-1983 (or JIS X 0208-1990) to G1, without
explicit announcement. G2 and G3 are never used. A sample file encoded in
EUC-JP is here.
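
As a rough illustration of the G0/G1 layout described above, here is a small
Python sketch. It is not part of the original page: the sample character and
the use of Python's built-in "euc_jp" and "iso2022_jp" codecs are assumptions
made for the example. It suggests that an EUC-JP byte pair is the 7-bit
JIS X 0208 code with the high bit set on each byte, while ASCII passes through
unchanged.

```python
# Sketch: relate EUC-JP bytes to the raw 7-bit JIS X 0208 code of a character.
text = "日"  # sample character, chosen only for illustration

euc = text.encode("euc_jp")

# Python's iso2022_jp codec wraps the JIS bytes in ESC $ B ... ESC ( B.
iso = text.encode("iso2022_jp")
jis = iso[3:-3]  # drop the 3-byte escape sequence at each end

# Setting the high bit on each JIS byte should yield the EUC-JP bytes (G1 side).
euc_from_jis = bytes(b | 0x80 for b in jis)

# ASCII (G0 side) is stored as-is, one byte per character.
ascii_bytes = "A".encode("euc_jp")

print(jis.hex(), euc.hex(), euc_from_jis == euc, ascii_bytes)
```

Because every JIS X 0208 byte has its high bit set, no escape sequences are
needed to tell the two character sets apart in an EUC-JP stream.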

ISO-2022-JP

ISO-2022-JP, which is registered as a MIME charset name, is a widely used
encoding in Japanese IP communities for electronic mail and network news
messages. It is an ISO-2022 compliant 7-bit encoding that uses only the G0
code set. ASCII is initially designated to G0. To switch character sets, you
designate the new set to G0 with an escape sequence, for example:

        ESC ( B    ASCII
        ESC ( J    JIS X 0201-1976 ("Roman" set)
        ESC $ @    JIS X 0208-1978
        ESC $ B    JIS X 0208-1983

A sample file is here. For more detail about ISO-2022-JP, see RFC-1468.

Although I think ISO-2022-JP is better than EUC-JP, ISO-2022-JP causes some
problems in HTML.

Shift-JIS

There is another encoding scheme for Japanese called Shift-JIS (also called
MS-Kanji Code). Unfortunately, Shift-JIS is widely used with MS-DOS, Windows
and Macintosh; but I think Shift-JIS is rubbish and should not be used
anymore!!! We never use this encoding on this server except for this example.
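
For text that does arrive in Shift-JIS, a conversion to one of the other two
encodings might look like the Python sketch below. The helper name
convert_sjis and the sample bytes are hypothetical, and the sketch assumes
Python's built-in "shift_jis", "euc_jp" and "iso2022_jp" codecs.

```python
def convert_sjis(data: bytes, target: str) -> bytes:
    """Decode Shift-JIS bytes to text, then re-encode in the target codec."""
    return data.decode("shift_jis").encode(target)


# Sample input: the word 日本語 as Shift-JIS bytes.
sjis = "日本語".encode("shift_jis")

as_euc = convert_sjis(sjis, "euc_jp")
as_iso = convert_sjis(sjis, "iso2022_jp")

print(sjis.hex(), as_euc.hex(), as_iso)
```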

________________________________________________________________________
TAKADA Toshihiro


---------------------------------------------------------------
TLUG Meeting Dec. 13, 12:30 at Tokyo station Yaesu Chuo ticket gate
13:30 Starbuck's coffee.  13:45 HSBC | info: joem@example.com
At least 3 functional Sparc IPC machines will be raffled out
---------------------------------------------------------------
a word from the sponsor:
TWICS - Japan's First Public-Access Internet System
www.twics.com  info@example.com  Tel:03-3351-5977  Fax:03-3353-6096


