tlug: EUC-JP, iso-2022-jp, Shift-Jis
- To: tlug@example.com
- Subject: tlug: EUC-JP, iso-2022-jp, Shift-Jis
- From: Craig Oda <craig@example.com>
- Date: Mon, 17 Nov 1997 13:07:01 +0900 (JST)
- Content-Type: TEXT/PLAIN; charset=US-ASCII
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
Hi,

I've done a few searches and found some information at
http://www.ntt.co.jp/japan/note-on-JP/encoding.html

I've also looked at Ken Lunde's book and confirmed that JIS is a series of 7-bit 2-byte sequences. I also found a reference saying that iso-2022-jp is sometimes referred to as old-jis. This is a bit confusing, because I think that iso-2022 (without the jp suffix) is referred to as EUC.

From the NTT page on the constitution of Japan
http://www.ntt.co.jp/japan/constitution/index.html

    Japanese version (coded in ISO-2022-JP known as "Old-JIS", see RFC-1468)
    Japanese version (coded in EUC)

I've also just read RFC-1468, authored by Jun Murai. As far as I can tell, iso-2022-jp is JIS.

Would some knowledgeable person please send me your comments?

Regards,
Craig

-----------
Japanese Encoding Methods

First of all, documents in Japanese need at least two character sets, ASCII and JIS X 0208. The latter is a 2-byte character set including Kanji, Hiragana, Katakana and some other symbols and characters.

We use two Japanese encoding methods in this server. One is EUC-JP (Extended Unix Code) and the other is ISO-2022-JP.

EUC-JP

EUC-JP is an ISO-2022 compliant 8-bit encoding in which ASCII is initially designated to G0 and JIS X 0208-1983 (or JIS X 0208-1990) to G1, without explicit announcement. G2 and G3 are never used. A sample file encoded in EUC-JP is here.

ISO-2022-JP

ISO-2022-JP, which is registered as a MIME charset name, is a widely used encoding in Japanese IP communities for electronic mail and network news messages. It is an ISO-2022 compliant 7-bit encoding that uses only the G0 code set. ASCII is initially designated to G0. To switch character sets, you should designate the new set to G0 by escape sequences, for example:

    ESC ( B    ASCII
    ESC ( J    JIS X 0201-1976 ("Roman" set)
    ESC $ @    JIS X 0208-1978
    ESC $ B    JIS X 0208-1983

A sample file is here. For more detail about ISO-2022-JP, see RFC-1468.

Although I think ISO-2022-JP is better than EUC-JP, ISO-2022-JP causes some problems in HTML.
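To make the difference between the two encodings concrete, here is a minimal sketch using Python's standard codecs (`iso2022_jp` and `euc_jp`). It is only an illustration and not part of the NTT note: the sample string and the helper name `show_bytes` are assumptions chosen for the example. It shows the 7-bit form wrapped in the ESC $ B and ESC ( B sequences, and that setting the high bit on each JIS X 0208 byte gives the EUC-JP bytes.

```python
# Illustrative sketch: compare ISO-2022-JP and EUC-JP byte sequences
# with Python's built-in codecs. The sample text is an arbitrary choice.

ESC = b"\x1b"


def show_bytes(label: str, data: bytes) -> str:
    """Return one readable hex-dump line (hypothetical helper)."""
    return f"{label:12s} " + " ".join(f"{b:02x}" for b in data)


text = "日本語"  # three kanji from JIS X 0208

jis = text.encode("iso2022_jp")  # 7-bit; escape sequences switch G0
euc = text.encode("euc_jp")      # 8-bit; JIS X 0208 sits in G1 (high bit set)

print(show_bytes("ISO-2022-JP", jis))
print(show_bytes("EUC-JP", euc))

# Drop the leading ESC $ B and the trailing ESC ( B to get the raw
# 2-byte JIS X 0208 codes.
payload = jis[len(ESC + b"$B"):-len(ESC + b"(B")]

# Setting the high bit on every JIS X 0208 byte yields the EUC-JP bytes.
assert bytes(b | 0x80 for b in payload) == euc
```

The ISO-2022-JP output stays entirely below 0x80, which is why it passes through 7-bit mail transports, while every EUC-JP byte for these kanji has the high bit set.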
Shift-JIS

There is another encoding scheme for Japanese called Shift-JIS (also called MS-Kanji Code). Unfortunately, Shift-JIS is widely used with MS-DOS, Windows and Macintosh; but I think Shift-JIS is rubbish and should not be used anymore!!! We never use this encoding under this server except this example.

________________________________________________________________________
TAKADA Toshihiro

---------------------------------------------------------------
TLUG Meeting Dec. 13, 12:30 at Tokyo station Yaesu Chuo ticket gate
13:30 Starbuck's coffee. 13:45 HSBC | info: joem@example.com
At least 3 functional Sparc IPC machines will be raffled out
---------------------------------------------------------------
a word from the sponsor:
TWICS - Japan's First Public-Access Internet System
www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096
- Follow-Ups:
- tlug: EUC-JP, iso-2022-jp, Shift-Jis
- From: "Stephen J. Turnbull" <turnbull@example.com>