Re: tlug: Need info. about Japanese and Linux
- To: tlug@example.com
- Subject: Re: tlug: Need info. about Japanese and Linux
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Fri, 6 Nov 1998 08:31:25 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <3641A6D2.9A0A9547@example.com>
- References: <199811042321.IAA03708@example.com><3641A6D2.9A0A9547@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "Fredric" == Fredric Fredricson <Fredric.Fredriksson@example.com> writes:

    Fredric> The customer does not know anything about this. He buys a
    Fredric> machine does not care about these details.

Bingo!  ++Steve.  :-)

    >> There are three codes in Japanese characters. They are JIS,
    >> SJIS and EUC. To convert codes there's a converter called
    >> 'nkf'.

    Fredric> Are these codes 8-bit? My concern is if I can fit it
    Fredric> inside our current system of language-specific text
    Fredric> files.

Well, strictly speaking they are 16-bit. Someone who has finished
first grade in Japan has a repertoire of at least 225 characters
already, and most about 350; you can't fit that into 8 bits, let alone
an educated adult's repertoire of about 10,000.

Technically speaking, the most common "native" encoding for Japanese
is "Packed EUC", which is an ISO-2022 conformant 8-bit code with the
JIS X 0201 Roman alphabet (for your purposes, ASCII +/- 2 or 3
characters) invoked to GL/G0 and JIS X 0208 invoked to GR/G1. Normally
it uses no shift sequences, although auxiliary character sets can be
invoked to G2 and G3. It's unlikely you would need those extra
character sets unless you are doing entry of personal and place names.

Commonly used in messaging applications like mail and netnews is
ISO-2022-JP, which is an ISO-2022 conformant 7-bit code using shift
sequences (ESC "$B" to designate and shift JIS X 0208 into G0/GL, and
ESC "(B" to designate and shift ASCII into G0/GL). This has some other
restrictions which are unimportant for your immediate purpose of
determining compatibility (e.g., G0 is initialized to ASCII, each line
of the data stream must end in ASCII (before the newline), etc.).

Commonly used on MS Windows and Macintosh is "Shift JIS." Often the
"f" is omitted, to indicate that this code is an 8-bit code that
doesn't comply with anything except Microsoft's whims and will pollute
any data channel that transmits it. You have to accept it in general
applications (there are too many MS systems out there), but you should
never produce it or store it internally. (MS systems can all handle
both Packed EUC and ISO-2022-JP now, so interchange is not a concern.)

Effectively never used is Unicode. Unicode conforms to ISO-10646, of
course (and adds many further restrictions), but suffers from issues
of user preference (many Japanese personal and place names cannot be
encoded in Unicode) and programming awkwardness (the collating order
of the Japanese national standard JIS X 0208 differs from that of
Unicode). I doubt that you would have a problem dealing with the
programming issue, since it's already present when using ISO-8859-1,
although you might have to construct or at least improve the necessary
POSIX locale(s). (I haven't looked carefully in some months, but last
I looked the Japanese locales were pretty weakly implemented in glibc,
and certainly few Japanese programs use the POSIX locale model.)

HTH.

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."
----------------------------------------------------------------
Next Nomikai: 20 November, 19:30 Tengu TokyoEkiMae 03-3275-3691
Next Technical Meeting: 12 December, 12:30 HSBC Securities Office
----------------------------------------------------------------
more info: http://tlug.linux.or.jp      Sponsors: PHT, HSBC Securities
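
To make the 7-bit versus 8-bit distinction between the three encodings concrete, here is a minimal sketch, assuming Python 3 and its standard-library codecs `euc_jp`, `iso2022_jp`, and `shift_jis`. The sample string and the helper names are illustrative only and do not come from the message above.

```python
# Sketch: encode one sample string in the three encodings discussed above
# and inspect the resulting bytes. Helper names are hypothetical.

ESC = b"\x1b"
SAMPLE = "\u65e5\u672c\u8a9e"  # three JIS X 0208 kanji ("nihongo")

euc = SAMPLE.encode("euc_jp")       # Packed EUC: JIS X 0208 in GR
jis = SAMPLE.encode("iso2022_jp")   # ISO-2022-JP: 7-bit with shift sequences
sjis = SAMPLE.encode("shift_jis")   # Shift JIS


def is_seven_bit(data: bytes) -> bool:
    """True if no byte in data has the high bit set."""
    return all(b < 0x80 for b in data)


def all_high_bit(data: bytes) -> bool:
    """True if every byte in data has the high bit set."""
    return all(b >= 0x80 for b in data)


def hexdump(data: bytes) -> str:
    """Space-separated hex rendering of data."""
    return " ".join(f"{b:02x}" for b in data)


# Strip the leading ESC "$B" and trailing ESC "(B" from the ISO-2022-JP
# stream, then set the high bit on each remaining byte. For text made only
# of JIS X 0208 characters this reproduces the EUC encoding.
payload = jis[len(ESC + b"$B"):-len(ESC + b"(B")]
euc_from_jis = bytes(b | 0x80 for b in payload)

for name, data in (("EUC-JP", euc), ("ISO-2022-JP", jis), ("Shift JIS", sjis)):
    print(f"{name:12} {hexdump(data)}  7-bit clean: {is_seven_bit(data)}")
print("EUC equals ISO-2022-JP payload with high bit set:", euc_from_jis == euc)
```

For this sample, the ISO-2022-JP stream stays within 7 bits, every EUC byte has the high bit set, and the Shift JIS output mixes high-bit bytes with a second byte that falls in the ASCII range, which is one reason it is harder to scan safely than the other two.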
- References:
- Re: tlug: Need info. about Japanese and Linux
- From: Uchida.Masatomo@example.com, <masatomo@example.com>
- Re: tlug: Need info. about Japanese and Linux
- From: Fredric Fredricson <Fredric.Fredriksson@example.com>