Mailing List Archive
RE: [tlug] Why is Shift_JIS bad?
- Date: Fri, 30 Aug 2002 13:24:30 +0900
- From: Jim BLACKSON <blackson@example.com>
- Subject: RE: [tlug] Why is Shift_JIS bad?
Two other (in)famous Shift_JIS problems are:

1. C compiler: char as signed byte

The char type is often a single-byte signed integer (whether plain char is signed is up to the compiler). The sign often gets extended into the most significant byte when converting to a word integer on Intel machines. That is, byte 0x82 becomes word 0xFF82. When doing bit-wise logical operations (and, or), you must be careful to type cast or mask the char to get rid of the sign extension.

<untested code fragment>
#include <stdio.h>

int main(void)
{
    /* "※" in Shift_JIS is 0x81 0xa6; written as escapes so the
       bytes do not depend on the encoding of the source file. */
    const char *string = "\x81\xa6";
    unsigned short int sjis_char;

    /* Assumes plain char is signed. The low byte is masked; storing
       into a 16-bit type drops the extended bits of the high byte. */
    sjis_char = (string[0] << 8) | (string[1] & 0xff);

    printf("%02x, %02x, %04x, %04x, %04x\n",
           string[0], string[1],
           ((string[0] << 8) | string[1]),
           (string[0] << 8) | (string[1] & 0xff),
           sjis_char);
    /* 2-byte int outputs: ff81, ffa6, ffa6, 81a6, 81a6 */
    /* 4-byte int outputs: ffffff81, ffffffa6, ffffffa6, ffff81a6, 81a6 */
    return 0;
}
</untested code fragment>

Programs that have dealt only with 7-bit ASCII sometimes get caught by this sign extension; I have seen this in DOS programs in the past.

2. 0x5C problem in file names

Some operating systems use the backslash as a delimiter in path names. The backslash is encoded as 0x5C. But 0x5C is also a valid second byte of a double-byte character in Shift_JIS. Software that does a simple strtok looking for 0x5C characters when parsing file names will incorrectly hit the 0x5C second byte in zenkaku katakana So (0x83 0x5C) or kanji Hyou (table) (0x95 0x5C). This happens to me when I try to use English-language software to process Japanese file names in FAT file systems.

Best regards,
jimb.
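The 0x5C problem in point 2 can be avoided by walking the path one character at a time instead of one byte at a time. The following is only a minimal sketch, not something from the original post: the helper names sjis_is_lead and sjis_last_separator are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the commonly cited ones for Shift_JIS and its vendor extensions, taken here as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if b can start a double-byte
   Shift_JIS character (assumed ranges 0x81-0x9F and 0xE0-0xFC). */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: returns a pointer to the last backslash that is
   a real path separator in a Shift_JIS string, or NULL if there is none.
   A 0x5C that is the second byte of a double-byte character is skipped. */
static const char *sjis_last_separator(const char *path)
{
    /* Work on unsigned bytes so values >= 0x80 are not sign-extended. */
    const unsigned char *p = (const unsigned char *)path;
    const char *last = NULL;

    assert(path != NULL);

    while (*p != '\0') {
        if (sjis_is_lead(*p) && p[1] != '\0') {
            p += 2;                     /* skip lead byte and trail byte */
        } else {
            if (*p == 0x5C)
                last = (const char *)p; /* genuine single-byte backslash */
            p += 1;
        }
    }
    return last;
}
```

For the path C:\ followed by kanji Hyou (bytes 'C', ':', 0x5C, 0x95, 0x5C), this returns a pointer to the byte at index 2, where a plain byte search for the last 0x5C would stop on the trail byte at index 4. Reading the bytes through an unsigned char pointer also sidesteps the sign extension from point 1.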