tlug.jp Mailing List Archive
Re: [tlug] Unicode
- Date: Sat, 12 Jul 2003 22:16:38 +0900
- From: Stephen Lee <sl@example.com>
- Subject: Re: [tlug] Unicode
Here is my perspective as a Chinese and a Japanese user:
- Use of Unicode allows me to enter a character with a Japanese IME, search Google, and have Chinese web pages that match the search term come up. (Whether this is a plus or a minus depends on whether you want it; to me it is a plus.)
- Use of Unicode simplifies programming for operations common to different languages, or complicates every language to the same level, depending on your perspective. For Kanji users, this means that programs that were Kanji-unfriendly become less so, because the same routines are used for both, whereas before, two-byte languages were usually handled differently.
- Ironically, the advantage of a common 16-bit coding shrinks (I'd say by a lot) once the space beyond 16 bits is used. To handle Unicode properly nowadays you need at least 21 bits (the range coded by UTF-16, up to U+10FFFF), and most programs will use a 32-bit internal type. But once you have that much space, you don't run out easily. That is, the increase in computing power, which made a 32-bit data type practical, and the expansion of the code space beyond 16 bits have made Han Unification less meaningful than before; but in general I still think it is a good idea (see my first point).
- As of now, Unicode is not suitable for coding a multi-language font: you run into problems such as differences in appearance by language, so you need multiple glyphs for the same code point. However, this is a problem Unicode is NOT designed to solve, so it can't be blamed for it. What might be good is a GLYPH-TYPE encoding that encodes the difference in a standard way, e.g. whether the kusakanmuri is drawn with 4 strokes or 3. It would use 1 byte, so that combined with the Unicode code point (for the character) it fits in 32 bits.
- Question: Is the range coded by UTF-16 the intended range for Unicode for the foreseeable future? (There even seems to be an effort to remove the private use space beyond that range.)
- Question: If the above is true, would a fixed-length 24-bit encoding be useful? I've been pondering it. This way you can fit 33 characters in 100 bytes, rather than 25 in UTF-16 or even fewer (16?) in UTF-8, in the worst case.
- Question: How is HKSCS supported? MS seems to use the private use space for coding those characters; are they not in Unicode? (Call me too lazy to do my homework.)

Stephen
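The GLYPH-TYPE idea in the post comes down to bit packing, which can be sketched as follows. This is a minimal Python sketch under stated assumptions: the code point sits in the low 24 bits (Unicode itself needs only 21), and a one-byte variant tag sits in the high byte. `GlyphVariant`, its tag values, `pack`, and `unpack` are hypothetical names invented for illustration; no standard registry of such tags is implied.

```python
# Minimal sketch of the hypothetical GLYPH-TYPE idea: a code point plus a
# one-byte glyph-variant tag packed into one 32-bit integer.
from enum import IntEnum
from typing import Tuple

MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point; needs 21 bits
CODE_POINT_MASK = 0xFFFFFF  # low 24 bits hold the code point


class GlyphVariant(IntEnum):
    """Hypothetical tag values; no such standard registry is implied."""
    DEFAULT = 0
    KUSAKANMURI_3_STROKES = 1  # grass radical drawn with 3 strokes
    KUSAKANMURI_4_STROKES = 2  # grass radical drawn with 4 strokes


def pack(ch: str, variant: int) -> int:
    """Put the variant tag in the high byte and the code point below it."""
    if not 0 <= variant <= 0xFF:
        raise ValueError("variant tag must fit in one byte")
    return (int(variant) << 24) | ord(ch)


def unpack(value: int) -> Tuple[str, int]:
    """Split a packed 32-bit value back into (character, variant tag)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    cp = value & CODE_POINT_MASK
    if cp > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {cp:#x}")
    return chr(cp), value >> 24


if __name__ == "__main__":
    packed = pack("花", GlyphVariant.KUSAKANMURI_4_STROKES)
    print(hex(packed))
    print(unpack(packed))
```

Placing the tag in the top byte rather than directly above bit 21 leaves 3 bits unused, but it keeps the code point byte-aligned, so a single mask recovers the character.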
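The arithmetic behind the 24-bit question can be checked with a small sketch. It assumes a hypothetical fixed-length encoding that stores every code point big-endian in exactly 3 bytes; `encode_utf24`, `decode_utf24`, and `chars_per_budget` are illustrative names, not a standard codec. The worst case is taken to be a character outside the 16-bit BMP, here U+20000 from CJK Extension B. Since code points are capped at U+10FFFF, UTF-8 also needs at most 4 bytes per character, so its worst case comes out at 25 per 100 bytes as well; the "16" figure would correspond to the 6-byte sequences that the original UTF-8 definition (RFC 2279) allowed for values beyond the Unicode range.

```python
# Minimal sketch of a hypothetical fixed-length 24-bit encoding.
# Each code point is stored big-endian in exactly 3 bytes.
# This is NOT a standard Unicode encoding form.

MAX_CODE_POINT = 0x10FFFF  # highest code point reachable through UTF-16


def encode_utf24(text: str) -> bytes:
    """Encode every code point of `text` as 3 big-endian bytes."""
    out = bytearray()
    for ch in text:
        # ord() never exceeds MAX_CODE_POINT, so 3 bytes always suffice.
        out += ord(ch).to_bytes(3, "big")
    return bytes(out)


def decode_utf24(data: bytes) -> str:
    """Inverse of encode_utf24; the input length must be a multiple of 3."""
    if len(data) % 3:
        raise ValueError("truncated 24-bit sequence")
    chars = []
    for i in range(0, len(data), 3):
        cp = int.from_bytes(data[i:i + 3], "big")
        if cp > MAX_CODE_POINT:
            raise ValueError(f"code point out of range: {cp:#x}")
        chars.append(chr(cp))
    return "".join(chars)


def chars_per_budget(ch: str, budget: int = 100) -> dict:
    """How many copies of the single character `ch` fit in `budget` bytes."""
    sizes = {
        "24-bit fixed": len(encode_utf24(ch)),
        "UTF-16": len(ch.encode("utf-16-be")),  # -be variant: no byte order mark
        "UTF-8": len(ch.encode("utf-8")),
    }
    return {name: budget // size for name, size in sizes.items()}


if __name__ == "__main__":
    print(MAX_CODE_POINT.bit_length())  # → 21
    # Worst case: a character outside the 16-bit BMP (CJK Extension B).
    print(chars_per_budget("\U00020000"))
    # → {'24-bit fixed': 33, 'UTF-16': 25, 'UTF-8': 25}
    # A BMP ideograph (U+6F22) for comparison.
    print(chars_per_budget("\u6f22"))
    # → {'24-bit fixed': 33, 'UTF-16': 50, 'UTF-8': 33}
    sample = "漢字\U00020000"
    assert decode_utf24(encode_utf24(sample)) == sample
```

Note that the comparison flips for ideographs inside the BMP: there UTF-16 fits 50 characters per 100 bytes, while the 24-bit scheme and UTF-8 both fit 33. The fixed-length scheme only wins on text dominated by supplementary-plane characters.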