Mailing List Archive



Re: [tlug] Unicode



Here is my perspective as a user of both Chinese and Japanese:

- Use of Unicode allows me to enter a character with a Japanese IME,
search Google, and have Chinese web pages that match the search term
come up.  (Whether that is a plus or a minus depends on whether you
want it; it is a plus to me.)

- Use of Unicode simplifies programming for operations common to
different languages; or complicates every language to the same level,
depending on your perspective.  For Kanji users that means programs that
were Kanji-unfriendly become less so, because the same common routines
are now used for all languages, whereas before two-byte languages were
usually handled differently.

- Ironically, the advantage of a common 16-bit coding becomes less so
(I'd say much less so) once more than 16 bits of code space are used.
To handle Unicode properly nowadays you need at least 21 bits (the range
coded by UTF-16); and most will use a 32-bit internal type.  But once
you have that much space you don't run out easily.

  That is, the increase in computing power, which made a 32-bit data
type practical, and the code space expansion beyond 16 bits have made
Unification less meaningful than before; but in general I still
think it is a good idea.  (See my first point.)
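
  To make the bit counts above concrete, here is a small Python sketch.
It assumes the upper limit of the UTF-16 range is U+10FFFF; the names
MAX_CODE_POINT and utf16_units are made up for illustration only.

```python
# Highest code point reachable through UTF-16 (assumed limit: U+10FFFF).
MAX_CODE_POINT = 0x10FFFF


def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2


# Number of bits needed to hold any code point in that range.
bits_needed = MAX_CODE_POINT.bit_length()

bmp_char = "\u8349"          # a Kanji inside the original 16-bit space
astral_char = "\U00020000"   # an ideograph outside the 16-bit space

print(bits_needed)
print(utf16_units(bmp_char), utf16_units(astral_char))
```

  A character outside the 16-bit space takes two UTF-16 code units (a
surrogate pair), which is why a single 32-bit internal type is the
usual choice for processing.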

- As of now, Unicode is not suitable for coding a multi-language font;
you run into problems like differences in appearance by language, so you
need multiple glyphs for the same code point...  However, this is a
problem Unicode is NOT designed to solve, so it can't be blamed.

  I'd say what might be good is a GLYPH-TYPE encoding that encodes the
difference in a standard way -- e.g. whether the kusakanmuri is 4
strokes or 3 strokes.  Use 1 byte for it, so that combined with the
Unicode code point (for the character) it fits in 32 bits.
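
  A rough sketch of that packing in Python, just to show that the
arithmetic works.  No such standard exists; the glyph-type values
(0 = unspecified, 1 = 3-stroke kusakanmuri, 2 = 4-stroke kusakanmuri)
and the function names are purely hypothetical.

```python
MAX_CODE_POINT = 0x10FFFF   # needs 21 bits, so it fits in the low 24 bits
GLYPH_SHIFT = 24            # the glyph-type byte sits in the top 8 bits
CODE_POINT_MASK = (1 << GLYPH_SHIFT) - 1


def pack(code_point: int, glyph_type: int) -> int:
    """Combine a code point and a 1-byte glyph type into one 32-bit value."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    if not 0 <= glyph_type <= 0xFF:
        raise ValueError("glyph type must fit in one byte")
    return (glyph_type << GLYPH_SHIFT) | code_point


def unpack(value: int) -> tuple[int, int]:
    """Split a packed 32-bit value back into (code_point, glyph_type)."""
    return value & CODE_POINT_MASK, value >> GLYPH_SHIFT


# Hypothetical glyph type 1 attached to U+8349.
packed = pack(0x8349, 1)
print(hex(packed), unpack(packed))
```

  Software that only cares about the character can mask off the top
byte and get the plain code point back.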

- Question:

Is the range coded by UTF-16 the intended range for Unicode for the
foreseeable future?  (There even seems to be an effort to remove the
private use space beyond that range.)

- Question:

If the above is true, would a fixed-length 24-bit encoding be useful?
I've been pondering it.  This way you can fit 33 characters in 100
bytes, rather than 25 in UTF-16 or UTF-8 (both need at most 4 bytes per
character within that range), in the worst case.
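
A minimal Python sketch of what I mean, assuming 3 big-endian bytes per
code point.  The names encode_fixed24 and decode_fixed24 are made up;
this is not an existing encoding.  Within the UTF-16 range, UTF-8 takes
at most 4 bytes per character, which is what the comparison uses.

```python
def encode_fixed24(text: str) -> bytes:
    """Encode each code point as exactly 3 big-endian bytes."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)


def decode_fixed24(data: bytes) -> str:
    """Decode a byte string produced by encode_fixed24."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 bytes")
    return "".join(
        chr(int.from_bytes(data[i:i + 3], "big"))
        for i in range(0, len(data), 3)
    )


# Worst case: a character outside the 16-bit space.
worst = "\U00020000"
budget = 100  # bytes available

bytes_per_char = {
    "fixed24": len(encode_fixed24(worst)),
    "utf-16": len(worst.encode("utf-16-be")),
    "utf-8": len(worst.encode("utf-8")),
}
capacity = {name: budget // size for name, size in bytes_per_char.items()}
print(capacity)
```

The trade-off is that text made up mostly of characters from the 16-bit
space gets bigger than in UTF-16, and ASCII text triples in size
compared with UTF-8.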

- Question:

How is HKSCS supported?  MS seems to use the private use space for
coding those characters; are they not in Unicode?  (Call me too lazy to
do my homework.)

Stephen

