Re: tlug: unicode



--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------
>>>>> "Craig" == Craig Oda <craig@example.com> writes:

    Craig> On Mon, 26 May 1997, Stephen J. Turnbull wrote:

    >> This will get fixed when Netscape goes to Unicode (or to UCS-4,
    >> like Mule is doing at the moment).  If it's a real problem,
    >> w3.el is the only answer as far as I know.  If it's cosmetic,
    >> you can live with it....

    Craig> Stephen, isn't Netscape 4.0 a "Unicode" browser?  I can set

Don't know, don't have it.

    Craig> my default encoding on my home system to UTF 8 bit and
    Craig> display a unicode file that contains Japanese, upright

Well, that sounds fine to me....

    Craig> This makes me wonder what being "unicode" compliant really

It's complicated, like most of this stuff.  For one thing, there's
ISO-10646 and there's Unicode.  Unicode is the subset of full ISO-10646
that fits into 16 bits and covers more or less all of the world's
character sets, at the cost of getting some glyphs wrong.  I.e., you
probably are aware that Chinese Chinese characters look different from
Japanese Chinese characters, which are different again from Korean
Chinese characters, even when they're the same character.  Unicode
coerces them all into the same code points, so a single font has to
serve all of them.  So Japanese Unicode fonts will look Japanese even
when displaying Chinese.  Readable (if you can read them), but not
pretty (according to natives).

To fix this, the full Universal Character Set uses 4 bytes, and allows 
the Japanese to use JIS code and the Chinese GB and so on.

16-bit Unicode requires translation tables, because (a) the various
languages don't even agree on the "basic 1000" and (b) they don't
order the ones they do agree on the same way (e.g., JIS orders level 1
kanji by yomi but Chinese orders all hanzi by radical and stroke count).
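
A minimal Python sketch of point (b), assuming Python's standard
euc_jp codec as a stand-in for JIS ordering (the five sample kanji are
an illustrative choice, not from the original post).  Kanji that sit
next to each other in JIS level 1 land on Unicode code points that are
not even in ascending order, so a fixed offset cannot replace a lookup
table:

```python
# Five kanji that are adjacent in JIS X 0208 level 1 (JIS orders these by reading).
kanji = ["亜", "唖", "娃", "阿", "哀"]

# EUC-JP is JIS X 0208 with the high bit set on both bytes, so comparing
# EUC-JP byte values mirrors JIS order.
jis_codes = [int.from_bytes(k.encode("euc_jp"), "big") for k in kanji]
unicode_codes = [ord(k) for k in kanji]

for k, j, u in zip(kanji, jis_codes, unicode_codes):
    print(f"{k}  EUC-JP {j:04X}  Unicode U+{u:04X}")

# JIS order is ascending for these characters; Unicode order is not.
jis_is_ascending = jis_codes == sorted(jis_codes)
unicode_is_ascending = unicode_codes == sorted(unicode_codes)
print(jis_is_ascending, unicode_is_ascending)
```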

ISO-8859-1 doesn't require translation tables, because it's mapped into 
Unicode with the high byte = 00.  But other ISO-8859-* sets will
require some translation to avoid redundancy.
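
As a rough illustration, assuming Python's built-in latin-1 and
iso8859_2 codecs: every ISO-8859-1 byte decodes to the code point with
the same numeric value, while an ISO-8859-2 byte in the upper half can
map somewhere else entirely and so needs a table lookup.

```python
# ISO-8859-1: each byte value is its own Unicode code point (high byte = 00).
all_bytes = bytes(range(256))
latin1_is_identity = all(
    ord(ch) == b for b, ch in zip(all_bytes, all_bytes.decode("latin-1"))
)
print(latin1_is_identity)

# ISO-8859-2: byte 0xA1 is LATIN CAPITAL LETTER A WITH OGONEK, which lives
# at U+0104, so the byte value alone does not give the code point.
latin2_char = bytes([0xA1]).decode("iso8859_2")
latin2_code_point = ord(latin2_char)
print(f"0xA1 in ISO-8859-2 -> U+{latin2_code_point:04X}")
```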

    Craig> me.  I hear that Java is unicode compliant, but again I'm

Unicode-compliant mostly means not trashing 16-bit codes.  This is
mostly a problem for C-like string processing, I believe.
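
A small sketch of why C-like string processing trips over 16-bit
codes.  The helper c_strlen below is hypothetical: it only mimics what
C's strlen does (count bytes up to the first NUL), and the sample text
is an arbitrary choice.  Any ASCII character stored as a big-endian
16-bit code has a 00 high byte, which byte-oriented code takes for the
end of the string:

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen: number of bytes before the first NUL byte."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end

text = "Aあ"

# 16-bit big-endian codes: 'A' becomes 00 41, so the very first byte is NUL.
sixteen_bit = text.encode("utf-16-be")
# UTF-8 never produces a NUL byte for non-NUL characters.
utf8 = text.encode("utf-8")

print(len(sixteen_bit), c_strlen(sixteen_bit))
print(len(utf8), c_strlen(utf8))
```

Under this model the 16-bit buffer looks empty to strlen-style code
even though it holds four bytes, while the UTF-8 buffer survives
intact.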

Later,

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849              turnbull@example.com

