Mailing List Archive



[tlug] Re: Unicode (Was: apache2 setup and japanese)



I'm going to reply to several emails here...

>> >Some Japanese don't like Unicode because they hold a very rigid
>> >understanding about the "proper" way to write various Kanji. 

And in fact Unicode says *nothing* about the glyph shape. JIS X 0221,
the Japanese version of ISO-10646 (i.e. Unicode), painstakingly makes
this point over and over again, and for many kanji demonstrates
Japanese AND Chinese AND Korean glyphs to show that different locales
can choose the font they want. They might as well be pissing in the wind
as far as the xenophobes are concerned.

>> Thanks for that info. (Though I have to apologize for a slight thread 
>> hijacking here.) Would that mean that the Japanese "manga" and Korean 
>> "manhwa" would look the same in Unicode, for instance?

Things don't "look" like anything in Unicode. The look comes from the
font. You choose the font. You buy a Chinese-style Unicode font where 
the hanzi look Chinese, or you buy a Japanese-style font. The codes
stay the same.
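You can see this from code, too. A minimal Python sketch (the character
choice here is my own illustration): U+6F22, the "kan/han" of
kanji/hanzi/hanja, is a single unified codepoint, and nothing about the
code says whether it should render with Japanese, Chinese, or Korean
glyph conventions — that's the font's job.

```python
import unicodedata

# One codepoint, locale-neutral. Whether it renders in a Japanese,
# Chinese, or Korean style depends entirely on the font you choose;
# the encoded value never changes.
ch = "漢"
print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# → U+6F22  CJK UNIFIED IDEOGRAPH-6F22
```

Note that even the character's official name is just "CJK UNIFIED
IDEOGRAPH-6F22" — deliberately neutral among the three languages.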

>> That's a rather funny way of unifying three alphabets, though, just 
>> unifying them in an encoding and waiting for the countries to adopt it. 
>> Mao couldn't have done it better. :D

Seems you don't know how the unification was done. The first steps
towards a unified CJK codeset were initiated over twenty years ago by
<drumroll>the Japanese</drumroll>. Yes, the Diet Library kicked off a
project which led to ANSI/NISO Z39.64 which covers about 16,000
characters. It is still used in some library bibliographic systems.

Soon after, both ISO and a set of computer companies began separate
projects to do the whole thing properly. After a year or so they decided
to merge their activities - hence ISO-10646/Unicode. The Japanese were
involved from day one; the only trouble was that (a) the Japanese standards
committees were in a state of internal chaos which didn't get fixed until
there was a clean sweep a few years later, and (b) there were powerful
groups who considered any unification of kanji with hanzi and hanja
to be a blow against racial purity and the national identity. The result
of this was that the Japanese who were active in the "Han Unification"
leading to Unicode 1.0 were mostly expatriates.

Be that as it may, EVERY kanji in JIS X 0208 and JIS X 0212 ended up in
Unicode 1.0. What is called the "source separation rule" meant that if 
a kanji/hanzi/hanja pair that would otherwise be unified occurs
multiply in one of the national standards, then it appears multiply in
Unicode. Thus all six version of the "ken" kanji, which blind Freddie
could tell are really the same, are dutifully replicated in Unicode,
because that's the way they are in JIS X 0208.
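The separated variants are easy to inspect from code. A small Python
sketch — the particular set of six "ken" (sword) variants below is the
commonly cited one, listed here as an assumption on my part:

```python
# Each of these "ken" variants got its own Unicode codepoint because
# JIS X 0208 already encoded them separately, and the source separation
# rule forbids unifying characters that a source standard distinguishes.
kens = "剣剱劍劒劔釼"
for ch in kens:
    print(f"U+{ord(ch):04X}")

# Six distinct codepoints, not one unified kanji:
assert len(set(kens)) == 6
```

Without the source separation rule, round-tripping a JIS X 0208 text
through Unicode would have collapsed these into one character — which
is exactly what the rule was designed to prevent.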

>> From: Charles Muller <acmuller@example.com>
>> 
>> One very unfortunate myth that is purveyed far more in Japan
>> than in the other East Asian countries is the bit that Unicode was "forced on"
>> them by an outside organization. 

Naruhodo (indeed). It's all part of the litany of misinformation.

>>  All the Unicode people were trying to do in
>> the early days, was to try to get people together to create a single
>> encoding standard for as many languages as possible. Because of problems
>> related to cultural chauvinism, they for a long time found it basically
>> impossible to get the Japanese, Taiwanese, Chinese, and Koreans to sit down
>> at the table together, and even when they finally did, there were all kinds
>> of ridiculous disputes over minor differences in glyph shapes that never even
>> needed to be dealt with at the level of encoding to begin with.

And in fact, glyphs were really not the issue. It's as though people
wanted one code for a straight "7" and another for a "7" with a bar
across it. They are just two ways of writing the same character.

>> The Japanese only ended up getting on board at the end when they figured out
>> that if they didn't, they'd be left out completely, and even since then,
>> there has been almost nothing but complaint from the Japanese
>> end. Therefore, when it came to actually using Unicode for major text
>> digitization projects, while the Taiwanese and Koreans went ahead and began
>> to use Unicode fairly early, most major Japanese digital data projects
>> refused to use Unicode (as still does, for example, the monstrously funded
>> Japan Memory Project), and thus Japan continues to fall further behind in
>> these areas.

The thing that has had the biggest impact on the spread of Unicode has
been, sad to say, Microsoft. They committed to it early, and the staff
managed to convince Bill G. that it should become the single code-set
for NT and its derivatives. They threw squillions at it, and it meant that
when the first Unicoded NT-derivative (Win2000) emerged, it was on a
sound base for multilingual apps.

>> So don't let anyone make you think that the East Asian countries were given
>> no voice or opportunity for input in the matter.

Japan came to the party, and my former Gaidai colleague Kouji Shibano was
active in some of the committees, as well as pushing through JIS X 0213,
which had the sub-plot of getting the last few missing kanji into
Unicode 4.

Jim

-- 
Jim Breen (j.breen(a)csse.monash.edu.au  http://www.csse.monash.edu.au/~jwb/)
Computer Science & Software Engineering,                Tel: +61 3 9905 3298
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
