Re: tlug: A couple of questions about Unicode
- To: tlug@example.com
- Subject: Re: tlug: A couple of questions about Unicode
- From: Jon Babcock <jon@example.com>
- Date: 10 Jan 1998 03:30:23 -0700
- Cc: michael@example.com
- In-Reply-To: Taro Yamamoto's message of Sat, 10 Jan 1998 16:03:40 +0900
- References: <Pine.LNX.3.96LJ1.1b7.980110093817.18865A-100000@example.com> <34B71D4C.1684ACAD@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "TY" == Taro Yamamoto <tyamamot@example.com> writes:

TY> Craig Oda wrote:

>> that there is a Japanese book out about how bad unicode is for
>> the Japanese. Evidently, it was a best seller in Japan.

First, does anyone have the title or any bibliographic info on this book?

--- --- ---

I think Yamamoto-san has correctly identified two of the most important issues regarding kanji encoding in Unicode:

1)

TY> It means that each character defined in such standards is not
TY> a "representation (instance)" but a "prototype" of the
TY> character as functional and semantic information unit.

<snip>

TY> All such talks about "representation" model of characters,
TY> glyphs and fonts is beyond the scope of Unicode and character
TY> set standards (this is a very important point when one comments
TY> on a character set standard such as Unicode).

To understand Unicode, it is essential to make this distinction. Each kanji included in the Unified Han Repertoire is, or should be, a prototype, an "abstract class", if you will, and not a concrete instantiation of that class. The task of instantiation or representation is left to the font makers. An instance of the class, a representation of the prototype, is usually referred to as a "glyph".

This is becoming better understood, but in the early development of Unicode, the failure to clearly understand the value of this distinction, and to apply it, was the cause of much confusion, it seems to me.

But, as Yamamoto says,

TY> Unicode, it is based on source character sets (such as JIS X
TY> 0208 and 0212)

Unicode was based on *existing* character sets. To the examples just mentioned, GB and Big5 can be added, and there were others.

In short, Unicode attempted the impossible.
On the one hand there was the laudable goal of compiling a list of the minimum number of kanji prototypes, no two of which would be the same, that could be mapped to all or nearly all of the kanji glyphs (representations of the prototype that can be seen with the eyes and not merely conceived of in thought) actually in use in kanji-using scripts (CJK). On the other hand, there was the perceived necessity (politics played a role here) of accommodating existing character sets already in use by computers. Unicode made a gallant attempt to reconcile these opposing forces, but the result is a compromise, albeit a rather practical one, IMO.

So, in no small number of cases, we have *more than one Unicode character for the same prototype*. What in reality is merely a glyph variant, an alternate form of instantiation, is incorrectly elevated to the status of a prototype, an abstract class.

Conversely, due to inherent limitations of the existing source character sets, Unicode provides *no prototype at all for most of the glyphs represented in the traditional repertories*, such as the Kangxi Dictionary, or in more up-to-date versions of those, such as the large Morohashi kanwa dictionary.

It could be argued that Unicode should have concentrated on developing a real unified Han character set of its own and forgotten about accommodating the existing national sets. But the counter-argument might be that, had it done so, Unicode would never have been more than an academic exercise, unable to gain a toehold in the "real world".

Unicode appears to be trying to strike a balance between two approaches. One is to include in its list just the minimum elements (graphemes) of a script that are needed to write it, and thereby hand the task of composing and rendering those graphemes on to the OS or the application. The other is to provide a certain amount of that composition service ready-made within itself.
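
The two points above can be made concrete with a minimal Python sketch. It assumes Python 3 and its standard unicodedata module; the helper name decompose_hangul and the sample characters are illustrative choices, not anything taken from the Unicode documents or from this thread. The first part takes a ready-made composite (a precomposed Hangul syllable) apart into the minimal elements it is built from, using the arithmetic published in the Unicode Standard. The second part shows one case where two code points are folded into one by canonical normalization, that is, where the standard itself treats them as the same abstract character.

```python
import unicodedata

# Constants of the Hangul syllable arithmetic given in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # syllables per leading consonant
S_COUNT = 19 * N_COUNT        # total number of precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Hypothetical helper for illustration only.
    """
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    trail_index = index % T_COUNT
    # A trailing index of 0 means the syllable has no final consonant.
    trail = chr(T_BASE + trail_index) if trail_index else ""
    return lead + vowel + trail


# 1) Ready-made composition: one code point stands for a whole syllable,
#    yet it can be rebuilt from two or three minimal elements (jamo).
syllable = "\ud55c"  # HANGUL SYLLABLE HAN
jamo = decompose_hangul(syllable)
print([f"U+{ord(c):04X}" for c in jamo])
# The library's canonical decomposition agrees with the arithmetic.
print(unicodedata.normalize("NFD", syllable) == jamo)

# 2) Two code points, one abstract character: U+F900 is a CJK
#    compatibility ideograph whose canonical form is U+8C48.
duplicate = "\uf900"
print(unicodedata.name(duplicate))
print(f"U+{ord(unicodedata.normalize('NFC', duplicate)):04X}")
```

Normalization only covers the duplicates that the standard formally declares equivalent. Variant pairs that were encoded as fully separate unified ideographs are not folded together this way, so which of those count as "the same prototype" remains a matter of judgment.
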
(The big addition, in Unicode version 2, of the Hangul composite characters stands as a good example of such ready-made composition.)

Although this sort of muddy mixture, of compromise, does not appeal to the purists (me included), it may be that it is the only approach that had any chance of acceptance under current conditions. I don't know.

Jon Babcock
jon@example.com

---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15 Tokyo station Yaesu Chuo ticket gate.
Or go directly to Tengu TokyoEkiMae 19:30 Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station Yaesu Chuo ticket gate.
---------------------------------------------------------------
a word from the sponsor:
TWICS - Japan's First Public-Access Internet System
www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096