Mailing List Archive
Re: [tlug] Han Unification (Mixed CJK)
- Date: Tue, 31 Jan 2006 00:37:51 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Han Unification (Mixed CJK)
- References: <43DDEF5E.6030705@example.com><20060130094928.4ad530f7.jep200404@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b24 (dandelion, linux)
>>>>> "Jim" == Jim <jep200404@example.com> writes:

    Jim> David Riggs wrote:

    >> Its a constant [hassle] for folks like me who work in more than
    >> one "kanji" system.  Unicode alas does not solve this one, ...

    Jim> Yup.  Alas, Unicode _creates_ this problem.

No.  Unicode swaps these problems (while providing standard and
comparatively sane mechanisms for mitigating them) for those of
systems like ISO-2022.

Historically and in practice, Han unification should be called "Han
identification".  Of course it will be controversial, since some
people will disagree for personal reasons with the principles of
identification, just as otherwise intelligent people disagree with my
preferred spelling for personal reasons.  ;-)  However, the basic
principle is no different from asserting that A is A in French,
Spanish and English.  For most characters, such as ichi, ni, san, the
principle is absolutely uncontroversial.  The practical import is, for
example, that you can use (mostly) a Japanese input method to input
Chinese and vice versa.

Somebody like David will eventually want his corpus, not in plain
text, but with embedded semantic markup---links to commentary, variant
texts, furigana, etc.  As long as you're doing this, it's trivial to
add fonts.

In plain text you have the alternative of "Plane 14 language tags".
These are deprecated, but have the significant advantage over ISO-2022
"extension syntax" that the same character in different languages is
still the same character, eg for search and input purposes.

And there _are_ national extensions (Planes 2 and 3) for Chinese and
maybe Japanese being developed, which will then disambiguate at least
some characters again.

So in sum the problem (as usual) is not Unicode, it's that Unicode is
not fully implemented, whereas hundreds of fragile character set- and
language-specific workarounds are.
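
To make the Plane 14 point concrete, here is a rough sketch in Python
(my choice of language; nothing in the thread prescribes it).  It
assumes the usual layout of the tag block: U+E0001 LANGUAGE TAG
followed by tag characters that mirror ASCII at an offset of 0xE0000.
The helper names tag_text and strip_tags are made up for illustration,
and the sketch ignores how a real renderer would pick a font from the
tag.

```python
# Sketch only: Plane 14 language tags prefix the text, while the Han
# characters themselves keep the same code points in every language.

LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def tag_text(lang, text):
    """Prefix text with a Plane 14 language tag (hypothetical helper)."""
    if not lang.isascii():
        raise ValueError("language code must be ASCII")
    tag = LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    return tag + text


def strip_tags(text):
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


ichi_ni_san = "\u4e00\u4e8c\u4e09"   # ichi, ni, san
ja = tag_text("ja", ichi_ni_san)
zh = tag_text("zh", ichi_ni_san)

# The tagged strings differ, but the characters are still the same
# characters, so search and comparison work across languages.
same_after_strip = strip_tags(ja) == strip_tags(zh)
found_in_tagged = "\u4e8c" in zh
```

With an ISO-2022 style designation the two texts would instead be
encoded differently, so the plain substring search at the end would
need to know about every designation in play.
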
Thus by now for most multilingual use it's not _too_ painful to use
ISO 2022, but extensions are mendokusai at best, and often fairly
well-meaning organizations like XFree86 will write their own standards
and then go off and implement extensions that violate them!

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba, Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
- References:
- Re: [tlug] CJK Mixed in a Letter. Missing bdf font FOUND:hanglm24.bdf
- From: David Riggs
- [tlug] Han Unification (Mixed CJK)
- From: Jim