Mailing List Archive
Re: [tlug] Re: Unicode (Was: apache2...)
- Date: Sat, 12 Jul 2003 19:30:17 +1000 (EST)
- From: Jim Breen <jwb@example.com>
- Subject: Re: [tlug] Re: Unicode (Was: apache2...)
Shimpei Yamashita <shimpei@example.com> wrote:
>> A few questions, from a complete amateur....
>>
>> On Sat, Jul 12, 2003 at 12:45:28AM +1000,
>> Jim Breen wrote:
>> > Things don't "look" like anything in Unicode. The look comes from the
>> > font. You choose the font. You buy a Chinese-style Unicode font where
>> > the hanzi look Chinese, or you buy a Japanese-style font. The codes
>> > stay the same.
>>
>> Does that mean that a multilingual text document, rendered with a single
>> Unicode font, may only "look" correct in one Asian language at a time?

Depending on the font, yes.

>> If so,
>> does it not mean that Unicode only *pretends* to be context-independent, and
>> actually depends on the user (which could be the application or the human
>> being) to provide that context because it fails to provide a
>> context-presentation mechanism internally?

Not at all. There are language codes in Unicode, and if the document
has been prepared with them, a smart application can do things like
selecting fonts according to them, or invoking spell-checkers according
to the language, or all the other language-dependent things.

It's the same with A, a, B, b, etc. Different European cultures actually
have their preferred fonts and think others look foreign, but no-one
has accused ISO-8859-* of pretense or cultural hegemony on this score.

>> > Be that as it may, EVERY kanji in JIS X 0208 and JIS X 0212 ended up in
>> > Unicode 1.0. What is called the "source separation rule" meant that if
>> > a kanji/hanzi/hanja pair that would otherwise be unified occurs
>> > multiply in one of the national standards, then it appears multiply in
>> > Unicode. Thus all six versions of the "ken" kanji, which blind Freddie
>> > could tell are really the same, are dutifully replicated in Unicode,
>> > because that's the way they are in JIS X 0208.
>>
>> That doesn't seem to solve the above problem at all, which involves
>> *different* countries using different glyphs for the "same" character.

No, I mentioned that because people still say Unicode is "missing some
kanji", and "was prepared ignoring national wishes", which is where
this thread started.

>> Jim, what I don't quite understand is this: exactly what problem is Unicode
>> meant to solve anyway?

The key problem was the inability of the pre-Unicode codes to mix
languages in a usable way. Have you ever tried to mix Japanese with
French or German? It was only possible before Unicode by using ISO-2022
escaping, which is a truly horrible way to handle text.

In the case of the "CJK" languages it was worse. At least with ordinary
alphabetics an "a" or a "b" tended to be the same regardless of
language, but with the CJK languages, something like 手紙 was coded
differently for every language. If you were mixing, say, Japanese and
Korean in a document and doing a string search you could find yourself
in a tizz. Of course font-rendering in a mixed-code system is a
nightmare.

Remember that one of the groupings driving Unicode was the collection
of computer companies: Sun, IBM, Apple, Microsoft, etc. To them it was
a real problem that needed fixing.

>> Given that, what rationale went into the decision to
>> combine certain glyphs between countries that caused so much grief among
>> your opponents? It's easy to dismiss Unicode opponents as nationalist
>> counter-revolutionaries, but it isn't clear to me (yet) that the Unicode camp
>> has addressed their grievance adequately.

You can only go so far addressing irrationality. With people saying
that a 十 (kanji) can't be unified with a 十 (hanzi) because one is
essentially Japanese and the other irrevocably Chinese, no addressing
is possible, short of abandoning the whole process. Imagine if the
French and the Italians demanded their own alphabets as a matter of
national pride and identity.
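
As a rough illustration of the 手紙 point and the ISO-2022 escaping mentioned above, here is a minimal Python sketch. It assumes Python's bundled `euc_jp`, `euc_kr` and `big5` codecs are fair stand-ins for the pre-Unicode national encodings; the sample strings and the helper name `legacy_bytes` are made up for the example.

```python
# Sketch only: Python's bundled codecs stand in for the pre-Unicode
# national encodings; the choice of these three is an assumption.
WORD = "手紙"  # the same two characters in every case

LEGACY_CODECS = ["euc_jp", "euc_kr", "big5"]


def legacy_bytes(text):
    """Return {codec name: encoded bytes} for each legacy encoding."""
    return {name: text.encode(name) for name in LEGACY_CODECS}


encoded = legacy_bytes(WORD)
for name, raw in encoded.items():
    # Each national encoding gives the same word different bytes.
    print(f"{name:8} {raw.hex()}")

# A byte-level search using the Japanese bytes misses Korean-encoded text.
korean_doc = "pyeonji: 手紙".encode("euc_kr")
found_in_bytes = encoded["euc_jp"] in korean_doc
print("byte search finds it:", found_in_bytes)

# Once decoded to Unicode, both sides hold the same code points.
found_in_unicode = WORD in korean_doc.decode("euc_kr")
print("Unicode search finds it:", found_in_unicode)

# ISO-2022-JP switches character sets with escape sequences in the text.
iso_bytes = "abc 手紙 def".encode("iso2022_jp")
print(iso_bytes)
```

The byte search fails only because the two documents disagree about how to code the word; after decoding, the comparison is on code points and the source encoding no longer matters.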

One argument mounted by the anti-Unicode people was that it was
"unfair" to unify hanzi/kanji/hanja when the "Latin" and "Greek"
portions of Unicode retained separate codes for identical characters
(e.g. A). In fact the "Source Separation Rule" I mentioned before,
which was brought in at the insistence of the CJK countries, requires
this to be the case. JIS X 0208 has two identical letter A codings:
A (2341) and Α (2621), not to mention А (2721), which is almost the
same.

Cheers

Jim

--
Jim Breen (j.breen(a)csse.monash.edu.au http://www.csse.monash.edu.au/~jwb/)
Computer Science & Software Engineering, Tel: +61 3 9905 3298
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学
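
The three JIS X 0208 codings of A quoted in the message can be checked with a short Python sketch. It assumes Python's `euc_jp` codec, and it relies on the row-3 Latin A of JIS X 0208 corresponding to the fullwidth letter U+FF21 in Unicode; the helper name `jis_code` is made up for the example.

```python
import unicodedata

# Assumption: the row-3 Latin "A" of JIS X 0208 is the fullwidth letter
# U+FF21 in Unicode; the other two are Greek Alpha and Cyrillic A.
LETTERS = ["\uFF21", "\u0391", "\u0410"]


def jis_code(ch):
    """Return the JIS X 0208 code of ch as four hex digits."""
    raw = ch.encode("euc_jp")
    # EUC-JP stores each JIS X 0208 byte with the high bit set; clear it.
    return bytes(b & 0x7F for b in raw).hex()


for ch in LETTERS:
    print(jis_code(ch), f"U+{ord(ch):04X}", unicodedata.name(ch))
```

All three survive as separate Unicode code points, which is the source separation rule at work on the non-CJK side.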