Mailing List Archive


Re: [tlug] Han Unification (Mixed CJK)



>>>>> "Jim" == Jim  <jep200404@example.com> writes:

    Jim> David Riggs wrote:

    >> Its a constant [hassle] for folks like me who work in more than
    >> one "kanji" system. Unicode alas does not solve this one, ...

    Jim> Yup. Alas, Unicode _creates_ this problem.

No.  Unicode trades the problems of systems like ISO-2022 for these
ones, while providing standard and comparatively sane mechanisms for
mitigating them.

Historically and in practice, Han unification should be called "Han
identification".  Of course it will be controversial, since some
people will disagree for personal reasons with the principles of
identification, just as otherwise intelligent people disagree with my
preferred spelling for personal reasons. ;-) However, the basic
principle is no different from asserting that A is A in French,
Spanish and English.  For most characters, such as ichi, ni, san (one,
two, three), the principle is absolutely uncontroversial.  The
practical import is, for example, that you can (mostly) use a Japanese
input method to input Chinese and vice versa.
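
To make the identification concrete, here is a minimal sketch in
Python.  It assumes the legacy codecs that ship with CPython; the
helper names are made up for illustration.  It encodes ichi in several
national character sets and checks that every byte sequence decodes
back to the one code point U+4E00.

```python
import unicodedata

ICHI = "\u4e00"  # the ideograph for "one" (ichi)

# Legacy national encodings; these codec names ship with CPython.
LEGACY_CODECS = ["shift_jis", "euc_jp", "gb2312", "big5", "euc_kr"]


def legacy_bytes(ch):
    """Map each legacy codec name to the bytes it uses for ch."""
    return {codec: ch.encode(codec) for codec in LEGACY_CODECS}


def round_trip_code_points(ch):
    """Decode every legacy byte sequence and collect the code points."""
    return {ord(raw.decode(codec)) for codec, raw in legacy_bytes(ch).items()}


if __name__ == "__main__":
    print(unicodedata.name(ICHI))
    for codec, raw in legacy_bytes(ICHI).items():
        print(f"{codec:10} {raw.hex()}")
    # The byte sequences differ, but only one code point comes back.
    print({hex(cp) for cp in round_trip_code_points(ICHI)})
```

The bytes differ from one national standard to the next, but the
character they identify is the same, which is why one input method or
one search string can serve for all of them.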

Somebody like David will eventually want his corpus, not in plain
text, but with embedded semantic markup---links to commentary, variant
texts, furigana, etc.  As long as you're doing this, it's trivial to
specify fonts in the same markup.

In plain text you have the alternative of "Plane 14 language tags".
These are deprecated, but have the significant advantage over ISO-2022
"extension syntax" that the same character in different languages is
still the same character, e.g., for search and input purposes.
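
As a rough sketch of how such tags behave (in Python; the helper names
language_tag and strip_language_tags are made up for illustration): a
tag is U+E0001 followed by the language code shifted into the range
U+E0020..U+E007E, so dropping the tag characters recovers identical
plain text whatever the language.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def language_tag(code):
    """Build a Plane 14 language tag for an ASCII code such as "ja"."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def strip_language_tags(text):
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


if __name__ == "__main__":
    ja = language_tag("ja") + "\u4e00\u4e8c\u4e09"
    zh = language_tag("zh") + "\u4e00\u4e8c\u4e09"
    # The tagged strings differ, but the text underneath is identical.
    print(ja == zh)
    print(strip_language_tags(ja) == strip_language_tags(zh))
```

A search or input layer that ignores the tag block sees one and the
same string; a renderer that honors it can still pick a font by
language.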

And there _are_ national extensions (Planes 2 and 3) for Chinese and
maybe Japanese being developed, which will then disambiguate at least
some characters again.

So in sum the problem (as usual) is not Unicode; it's that Unicode is
not fully implemented, whereas hundreds of fragile character set- and
language-specific workarounds are.  Thus by now for most multilingual
use it's not _too_ painful to use ISO-2022, but extensions are
mendokusai (a hassle) at best, and fairly well-meaning organizations
like XFree86 will often write their own standards and then go off and
implement extensions that violate them!

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

