Re: tlug: Mule-begotten problems for Emacs and Gnus




At present I'm somewhat overwhelmed by the amount of information and
ideas that my two public requests (one here on tlug and one on ding)
and one private request (to Erik Naggum) have generated. I'm going
over this material now. 

But one small point on the Unicode-Mule issue: I vaguely remember
that, perhaps as much as a year before the early development versions
of the merged Mule-Emacs started to appear in the .notready directory
of etlport.etl.go.jp:/pub/mule, someone, perhaps Handa-san
himself(???), said that the reason Mule was not going to be merged
with Emacs proper was that RMS insisted that Mule first be rewritten
using Unicode. Does anyone else remember this?

Another small point: As Stephen has indicated, it turns out that the
64K-worth of code points in Unicode is not enough. They wrestle with
this problem on the Unicode list occasionally. The biggest consumer is
the Unihan repertoire, which uses up some 20,000 code points and still
omits most kanji (strictly in terms of numbers, not current
usage). There are about 50,000 kanji in Morohashi's big dictionary and
some 40,000 in the old Kangxi dictionary (1712 AD), for example. There
would have been a solution for this, and a good one, but it would have
required a lot of clear-headed work (without political interference):
encode the *hemigrams* of kanji in the Unicode standard rather than
give a code point to each whole kanji, a choice that condemns Unicode
to NEVER being able to encode all kanji. I say "never" because even if
every single known kanji were encoded in a future, extended 32-bit
"Unicode" (estimates run at more than 75,000 kanji!), there would
still remain the inability to represent new kanji. Even though new
kanji would be but a tiny, tiny fraction of the total, they could not
be directly represented in Unicode, whereas new words, nonsense words,
whatever, in languages written in the Latin script, for example, face
no such limitation.
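
As a rough sanity check on that arithmetic, here is a minimal sketch
(in Python), using only the estimates quoted above; the counts are
illustrative, not authoritative:

    # Capacity check for a 16-bit character set, using the figures
    # cited in this post (estimates, not normative counts).

    BMP_CODE_POINTS = 2 ** 16    # 65,536 cells in 16-bit Unicode
    ALL_KNOWN_KANJI = 75000      # high-end estimate cited above

    # Even devoting the entire 16-bit space to kanji alone falls short:
    print(ALL_KNOWN_KANJI - BMP_CODE_POINTS)    # -> 9464 kanji homeless

    # In practice the space must also hold every other script, so the
    # real shortfall is far larger.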

Representing the hemigrams, instead of whole kanji, would require a
relatively modest number of code points (< 3,000), and with this
approach all 70,000-odd kanji could be represented, as well as any new
kanji that are invented. (See the contest for Japanese school kids to
invent new kanji that shows up in Japanese newspapers about once a
year, as an oddball example of new kanji.) When I say this requires
hard work, I know from experience, since I have been working on a
system for representing all kanji in terms of their hemigrams for the
past couple of years. The methodology was first demonstrated to me at
UC Berkeley by Professor Peter A. Boodberg nearly thirty years ago.
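
To make the compositional idea concrete, here is a minimal sketch
(again in Python). The component names, code point values, and layout
operators below are hypothetical illustrations of the general
approach, not my actual system:

    # Hypothetical registries: a full scheme would assign fixed code
    # points to roughly 3,000 components ("hemigrams") and a handful
    # of layout operators.
    components = {"water": 0x001, "tree": 0x002, "sun": 0x003}  # tiny sample
    operators  = {"left-right": 0xF00, "top-bottom": 0xF01}

    def encode(operator, *parts):
        """Spell one character as [operator, component, component, ...]."""
        return [operators[operator]] + [components[p] for p in parts]

    # E.g. a character built from a "water" part beside a "tree" part:
    print(encode("left-right", "water", "tree"))   # -> [3840, 1, 2]

    # The combinatorics is the point: two-part compositions alone
    # dwarf any whole-character inventory.
    n_components, n_operators = 3000, 10
    print(n_operators * n_components ** 2)         # -> 90,000,000 pairs

A new kanji then costs nothing but a new sequence over the existing
inventory, which is exactly the property that whole-character encoding
lacks.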

Jon Babcock
jon@example.com


