Re: tlug: unicode
- To: tlug@example.com
- Subject: Re: tlug: unicode
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Tue, 27 May 1997 17:52:23 +0900
- In-reply-to: Your message of "Tue, 27 May 1997 17:25:26 EST." <199705270725.RAA24634@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug
>>>>> "Jim" == Jim Breen <jwb@example.com> writes:

    Jim> On May 27, 3:22pm, "Stephen J. Turnbull" wrote: } Subject:
    Jim> Re: tlug: unicode

    >>> For example, suppose I'm grepping for all the Japanese words
    >>> in a Chinese-language nihongo textbook.  Given a 31-bit code
    >>> space, a UCS-4 grep can too.

    Jim> I hope it never has to - It would be a disaster of the first

Too late; this is what Mule does already.  I think it's unlikely to
change, since it's an efficient way to handle multilingual input and
editing.

    Jim> order if Chinese and Japanese ended up as distinct sets.

They never will, not in the Basic Multilingual Plane.  We can have our
cake and eat it, too.  At the cost of very fat characters for internal
processing.

[snip]

    Jim> I expect that eventually national font styles will be handled
    Jim> [by wrapping them in tags like for italics in TeX].

In fact this seems to be exactly the direction TeX (well, Omega and
CJK) is going.

    Jim> This is really a presentation markup. It doesn't thrill me
    Jim> [for the grep example], but I prefer it to the alternative.

Agreed.  If it's really a matter of style, it's much better to have a
markup tag.  But I gave a practical, if relatively contrived and
trivial, example of when the language tag has real semantic meaning.
Also (despite the unification philosophy) the identical character can
have different meaning in the different languages.  That would mean
that a content-indexing program would want to carry language along
with characters.  You can argue that it's not important, that you can
handle it otherwise.  I'd like to give the programmers the flexibility
to implement it with wider characters in a standard way.

From the user perspective, what will happen, I think, is what you
would want:  Mule will convert to Unicode before writing a file.  The
4-byte representation will rarely be seen outside of RAM owned by Mule
and similar tools.  I just think it's good to standardize an internal
code for things like Mule; we have a good framework for doing it.

    >>> [At Shift-JIS] I think we've just reached
    >>> Jim Breen's limit of tolerance.  No JIS X 0212.  :-)

    Jim> Wait for it! There's an extension planned for JIS X 0208

Got me!

Steve

-- 
Stephen J. Turnbull
Institute of Policy and Planning Sciences
Yaseppochi-Gumi
University of Tsukuba
http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091; Fax: 55-3849
turnbull@example.com
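
The idea in the message of carrying a language tag inside a wide internal character, and dropping it again when the text is written out as Unicode, can be sketched roughly as follows. This is only an illustration under stated assumptions: the bit layout (code point in the low 21 bits, language tag above it), the tag values, and the helper names `tag_text`, `find_runs`, and `to_unicode` are all invented here and are not Mule's actual internal code.

```python
# Hypothetical layout, NOT Mule's real encoding:
# bits 0-20 hold the Unicode code point, bits 21-30 hold a language tag,
# so every cell fits in a 31-bit code space.
CODEPOINT_BITS = 21
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1

# Made-up tag values for this sketch.
LANG_NONE, LANG_JA, LANG_ZH = 0, 1, 2


def tag_text(text, lang):
    """Widen each character into a cell that also carries a language tag."""
    return [(lang << CODEPOINT_BITS) | ord(ch) for ch in text]


def lang_of(cell):
    """Extract the language tag from a wide cell."""
    return cell >> CODEPOINT_BITS


def find_runs(cells, lang):
    """Return the maximal runs of characters tagged `lang`, as plain strings."""
    runs, current = [], []
    for cell in cells:
        if lang_of(cell) == lang:
            current.append(chr(cell & CODEPOINT_MASK))
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def to_unicode(cells):
    """Strip the tags, as an editor might do before writing a file."""
    return "".join(chr(cell & CODEPOINT_MASK) for cell in cells)


# The same unified characters, once tagged Chinese and once tagged Japanese.
buffer = tag_text("学生", LANG_ZH) + tag_text("学生", LANG_JA)
japanese_words = find_runs(buffer, LANG_JA)
plain = to_unicode(buffer)
```

In this sketch a search by language only succeeds on the tagged in-memory buffer; once `to_unicode` has run, the two occurrences of the unified characters are indistinguishable, which is the trade-off the thread is debating.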