Mailing List ArchiveSupport open source code!
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]tlug: Mule-begotten problems for Emacs and Gnus
- To: tlug@example.com
- Subject: tlug: Mule-begotten problems for Emacs and Gnus
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Wed, 7 Jan 1998 12:25:11 +0900 (JST)
- Cc: michael@example.com
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <867m8drhvy.fsf@example.com>
- References: <867m8drhvy.fsf@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
Your subject refers to Gnus, but the post doesn't mention it? AFAIK in XEmacs Mule causes no problems for Gnus (configuration complexities apart); I would suspect that if it's a problem in FSF Emacs it's due to the lack of character type as discussed below. >>>>> "Jon" == Jon Babcock <jon@example.com> writes: Jon> I find that a number of those who belong to the Faith of Jon> Emacs are crying out ... in pain. Anybody know why? Jon> It doesn't use Unicode and the documentation is woefully Jon> inadequate * ... I know about these. The Unicode issue, as such, is a red herring. Unicode is a Western imperialist plot in the minds of many Orientals. Europeans are not going to change from one byte ISO 8859 encodings to two byte Unicode encodings for most purposes; why should the Orientals be restricted to Unicode? And in fact there is a real loss to Orientals in using Unicode (the "Han unification" problem), unlike the Eastern Europeans who will have to use table-driven sorts, but otherwise lose nothing to Unicode. (NB. I'm told that Ukrainians have complaints about the Unicode implementation of Cyrillic, but this is a political issue and could be fixed in future editions; by contrast, there isn't enough room in a two-byte code to accomodate the Orientals.) Allowing Unicode as an external representation is trivial (assuming the CCL interpreter doesn't SIGSEGV as it is prone to do), so I leave it as an exercise for the reader :-). The real issue is not Unicode qua Unicode; while the Europeans in general would love to use Unicode to handle Oriental character sets, many Orientals are adamantly opposed. Books have been written about how Unicode will be the demise of the Japanese language, for example. The real issue is that the internal encoding of Mule is a string of variable length codes rather than an array of "wide characters." More about that below. And the woeful documentation is a "feature". Mule itself is supposed to be transparent to the user, and the user interface aspects (Quail) self-documenting. Now, I have an option on the Rainbow Bridge; anybody interested in buying shares? :-P What's _really_ wrong with this merger? As for the Arcana, I am an initiate of a heretical sect (XEmacs), so I can't say much about FuzzyFace Emacs 20.x from experience. However, some of the perspective from the XEmacs community may be of use. DO NOT quote me or use this material directly without verifying it; this is for background only. Further information about a lot of this can be gotten from the XEmacs documentation. Much of it comes from discussion on the XEmacs beta mailing list, which as far as I know is not archived. I would suggest finding an archive of the FSF Emacs beta list if you can. The Mule developer's list has none of this stuff on it, as far as I can remember. Try dejanews for comp.emacs.*, too. Going a circuitous route, the reason that XEmacs is a heresy is that the XEmacs developers (originally Lucid, Inc; you may have heard of LEmacs---many of these guys have migrated to Netscape; until recently supported by Sun, especially the port of Mule to XEmacs) disagreed with RMS about the implementation of editing primitives in the underlying Emacs LISP language. In particular, RMS being an "old" LISPer believes that functions should be polymorphic because the data is what it is (this may be a misrepresentation; it's the way I make sense of his position). The L/XEmacs developers, on the other hand, felt that more strongly typed primitives were necessary. In particular, Emacs LISP did not, and does not, make a distinction between a character and an integer. XEmacs LISP does. The separation of the character type and the integer type in XEmacs caused a great amount of pain to the developers; so much so that character/integer confusion is referred to as "Ebola": swift and mysterious fatal errors. I believe this was originally done to make it easier to create an 8-bit clean XEmacs. (I wasn't involved then.) The purpose was supporting Latin-1 editing cleanly. It has recently been shown, serendipitously, that any Latin-X character set is supported on localized terminals. (Mule does not support Latin-X, X != 1, very well on a localized terminal; it has assumptions built in, in XEmacs anyway, that make it think that TTYs are either X-Windows, or Japanese, or Latin-1.) This shows the strength of the character abstraction, I think. This made porting Mule to XEmacs easier as well. The point is that Mule does not admit a convenient representation of characters as integers; in the buffer, ASCII characters are one byte, all other one-byte characters (including Latin-1 GR) are two bytes, and most Oriental languages are three bytes. Some may be four or more; I forget the details. All multi-byte characters are (tiny) structs; the first, so-called "leading byte", defines the character set, the remainder gives the encoded value. XEmacs went through the pain of distinguishing between buffer bytes, individual characters, and integers, and now the type conversions are built in. Converting the conversion functions to handle the Mule representation was tedious, but the painful part had already been done. The GNU project is only now going through the painful part, and because of RMS's objections to stronger typing, paradoxically instead of being able to use polymorphous functions, the programmer is required to use different functions for dealing with the same entities in different contexts. This is not so bad at the LISP level in XEmacs; the buffer processing functions were converted to count Bufbytes instead of bytes, at a fairly great efficiency cost in large files. However, in FSF Emacs it is very painful, since there is no Bufbyte abstraction. Programmers must be very careful about how they write C extensions (this is still somewhat true in XEmacs as well; the C code is sprinkled with "this file has been Mule-ized ... but see the comment on func()"), and in fact I have heard that many functions are duplicated depending on whether they treat the buffer as an array of bytes or as a string of Mule characters. The former is much more efficient, but the burden is on the programmer to decide whether the semantics are correct. This has in fact infected the LISP level in FSF Emacs I have heard. Apparently Erik Naggum stated on one of the Emacs newsgroups that almost every existing .el file would need to be converted to use new functions to handle Mule. The LISP code would still work on ASCII (but not Latin-1), but any other charset would have undefined results. The painful consequences implied for Western European Emacs users are obvious. In XEmacs, OTOH, the LISP interface is backward compatible. I think that covers the major cries of pain. Mule tries too hard to be transparent (see the discussion of heuristics below). I think that this is a bad design decision; simple heuristics just don't do a very good job in the complex world post-Tower of Babel, where there not only isn't a single language, but no satisfactory Uni-code either, nor do people talk to each other enough to really want one. It worked quite well for Japanese, but the only thing the Japanese are missing is a Uni-code. ("Good grief, Charlie Brown!") Mule is a dancing bear, and it dances a lot better than I can. People who actually use Mule have to admit, I think, that there just isn't anything like it, not even proprietary. Many proprietary systems provide good bilingual support, or even Oriental language plus Latin-1. But truly multi-lingual? Mule. Gotta have it. Gotta love it. But as long as I'm here.... Mule itself is not well designed, IMHO. For example, there is no way to get a "raw" binary buffer image in LISP (in XEmacs/Mule; I don't know if this has been changed for FSF Emacs 20.x)---binary files are represented as Latin-1 internally. This means that it is a _major_ pain in the butt to implement things like MD5 signatures of non-ASCII files efficiently. Naive implementations give incorrect results. Mule has a very sophisticated set of heuristics for guessing the character set. However, in early 1997 (when I was still using FSF Emacs 19.34b/Mule), Mule was still occasionally screwing up on the EUC/SJIS distinction (I haven't had to deal with SJIS in XEmacs since last spring except for once using W3; it seems like everybody on campus now has a mailer that sends ISO-2022-JP), and I believe this is actually rather common for Chinese and Korean EUC, as they overlap the Japanese EUC space (by definition) and Mule is (understandably) Japan-centric. Although it is simple to specify the default language environment so that Chinese will not be mistaken for Japanese when Chinese is the default, it is not so simple to override the heuristics when they err in exceptional cases because the various controls interact in complex ways. I assume that FSF Emacs 20.x allows input methods like Canna and Wnn, and XIM. However, as far as I know they are not supported in the sense that they are inaccessible from the so-called Library of Emacs Input Methods; only Quail is. This strikes me as obnoxious; it took me about an hour to write the prototype LEIM interfaces to XEmacs/Mule for SKK (the origin of Quail), Canna, and Wnn. It really wouldn't have been all that tough for the developers to include this support. (XIM is much tougher, at least in XEmacs.) HTH --------------------------------------------------------------- Next TLUG Nomikai: 14 January 1998 19:15 Tokyo station Yaesu Chuo ticket gate. Or go directly to Tengu TokyoEkiMae 19:30 Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691 Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station Yaesu Chuo ticket gate. --------------------------------------------------------------- a word from the sponsor: TWICS - Japan's First Public-Access Internet System www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096
- Follow-Ups:
- Re: tlug: Mule-begotten problems for Emacs and Gnus
- From: Jon Babcock <jon@example.com>
- Re: tlug: Mule-begotten problems for Emacs and Gnus
- From: Karl-Max Wagner <karlmax@example.com>
- References:
- tlug: Mule-begotten problems for Emacs and Gnus
- From: Jon Babcock <jon@example.com>
Home | Main Index | Thread Index
- Prev by Date: tlug: Applixware and printers
- Next by Date: Re: tlug: Applixware and printers and OCR
- Prev by thread: tlug: Mule-begotten problems for Emacs and Gnus
- Next by thread: Re: tlug: Mule-begotten problems for Emacs and Gnus
- Index(es):
Home Page Mailing List Linux and Japan TLUG Members Links