Mailing List Archive

Re: tlug: XEmacs and Kanji detection



--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------
>>>>> "Steve" == Steve Dunham <dunham@example.com> writes:

    Steve> I believe the locale is passed to the input method. (I
    Steve> suspect this is why the Solaris input method didn't work
    Steve> for me.)

Dunno about Solaris, but under Linux, XIM will definitely crash your
XEmacs+Mule if XMODIFIERS sets an input method and that input method
fails to open for any reason.  However, locale is used in a bunch of
places in XEmacs.  For starters, `fgrep -l locale
/var/project/xemacs-20.2/src/*.[ch]' lists 25 files.

    >> The locale processing also seems to be inconsistently
    >> implemented, since "LANG=ja_JP.EUC" results in SJIS being
    >> correctly displayed (and "LANG=ja_JP.SJIS" produces correct
    >> results for EUC).

    Steve> Umm, what do you want it to do? Disable EUC support?

Well, there are times when I'd like to be able to do something like
that, since there are ambiguous files.  I have yet to figure out how
to get a buffer to be reread in various different encodings
conveniently.  One can set `buffer-file-coding-system' and relatives,
but this is pretty clumsy.
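
To make the ambiguity concrete, here is a minimal sketch in Python (not
XEmacs code) of what "rereading in various encodings" amounts to: decode
the same bytes under each candidate and compare the results.  The helper
name `decode_candidates' and the sample bytes are made up for
illustration, and the codec names are Python's, not XEmacs coding-system
names.

```python
# Candidate codecs, using Python's codec names (not XEmacs coding systems).
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def decode_candidates(raw, codecs=CANDIDATES):
    """Return {codec: decoded text} for every codec that accepts the bytes."""
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            pass
    return results


# Two bytes that are legal both in EUC-JP (one kanji) and in
# Shift JIS (two half-width katakana): a genuinely ambiguous "file".
ambiguous = b"\xc3\xde"
readings = decode_candidates(ambiguous)
for codec, text in readings.items():
    print(codec, repr(text))
```

Both the EUC-JP and the Shift JIS readings succeed and differ, so no
detector can settle the question from the bytes alone; something (the
user, or a locale default) has to pick.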

    Steve> Currently it only uses the LANG variable to determine the
    Steve> "default" language for startup.

Thanks for confirming that.

    Steve> XEmacs, by default, seems to have iso-2022 filters in the
    Steve> loading process.  The SJIS detection seems to be added when
    Steve> you load the japanese language stuff.  This happens when
    Steve> LANG=ja or you pick the Japanese menu item.

To be picky, iso-2022 seems to be enabled by Mule; it doesn't work if
you compile XEmacs without Mule.  (I had some spare cycles while I was
lecturing the other day....)  I would guess that SJIS can _only_ be
recognized if you assume Japanese; I'm not sure why EUC doesn't get
recognized (my guess is that most EUC files do not conform to the
ISO-2022 standard of starting out in ASCII (ISO-Latin-1?) and shifting
into Japanese; if they did, they'd probably be recognized).
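
A small Python sketch of that guess, assuming the detector keys on
ISO-2022 escape sequences (my reading of the behavior, not something
confirmed from the XEmacs source).  The function name
`looks_like_iso2022_jp' is hypothetical.  It shows that ISO-2022-JP
text starts in ASCII and announces the switch to Japanese with an
escape sequence, while EUC-JP text is plain 8-bit bytes with no
announcement at all.

```python
ESC = b"\x1b"


def looks_like_iso2022_jp(raw):
    """Heuristic: 7-bit data containing an ISO-2022 escape into JIS X 0208."""
    seven_bit = all(byte < 0x80 for byte in raw)
    has_kanji_escape = ESC + b"$B" in raw or ESC + b"$@" in raw
    return seven_bit and has_kanji_escape


sample = "Kanji: 漢字"
as_jis = sample.encode("iso2022_jp")
as_euc = sample.encode("euc_jp")

print(as_jis)  # ASCII prefix, then an escape sequence, then 7-bit byte pairs
print(looks_like_iso2022_jp(as_jis))
print(looks_like_iso2022_jp(as_euc))
```

The heuristic accepts the ISO-2022-JP bytes and rejects the EUC-JP ones,
which would be consistent with EUC going unrecognized unless Japanese is
assumed up front.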

    Steve> (You can probably do this in .emacs too, you should be able
    Steve> to select Japanese and "Save Options", but I haven't tried
    Steve> it.)

This works; Jason Molenda mentioned it.

    Steve> Anyways, you can change the "default" language in the menu:
    Steve> "Options/Language Environment".

But this is too late for XIM (as I mentioned earlier).

    Steve> File encoding can be set on a per buffer basis using C-x
    Steve> C-n f (type C-x C-n C-h for a list of bindings).

Unfortunately, this doesn't work.  :-)  You need to use
`set-buffer-file-coding-for-read' which isn't bound.  A trivial
complaint since one can bind it oneself.

    >> Furthermore, using `(setenv "LANG" "C")' or any other locale
    >> does not affect this, so the locale of the XEmacs process seems
    >> to be fixed at invocation, and only used to invoke Mule
    >> features.

    Steve> Ahh, digging around, I found yet another reason for this:
    Steve> in the lisp directory, there is a
    Steve> locale/ja/locale-start.el.  Apparently, there is no
    Steve> directory for other locales.  (XEmacs desperately needs
    Steve> testers for non-japanese MULE stuff.)

Do you know where this is used?  All this file does is set up a
localized version of the opening "splash frame", and a localized
version of the usage message.

    Steve> Why do you get a feeling that the mule features are hacked
    Steve> in?  It feels fairly clean to me.  There are probably some
    Steve> necessary differences from the gnu emacs MULE, because of

Yes, and all the ones I've seen so far I like ;-)

    Steve> design differences in the core editor.  But as far as I can
    Steve> tell, MULE is nicely integrated into the editor.

Aaaah, maybe I misspoke (I'm not sure about that yet).  Maybe it's the 
XIM that's not properly integrated into Mule.  But Mule (from the
little I've read of the code) seems not to be designed to integrate
external henkan (kana-kanji conversion) servers into its multilingual
features.  As far as I can tell, you either use a server or you don't;
it's more or less fixed (at startup for XIM, at compilation in the
cases of native Canna and Wnn support), and it is not selectable by
menu.  This is maybe a Mule problem more than an XEmacs problem.

    >> It seems to choke on the (standard-compliant) MIME content-type
    >> header.  If that (`Content-type: text/html;
    >> charset=iso-2022-jp') is present, the non-ASCII characters turn
    >> into mojibake :-(.

    Steve> Sounds like a bug in w3.el... It seems to be ignoring the
    Steve> charset specification in the HTTP header.  Does it work in
    Steve> MULE?

Dunno for sure; can't get the 3.0.x version of w3.el to work with GNU
Emacs/Mule (I think this is due to my config stepping on it; but I'm
not sure where).  But w3-2.2.26 shows the same bug with GNU
Emacs/Mule.

And it's worse than ignoring; it only gets it wrong when the charset
spec is present.

Anyway, I've reported the bug.
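
For reference, honoring the header is not much work.  Here is a minimal
Python sketch (nothing to do with how w3.el is actually written) that
pulls the charset parameter out of a Content-type value and uses it to
decode the body.  The helper name `decode_body' and the Latin-1 fallback
are assumptions for the example.

```python
from email.message import Message


def decode_body(content_type, body):
    """Decode body bytes using the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Fall back to Latin-1 when no charset is declared (an assumption made
    # for this sketch; it maps every byte to some character).
    charset = msg.get_content_charset() or "iso-8859-1"
    return body.decode(charset)


header = "text/html; charset=iso-2022-jp"
page = "<p>日本語</p>".encode("iso2022_jp")
print(decode_body(header, page))
```

With the charset present the body round-trips cleanly, which is the
opposite of what I'm seeing in w3.el.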

    >> Do you get the right charsets when switching from an SJIS
    >> server to an ISO-2022 server in w3.el?  Do you need to do
    >> anything else (in particular I'm thinking of the liblocale
    >> dodge that works with Netscrap)?  Do you get messages about not
    >> being able to set locale, using C/POSIX instead?

    Steve> This is the funny thing: I get those messages from XEmacs
    Steve> for "LANG=de", perl does the same thing.  But I don't get
    Steve> any messages from Netscape (which displays correctly

Netscape probably does not use libc localization, it probably uses
Motif localization.  Unless you mean it displays localized messages to 
std{err,out}.

    Steve> localized text) or various GNU utilities (again with
    Steve> varying degrees of localized text).

But that's not the explanation for GNU.  GNU probably is just more
robust for some reason.

Ah, yes.  Looking at /usr/share/locale/*, you'll see that locale `de'
doesn't have full support, evidently because the German countries
don't share currency and date conventions and so on.  So all that is
there is the LC_MESSAGES subdirectory.  Evidently perl and XEmacs
either have a use for money :-) and GNU utilities don't, or they are
less careful about checking all that kind of stuff, while the GNU
utilities only request the locale functions they expect to need.  I'll
have to try this out.

Yup, perl only complains about character sorting and typing, and
doesn't complain if you specify `LANG=de_DE'.
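
One way to check which categories a locale name actually supports is to
request them one at a time.  A minimal Python sketch, assuming a
Unix-like libc; the helper name `probe_locale' is made up, and the
results depend entirely on which locales are installed, so I'm not
showing any expected output.

```python
import locale

# LC_MESSAGES is absent on some platforms, so it is looked up defensively.
CATEGORY_NAMES = ("LC_CTYPE", "LC_COLLATE", "LC_MONETARY",
                  "LC_NUMERIC", "LC_TIME", "LC_MESSAGES")


def probe_locale(name):
    """Return {category name: bool} for whether setlocale accepts `name`."""
    supported = {}
    for cat_name in CATEGORY_NAMES:
        category = getattr(locale, cat_name, None)
        if category is None:
            continue
        saved = locale.setlocale(category)  # query the current setting
        try:
            locale.setlocale(category, name)
            supported[cat_name] = True
        except locale.Error:
            supported[cat_name] = False
        finally:
            locale.setlocale(category, saved)  # restore the old setting
    return supported


for locale_name in ("C", "de", "de_DE"):
    print(locale_name, probe_locale(locale_name))
```

A program that asks for every category at once gets a failure if any one
of them is missing, while a program that asks only for the categories it
needs may never notice.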

    Steve> I fully expect that message for LANG=ja, since I don't have
    Steve> any "ja" locale in /usr/share/locale.  (This needs to be
    Steve> fixed.)

Do you know if anyone is working on it, or where to find out?  I
wonder if the i18n features properly support LC_COLLATE and LC_CTYPE
for wc/mb character sets ... I suppose they must.

    Steve> This is why the liblocale.so is needed. Apparently,
    Steve> it calls some internal X functions to trick it into
    Steve> thinking it has a wide character locale on systems lacking
    Steve> the "ja" locale (English Solaris ships without it).

    Steve> I believe the X locale functions will work in conjunction
    Steve> with libc locale

In one sense, yes.  The models are different.  X puts everything into
one big text file, and is very concerned about character sets and
encodings, while ignoring money (I can't :) and messages (ditto).
Linux's implementation of C/POSIX splits things into what are
apparently compiled objects for each category, and doesn't seem to
care about character sets (due to Unicode?).

So evidently the models are pretty well orthogonal to each other,
which is the impression I get from the O'Reilly "R5 Update" volume,
which says that X i18n is built on ANSI C i18n.

Apparently the liblocale.so hack works only because most programs
never actually call any locale functions except for setlocale()
(which is as it should be), and neither does Linux....

    >> Lists of all defined locales and aliases are in
    >> /usr/X11R6/lib/X11/locale/locale.{dir,alias}.

    Steve> The X locale information is there.  The libc locales are in
    Steve> /usr/share/locale (on Debian).


    Steve> スチェヴ・ダーナム dunham@example.com

-----------------------------------------------------------------
a word from the sponsor will appear below
-----------------------------------------------------------------
The TLUG mailing list is proudly sponsored by TWICS - Japan's First
Public-Access Internet System.  Now offering 20,000 yen/year flat
rate Internet access with no time charges.  Full line of corporate
Internet and intranet products are available.   info@example.com
Tel: 03-3351-5977   Fax: 03-3353-6096

