Re: tlug: Umlauts & Kanji

To: tlug@example.com
Subject: Re: tlug: Umlauts & Kanji
From: "Stephen J. Turnbull" <turnbull@example.com>
Date: Thu, 18 Sep 1997 16:26:01 +0900
In-reply-to: Your message of "Tue, 16 Sep 1997 09:39:26 EST." <199709152339.JAA11232@example.com>
Reply-To: tlug@example.com
Sender: owner-tlug

--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------
>>>>> "Jim" == Jim Breen <jwb@example.com> writes:

    Jim> On Sep 15, 11:48am, =?iso-8859-1?Q?Thomas_B=E4tzler?= wrote: }
    Jim> Subject: tlug: Umlauts & Kanji
    >>> I hacked that by creating a MIME type named
    >>> text/html;charset=iso-2022-jp for the extension jhtml.  I
    >>> doubt it's the way it was supposed to be done, but at least it
    >>> seemed to work.

    Jim> AFAIK, that's the kosher way to do it. I do it with
    Jim> "chars=x-euc-jp" inside a <meta ... >

You mean (?):

<meta http-equiv="content-type" content="text/html; CHARSET=x-euc-jp">

If you are mixing files with different charsets in the same directory,
what Thomas is doing is probably best, because you can just run
"jconv -ij file.JIS.html -os file.SJIS.html" and leave the markup
alone.  It's easy to forget to fix those META elements when you
translate a file to another charset.  I've embarrassed myself pretty
badly that way (a stale META charset generates unfixable mojibake on
conformant browsers).
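
If you do go the META route, here is a minimal sketch of keeping the
label and the bytes in step, assuming Python and its stock codecs.  The
helper name convert_html and the charset labels are made up for
illustration, and this says nothing about how jconv works internally:

```python
import re

# Hypothetical helper; the regex only handles the simple
# <meta ... charset=LABEL> form shown above.
META_CHARSET = re.compile(r'(<meta[^>]*charset=)([-\w]+)', re.IGNORECASE)

def convert_html(data: bytes, src: str, dst: str, dst_label: str) -> bytes:
    """Decode from src, rewrite any META charset label, re-encode as dst."""
    text = data.decode(src)
    text = META_CHARSET.sub(lambda m: m.group(1) + dst_label, text)
    return text.encode(dst)

page = ('<meta http-equiv="content-type" '
        'content="text/html; CHARSET=iso-2022-jp">\n<p>日本語</p>\n')
jis = page.encode('iso2022_jp')
sjis = convert_html(jis, 'iso2022_jp', 'shift_jis', 'Shift_JIS')
print(sjis.decode('shift_jis'))
```

The point is only that the conversion and the label change happen in
one step, so the label cannot be left behind.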

    >>> problem #2: how can I mix German and Japanese, or rather
    >>> Umlauts and Kana/Kanji in the same frame? The way I understand
    >>> what I read in Lunde, it's not possible to mix those with
    >>> SJIS, since the Umlauts are not included in the character
    >>> table. How'bout Unicode? Anybody interested in more results
    >>> should I ever get them?

    Jim> At present I think you need to do it by coding your Japanese
    Jim> as iso-2022-jp. For the two-byte codes such as EUC and SJIS,
                                 ~~~~~~~~
8-bit? ------------------------------+

    Jim> the code ranges collide with those in iso-8859-1, so the
    Jim> normal way of doing "Latin" diacritic marks is not available.

I believe the Mule/W3 sample page includes EUC Japanese, not
ISO-2022-JP, but I could be wrong.  Aren't mode shifts possible in
8-bit EUC?
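
To make the collision concrete, here is a rough sketch (Python,
standard codecs only; the variable names are illustrative) that encodes
the same kanji three ways and compares the byte ranges with a Latin-1
umlaut:

```python
kanji = '日本語'
umlaut = 'ä'

jis = kanji.encode('iso2022_jp')    # 7-bit, switches sets with escape sequences
euc = kanji.encode('euc_jp')        # every byte of these kanji has the high bit set
sjis = kanji.encode('shift_jis')    # lead bytes have the high bit set
latin1 = umlaut.encode('latin-1')   # a single byte with the high bit set

def has_high_bit(data: bytes) -> bool:
    """True if any byte is outside the 7-bit range."""
    return any(b >= 0x80 for b in data)

for name, data in [('iso-2022-jp', jis), ('euc-jp', euc),
                   ('shift_jis', sjis), ('iso-8859-1', latin1)]:
    print(f'{name:12} {data.hex()}  high-bit bytes: {has_high_bit(data)}')

# The umlaut's byte falls inside the range EUC uses for kanji, so a
# decoder cannot tell them apart without out-of-band information.
umlaut_in_euc_range = all(0xA1 <= b <= 0xFE for b in latin1)

# This Shift_JIS codec has no slot for the umlaut at all.
try:
    umlaut.encode('shift_jis')
    sjis_has_umlaut = True
except UnicodeEncodeError:
    sjis_has_umlaut = False
```

Only the ISO-2022-JP bytes stay within 7 bits, which is why they do not
step on the Latin-1 range.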

However, it doesn't matter much.  AFAIK only Mule/W3 handles
_multilingual_ text as opposed to localized text (e.g., AFAICT Netscape
doesn't even really localize---input is a bitch as we all know---it
just changes fonts).  Even the last couple of Arena-I18N versions
(well that was 6 months ago, actually) did it this way....

The problem is not primarily on the server side, except in generating
translations of content (preparing class notes in two languages does
make it suck to be me sometimes), it's on the browser side.  It would
be very cool if MS started distributing XEmacs + W3 as its standard
browser, but then they'd have to distribute the source and you'd be
able to rip out the code that searches out and destroys your LILO
MBR....

    Jim> Unicode will, of course, fix all this.

Uh-huh.  This is Heaven's Preordained Course (YOW! unified OUTput AND
inPUT METH'uds), but....

There are still people on comp.emacs talking about how the new Mule
(FSFmacs 20.x) is going to break every line of ELisp code ever written
(and it looks like it will; "All praise Ben!") and how "the Japanese"
should get their heads out of their "7-bit mindset" and go with the
"8-bit world".  And they of course blame "the Japanese" who oppose
"the whole Unicode idea" for political reasons....  For goodness's
sake, you can't even get Greek with your Scandinavian in 8 bits.
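
A quick sketch of that last complaint, assuming Python's stock codecs
(can_encode is a throwaway helper, not a library function): neither
Latin-1 nor the Greek 8-bit set can hold one Scandinavian and one Greek
letter together, while a Unicode encoding can.

```python
text = 'å α'   # one Scandinavian letter, one Greek letter

def can_encode(s: str, codec: str) -> bool:
    """True if every character of s has a slot in the given codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

results = {codec: can_encode(text, codec)
           for codec in ('latin-1', 'iso8859_7', 'utf-8')}
print(results)
```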

"Moshi moshi?!"---Iijima Ai, ramen (?) commercial

The point is that to most non-Orientals, "internationalization" mostly
means Latin-1 support (the "how many Greek programmers do you know, and
all the Israelis read English anyway" kind of attitude); and for most
Orientals, especially Japanese, huge tracts of legacy data in various
national and corporate charsets are a continuing fact of life.  Unicode
will happen quickly only if MS sniffs a profit in it.

Anyway, we'll see.  But as somebody pointed out, despite the alleged
"support" for Unicode in Java (MS's Java interpreter evidently swabs
internally, which is OK, but saves files in wrong-endian format, which
is evil) and Windows NT (why are there national versions?  why aren't
the help files in Unicode?), where are the fonts and applications?  If
you don't have 'em, Jim, who would?

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849              turnbull@example.com
Next TLUG meeting is Saturday October 11, 1997