Re: tlug: Umlauts & Kanji
- To: tlug@example.com
- Subject: Re: tlug: Umlauts & Kanji
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Thu, 18 Sep 1997 16:26:01 +0900
- In-reply-to: Your message of "Tue, 16 Sep 1997 09:39:26 EST." <199709152339.JAA11232@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug
--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------

>>>>> "Jim" == Jim Breen <jwb@example.com> writes:

    Jim> On Sep 15, 11:48am, Thomas Bätzler wrote:
    Jim> } Subject: tlug: Umlauts & Kanji

    >>> I hacked that by creating a MIME type named
    >>> text/html;charset=iso-2022-jp for the extension jhtml. I
    >>> doubt it's the way it was supposed to be done, but at least it
    >>> seemed to work.

    Jim> AFAIK, that's the kosher way to do it. I do it with
    Jim> "chars=x-euc-jp" inside a <meta ... >

You mean (?):

<meta http-equiv="content-type" content="text/html; CHARSET=x-euc-jp">

If you are mixing files with different charsets in the same
directory, what Thomas is doing is probably best, because you can do
"jconv -ij file.JIS.html -os file.SJIS.html".  It's easy to forget to
fix those META elements if you translate the file to another charset.
I've embarrassed myself pretty badly that way (generates unfixable
mojibake on conformant browsers).

    >>> problem #2: how can I mix German and Japanese, or rather
    >>> Umlauts and Kana/Kanji in the same frame? The way I understand
    >>> what I read in Lunde, it's not possible to mix those with
    >>> SJIS, since the Umlauts are not included in the character
    >>> table. How'bout Unicode? Anybody interested in more results
    >>> should I ever get them?

    Jim> At present I think you need to do it by coding your Japanese
    Jim> as iso-2022-jp. For the two-byte codes such as EUC and SJIS,
                                 ~~~~~~~~
8-bit? ------------------------------+
    Jim> the code ranges collide with those in iso-8859-1, so the
    Jim> normal way of doing "Latin" diacritic marks is not available.

I believe the Mule/W3 sample page includes EUC Japanese, not
ISO-2022-JP, but I could be wrong.  Aren't mode shifts possible in
8-bit EUC?  However, it don't matter much.
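
The collision Jim describes can be seen directly. What follows is a minimal sketch, assuming Python 3 and its built-in codecs (neither appears in this thread); the sample strings and variable names are made up for illustration. It reads the Latin-1 bytes for two umlauts as EUC-JP and as Shift_JIS, then checks that ISO-2022-JP output stays within 7 bits.

```python
# Sketch only: Python 3 codec names; sample strings are hypothetical.

# "äö" in ISO-8859-1 is two single-byte characters with the high bit set.
latin1_bytes = "äö".encode("iso-8859-1")

# The same two bytes form one valid double-byte character in the
# 8-bit Japanese encodings, so the umlauts are silently misread.
misread = {
    codec: latin1_bytes.decode(codec)
    for codec in ("euc_jp", "shift_jis")
}
for codec, text in misread.items():
    print(f"{latin1_bytes.hex()} read as {codec}: {text!r}")

# ISO-2022-JP switches character sets with escape sequences instead,
# so every byte it emits has the high bit clear.
jis_bytes = "日本語".encode("iso2022_jp")
seven_bit_only = all(b < 0x80 for b in jis_bytes)
print(jis_bytes, "7-bit only:", seven_bit_only)
```

Because ISO-2022-JP leaves the upper half of the byte range unused, it is the only one of the three that does not fight with ISO-8859-1 over the same byte values.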

AFAIK only Mule/W3 handles _multilingual_ text as opposed to localized
text (eg, AFAICT Netscape doesn't even really localize---input is a
bitch as we all know---it just changes fonts).  Even the last couple
of Arena-I18N versions (well that was 6 months ago, actually) did it
this way....

The problem is not primarily on the server side, except in generating
translations of content (preparing class notes in two languages does
make it suck to be me sometimes), it's on the browser side.  It would
be very cool if MS started distributing XEmacs + W3 as its standard
browser, but then they'd have to distribute the source and you'd be
able to rip out the code that searches out and destroys your LILO
MBR....

    Jim> Unicode will, of course, fix all this.

Uh-huh.  This is Heaven's Preordained Course (YOW! unified OUTput AND
inPUT METH'uds), but....

There are still people on comp.emacs talking about how the new Mule
(FSFmacs 20.x) is going to break every line of ELisp code ever written
(and it looks like it will; "All praise Ben!") and how "the Japanese"
should get their heads out of their "7-bit mindset" and go with the
"8-bit world".  And they of course blame "the Japanese" who oppose
"the whole Unicode idea" for political reasons....  For goodness's
sake, you can't even get Greek with your Scandinavian in 8 bits.

        "Moshi moshi?!"---Iijima Ai, ramen (?) commercial

Point is, that to most non-Orientals, "internationalization" mostly
means Latin-1 support (how many Greek programmers do you know, and all
the Israelis read English kinda attitude); and for most Orientals,
especially Japanese, huge tracts of legacy data in various national
and corporate charsets are a continuing fact of life.  Unicode will
happen quickly only if MS sniffs a profit in it.  Anyway, we'll see.
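
The point about Greek, Scandinavian and Japanese not fitting into any one legacy charset can be checked with a few lines. This is a rough sketch, assuming Python 3 and its built-in codecs (not something used in this thread); the sample string and the helper `can_encode` are hypothetical.

```python
# Sketch only: Python 3 codec names; the sample text is hypothetical.

# German, Japanese and Greek in a single string.
sample = "Grüße, こんにちは, Γειά"

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Each legacy charset covers only part of the sample; a Unicode
# encoding covers all of it.
coverage = {
    codec: can_encode(sample, codec)
    for codec in ("iso-8859-1", "iso-8859-7", "shift_jis", "utf-8")
}
for codec, ok in coverage.items():
    print(f"{codec:12} {'ok' if ok else 'cannot encode'}")
```

Encoding is of course only half the story; as the message says, the fonts and applications still have to exist on the reader's side.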

But as somebody pointed out, despite the alleged "support" for Unicode
in Java (MS's Java interpreter evidently swabs internally, which is
OK, but saves files in wrong-endian format, which is evil) and Windows
NT (why are there national versions? why aren't the help files in
Unicode?), where are the fonts and applications?  If you don't have
'em, Jim, who would?

--
Stephen J. Turnbull
Institute of Policy and Planning Sciences          Yaseppochi-Gumi
University of Tsukuba            http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091; Fax: 55-3849          turnbull@example.com

Next TLUG meeting is Saturday October 11, 1997
-----------------------------------------------------------------
a word from the sponsor will appear below
TWICS - Japan's First Public-Access Internet System.
www.twics.com  info@example.com  Tel:03-3351-5977  Fax:03-3353-6096
- References:
- Re: tlug: Umlauts & Kanji
- From: jwb@example.com (Jim Breen)