Re: Japanese and Web pages



>>>>> "Stephen" == Stephen J Turnbull <turnbull@example.com> writes:


>>>>> "Andy" == Andrew S Howell <andy@example.com> writes:
    Andy> Hello,

    Andy> I'm trying to figure out how Japanese works with web
    Andy> pages. I have two pages I've created using mule to html-ify
    Andy> existing documents. Netscape fails to auto-detect one of
    Andy> them, but is fine on the other. The one that works, jconv
    Andy> reports is New-JIS, while the one that does not work, is
    Andy> reported as Shift-JIS. Simple, I thought, just use jconv to
    Andy> convert the Shift-JIS one to New-JIS. Did that, but now
    Andy> Netscape can't make sense of it no matter what I set the
    Andy> Document-Encoding to.

    Stephen> What Netscape are you using?  Linux?  What version?  I
    Stephen> have not been real happy with its Japanese support in
    Stephen> general, but that could be my own problem.

Netscape version 3.0b7 on Linux and SunOS 4.1.3

    Stephen> Can you read the (converted) new one with Mule OK?  If
    Stephen> not, maybe it's one of the rare documents that jconv and
    Stephen> nkf screw up on autodetecting (it's possible for their
    Stephen> algorithms to get confused between EUC and S-JIS).  If
    Stephen> so, try setting the input encoding explicitly (I think
    Stephen> the switch is "jconv -is -oj").

Looks like this is the case. I read the converted doc back in mule,
and oohh, what a mess! I'm not sure what the documents started out
as. I started with an existing English HTML template, and read them in
with mule. Maybe it got munged in the process. I'll have to break out
"od" and see if I can't figure out what they were to start with.

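For anyone who would rather script the byte inspection than read an "od" dump by eye, here is a minimal Python sketch. The function name guess_japanese_encoding is hypothetical, and the heuristic (look for the escape sequences used by 7-bit JIS, then try strict EUC-JP and Shift-JIS decodes) is an assumption made for illustration. It is not the algorithm that jconv or nkf use.

```python
import re

# Escape sequences that switch a 7-bit JIS stream into two-byte mode:
# ESC $ @ (JIS C 6226-1978) and ESC $ B (JIS X 0208-1983).
JIS_ESCAPE = re.compile(rb"\x1b\$[@B]")


def guess_japanese_encoding(data: bytes) -> str:
    """Return a rough guess at the encoding of `data` (illustrative only)."""
    if JIS_ESCAPE.search(data):
        return "iso2022_jp"
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Try each 8-bit encoding strictly and keep those that decode cleanly.
    plausible = []
    for codec in ("euc_jp", "shift_jis"):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        plausible.append(codec)
    if len(plausible) == 1:
        return plausible[0]
    # Both decoded (or neither did): the bytes alone do not settle it.
    return "ambiguous" if plausible else "unknown"


if __name__ == "__main__":
    samples = {
        "7-bit JIS": "日本語".encode("iso2022_jp"),
        "EUC-JP": "日本語".encode("euc_jp"),
        "Shift-JIS": "日本語".encode("shift_jis"),
        "Shift-JIS half-width katakana": "ｱｲ".encode("shift_jis"),
    }
    for label, raw in samples.items():
        print(f"{label}: {raw.hex(' ')} -> {guess_japanese_encoding(raw)}")
```

The last sample shows why autodetection can fail: half-width katakana in Shift-JIS are single bytes in the same range that EUC-JP uses for two-byte characters, so a short run of them decodes cleanly either way.
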
    Stephen> If it's only partly mojibake, then maybe Netscape got
    Stephen> confused about embedded angle brackets.  I don't think
    Stephen> Netscape is very smart about them; I know that paragraphs
    Stephen> <P> in Japanese text occasionally get munged and I've
    Stephen> seen Japanese text simply disappear when "<" is part of a
    Stephen> JISX-0208 character.  I believe that this is due to the
    Stephen> DTD for HTML which probably did not consider Japanese
    Stephen> carefully.
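
The byte-level reason "<" can turn up inside a character: in 7-bit JIS, each JIS X 0208 character is sent as two bytes in the printable ASCII range (0x21 to 0x7E), so either byte can be 0x3C, the ASCII code for "<". The sketch below uses Python's iso2022_jp codec to list such characters. The helper names are hypothetical, and whether this is what actually trips up Netscape is an assumption, not something confirmed here.

```python
import re

# Escape sequences used by ISO-2022-JP to switch character sets.
ESCAPES = re.compile(rb"\x1b\$[@B]|\x1b\([BJ]")


def jis_payload(char: str) -> bytes:
    """Return the bytes for `char` in ISO-2022-JP with escapes removed."""
    return ESCAPES.sub(b"", char.encode("iso2022_jp"))


def chars_containing_less_than(start: int = 0x3041, stop: int = 0x9FFF) -> list:
    """Collect characters whose 7-bit JIS bytes include 0x3C ('<')."""
    found = []
    for code_point in range(start, stop + 1):
        char = chr(code_point)
        try:
            payload = jis_payload(char)
        except UnicodeEncodeError:
            continue  # not representable in ISO-2022-JP
        if b"<" in payload:
            found.append(char)
    return found


if __name__ == "__main__":
    hits = chars_containing_less_than()
    print(len(hits), "characters contain byte 0x3C in 7-bit JIS")
    for char in hits[:5]:
        print(char, jis_payload(char).hex(" "))
```

A parser that scans for "<" without tracking the escape sequences could mistake one of these bytes for the start of a tag. In Shift-JIS and EUC-JP the byte 0x3C never occurs inside a two-byte character, so this particular hazard is specific to the 7-bit form.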

I give up, what does "mojibake" mean? Disguised characters?

    Andy> Any ideas? Are there any FAQs that would help?

    Stephen> Ken Lunde's CJK.txt which can be found on any O'Reilly
    Stephen> mirror is most likely to be helpful, but I don't know if
    Stephen> he has Netscape/HTML on his mind.

http://www.ora.com/people/authors/lunde/

Which led me to a Japanese client test page:

http://www.etl.go.jp/People/yamana/clients_test.html

Which looks like it may answer some questions. I haven't really checked
it out yet, as I am quite busy with a cold loaned to me by my wife. :(

Thanks for the lead...

Andy
-----------------------------------------------------------------
a word from the sponsor will appear below
-----------------------------------------------------------------
The TLUG mailing list is proudly sponsored by TWICS - Japan's First
Public-Access Internet System.  Now offering 20,000 yen/year flat
rate Internet access with no time charges.  Full line of corporate
Internet and intranet products are available.   info@example.com
Tel: 03-3351-5977   Fax: 03-3353-6096