Mailing List Archive


Re: tlug: kterm/kinput2 conversion



>>>>> "Scott" == Scott Stone <sstone@example.com> writes:

    Scott> On Wed, 9 Sep 1998, Eric S. Standlee wrote:
    >> I was just wondering, why does it seem that most other OS's use
    >> SJIS and NewSJIS, while most Linux users use EUC.  I run into
    >> the problem that when I email to non-Linux users they get
    >> mojibake.  Should I switch to SJIS for everything?  Is there an
    >> easy way to handle mass conversion of already existing files?

    Scott> SJIS ~= Microsoft.  Unix =~ EUC.  Corollary - SJIS is evil,
    Scott> EUC is good.  Plus, EUC is 8-bit clean and is easier to
    Scott> work with, IMHO.  Most mail transport uses 7-bit JIS, to my
    Scott> knowledge.

8-bit clean?  What's that?

AFAIK EUC is ISO-2022 clean, although I think that at least some of
the proposed extensions to JIS X 0212 destroy that.  EUC is definitely 
easier to work with, as shift-JIS is a quasi-modal transformation of
JIS (but Eric doesn't care about that, he's not going to _write_ a
mail agent ;-).

I don't believe there are SJIS escape sequences in ISO-2022, so you
can't use both Japanese and German, let alone Japanese and Chinese, in 
a SJIS text.

The relevant RFCs for mail (and HTTP, for what it's worth) specify
that ISO-2022-JP be used in headers, and it is preferred for the
bodies.  It is stricter about format than 7-bit JIS, but uses the same
character encoding and escape sequences.  I dunno about Pine,
although the Japanized version should do the right thing, but most
Emacs/Mule-based mail agents will use ISO-2022-JP correctly.
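
For anyone who wants to see the difference in bytes, here is a minimal
sketch using Python's built-in codecs (my assumption; nothing above
depends on Python, and the helper name `reencode` is made up for
illustration).  It re-encodes an EUC-JP string as Shift_JIS and as
ISO-2022-JP, then checks that the ISO-2022-JP form is 7-bit and
bracketed by the usual escape sequences.

```python
# Sketch only: re-encode Japanese text with Python's standard codecs.
# "reencode" is a hypothetical helper, not part of nkf, jconv, or any mailer.

def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src encoding and encode them as dst."""
    return data.decode(src).encode(dst)

text = "日本語"
euc = text.encode("euc_jp")                    # 8-bit: high bit set on every byte here
sjis = reencode(euc, "euc_jp", "shift_jis")    # what many non-Unix systems expect
jis = reencode(euc, "euc_jp", "iso2022_jp")    # what mail should carry

# ISO-2022-JP is pure 7-bit: every byte is below 0x80.
print(all(b < 0x80 for b in jis))

# It shifts into JIS X 0208 with ESC $ B and back to ASCII with ESC ( B.
print(jis.startswith(b"\x1b$B"), jis.endswith(b"\x1b(B"))
```

The same function applied file by file would handle a mass conversion,
provided you already know each file's source encoding.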

    Scott> Anyway, I believe that the 'nkf' program will convert
    Scott> files/documents between the different encoding formats. If
    Scott> you don't have it, I think you can use the TurboLinux-J RPM
    Scott> in a Redhat system.

Use Lunde's jconv in preference to nkf.  nkf doesn't pass the "wileyc"
test; even if you read the source you can't trust it.  I don't know
from recent nkf's but in 1.5 and 1.6 the switches were neither
orthogonal nor exhaustive.  jconv's recognition algorithm was much
better than nkf 1.5's.  And jconv has a "noop" test option which
simply reports the file's encoding.

ftp://ftp.uu.net/vendor/oreilly/nutshell/ujip/<somewhere>
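
To give an idea of what a "just report the encoding" pass looks like,
here is a rough sketch in Python.  This is NOT jconv's algorithm (I am
not reproducing it here); it is plain trial decoding with the standard
codecs, and the function name `guess_encoding` is hypothetical.  Short
inputs, or text made mostly of half-width katakana, can be valid in
both EUC and SJIS, so treat the answer as a guess.

```python
# Sketch only: guess a Japanese text encoding by trial decoding.
# This is an illustrative heuristic, not the recognition code in jconv or nkf.

def guess_encoding(data: bytes) -> str:
    """Return 'ascii', 'iso2022_jp', 'euc_jp', 'shift_jis', or 'unknown'."""
    if all(b < 0x80 for b in data):
        # 7-bit data: call it ISO-2022-JP if it carries an ESC, else ASCII.
        return "iso2022_jp" if b"\x1b" in data else "ascii"
    # 8-bit data: try strict decoders in turn; EUC-JP first, as it is stricter.
    for name in ("euc_jp", "shift_jis"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"

sample = "日本語"
for codec in ("euc_jp", "shift_jis", "iso2022_jp"):
    print(codec, "->", guess_encoding(sample.encode(codec)))
```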

I haven't tried to read the source for kkc yet, but its very existence
looks like NIH syndrome to me.

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
--------------------------------------------------------------
Next Nomikai: 18 September, 19:30 Tengu TokyoEkiMae 03-3275-3691
Next Meeting: 10 October, Tokyo Station Yaesu central gate 12:30
--------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp

