Re: [tlug] Japanese in Perl on Linux



>>>>> "Blomberg" == Blomberg David <dblomber@example.com> writes:

    Blomberg> I am using perl to send an email it works except that
    Blomberg> Japanese text come out as question marks.

This is non-trivial stuff.  You should be familiar with RFC 2821,
2822, 1468, 2045, 2046, 2047, 2048, and 2049 at least, or you _will_
get it wrong.  (Not that it much matters in this country; neither
Microsoft nor the author of Emacs/Mule figured out MIME for half a
decade after it was JIS and RFC mandated.)  You also need to bone up
on Perl's character encoding transformation libraries.

    Blomberg> print MAIL "$BIT@example.com%Q%9%o!<%I>pJs(B";

    Blomberg> (first version output is just question marks)

You presumably need to tell Perl that "I don't think we're in Kansas
anymore, Toto."  Question marks are the usual (criminally negligent,
there should be a fatal error) way of telling you that you've
neglected to tell the program about your locale.  Maybe setting LANG
is enough, maybe not.

I don't think modern Perls will do the right thing with raw Japanese;
you probably have to convert to Unicode internally.  (But I haven't
done M17N in Perl since early 1995.  ;-)

    Blomberg> $temp = MIME::Base64::encode("$BIT@example.com%Q%9%o!<%I>pJs(B");
    Blomberg> chomp($temp);
    Blomberg> print MAIL "=?ISO-2022-JP?B?$temp?=\n";

    Blomberg> (this one I get the string
    Blomberg> =?ISO-2022-JP?B?Pz8/Pz8/Pz8/?=
    Blomberg> mime encoded Japanese text as best I can tell)

Not a chance.  The signature of ISO-2022-JP in base64 encoding is
quite characteristic, because (according to RFC 1468) text beginning
with a Japanese character is REQUIRED to start with ESC $ B.  Thus it
is _always_ "GyRC".  Second, it is REQUIRED to end with ESC ( B.  There
are exactly three trailing Base64 patterns (because 2, the number of
bytes in a JIS character, and 3, the number of bytes that are encoded
by a group of four Base64 digits, are relatively prime).

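Also, 12 base64 digits is 9 bytes, so it can't possibly all be JIS:
take away the two 3-byte escape sequences and 3 bytes remain, which is
not a whole number of 2-byte characters.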
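
A quick way to check this for yourself is to decode the payload and
look at the bytes.  What follows is only a minimal sketch, assuming the
MIME::Base64 module already used above; the variable names are
illustrative and not from the original script.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Payload copied from the encoded-word quoted above.
my $payload = "Pz8/Pz8/Pz8/";
my $bytes   = decode_base64($payload);

# Show the decoded bytes as text and as hex, plus their count.
printf "decoded: %s\n", $bytes;                 # ?????????
printf "hex:     %s\n", unpack("H*", $bytes);   # 3f3f3f3f3f3f3f3f3f
printf "length:  %d bytes\n", length $bytes;    # 9

# For comparison: ESC $ B (bytes 1B 24 42) switches to JIS X 0208.
my $to_jis = "\e\$B";
printf "ESC \$ B in Base64: %s\n", encode_base64($to_jis, "");   # GyRC
```

The payload decodes to nine literal question marks, so the Japanese
was already lost before Base64 ever saw it.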

I conclude you definitely need to convert the Japanese to some
internal encoding (some form of Unicode; you probably don't need to
know which), and then from that to ISO-2022-JP.  Doesn't Perl have an
iconv(3) wrapper?
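
For what it's worth, here is a minimal sketch of that two-step
conversion.  It assumes a Perl recent enough (5.8 or later) to ship
the Encode module, which is a candidate for the job rather than a
literal iconv(3) wrapper, and it assumes the script itself is saved as
UTF-8.  The sample string and variable names are hypothetical
stand-ins, not the poster's original text.

```perl
use strict;
use warnings;
use utf8;                            # assumption: this source file is UTF-8
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical sample text standing in for the real subject line.
my $text = "日本語";

# Step 1: Perl character string -> ISO-2022-JP octets, escapes included.
my $jis = encode("iso-2022-jp", $text);

# Step 2: octets -> RFC 2047 "B" encoded-word.
# The empty second argument suppresses the trailing newline,
# so no chomp is needed.
my $word = "=?ISO-2022-JP?B?" . encode_base64($jis, "") . "?=";

print "Subject: $word\n";
```

If the conversion worked, the Base64 part begins with "GyRC".  A real
mailer would also have to keep each encoded-word within the length
limit in RFC 2047, which this sketch does not attempt.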



-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
