tlug.jp Mailing List Archive
Re: [tlug] Japanese in Perl on Linux
- Date: Wed, 12 May 2004 14:20:13 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Japanese in Perl on Linux
- References: <1084328057.18154.17.camel@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Portable Code, linux)
>>>>> "Blomberg" == Blomberg David <dblomber@example.com> writes:

    Blomberg> I am using perl to send an email it works except that
    Blomberg> Japanese text come out as question marks.

This is non-trivial stuff.  You should be familiar with RFC 2821,
2822, 1468, 2045, 2046, 2047, 2048, and 2049 at least, or you _will_
get it wrong.  (Not that it much matters in this country; neither
Microsoft nor the author of Emacs/Mule figured out MIME for half a
decade after it was JIS and RFC mandated.)  You also need to bone up
on Perl's character encoding transformation libraries.

    Blomberg> print MAIL "$BIT@example.com%Q%9%o!<%I>pJs(B";
    Blomberg> (first version output is just question marks)

You presumably need to tell Perl that "I don't think we're in Kansas
anymore, Toto."  Question marks are the usual (criminally negligent,
there should be a fatal error) way of telling you that you've
neglected to tell the program about your locale.  Maybe setting LANG
is enough, maybe not.  I don't think modern Perls will do the right
thing with raw Japanese; you probably have to convert to Unicode
internally.  (But I haven't done M17N in Perl since early 1995. ;-)

    Blomberg> $temp = MIME::Base64::encode("$BIT@example.com%Q%9%o!<%I>pJs(B");
    Blomberg> chomp($temp);
    Blomberg> print MAIL "=?ISO-2022-JP?B?$temp?=\n";
    Blomberg> (this one I get the string
    Blomberg> =?ISO-2022-JP?B?Pz8/Pz8/Pz8/?=
    Blomberg> mime encoded Japanese text as best I can tell)

Not a chance.  The signature of ISO-2022-JP in base64 encoding is
quite characteristic.  First, Japanese text has to start with
ESC $ B to switch into JIS, so the encoded form _always_ starts with
"GyRC".  Second, (according to RFC 1468) it is REQUIRED to end with
ESC ( B, back in ASCII.  There are exactly three trailing Base64
patterns (because 2, the number of bytes in a JIS character, and 3,
the number of bytes that are encoded by a group of four Base64
digits, are relatively prime).  Also, 12 base64 digits is 9 bytes, so
it can't possibly all be JIS: take away the six bytes of escape
sequences and you have three left, which is not a whole number of
two-byte characters.  In fact "Pz8/" is simply the Base64 encoding of
"???", so what you encoded was nine literal question marks.
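The signature check above can be tried out mechanically.  What follows
is only a minimal sketch, written in Python rather than Perl because
the byte-level argument is the same in any language.  The sample string
and the function names (iso2022jp_payload, encoded_word,
looks_like_iso2022jp) are assumptions made up for illustration, and the
check assumes a header that is Japanese from start to finish, with no
ASCII mixed in.

```python
import base64

# Hypothetical sample text: the three characters of "nihongo" (Japanese),
# written as escapes so this file itself stays pure ASCII.
SAMPLE = "\u65e5\u672c\u8a9e"

ESC_TO_JIS = b"\x1b$B"    # ESC $ B: switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"  # ESC ( B: switch back to ASCII


def iso2022jp_payload(text: str) -> str:
    """Unicode -> ISO-2022-JP bytes -> Base64 text (no line breaks)."""
    raw = text.encode("iso2022_jp")
    return base64.b64encode(raw).decode("ascii")


def encoded_word(text: str) -> str:
    """Wrap the payload as an RFC 2047 'B' encoded-word."""
    return "=?ISO-2022-JP?B?" + iso2022jp_payload(text) + "?="


def looks_like_iso2022jp(payload: str) -> bool:
    """Apply the checks from the message to a Base64 payload.

    Simplification: only all-Japanese text is recognized, i.e. one run
    of two-byte characters between the two escape sequences.
    """
    raw = base64.b64decode(payload)
    if not (raw.startswith(ESC_TO_JIS) and raw.endswith(ESC_TO_ASCII)):
        return False
    body = raw[len(ESC_TO_JIS):len(raw) - len(ESC_TO_ASCII)]
    return len(body) % 2 == 0


if __name__ == "__main__":
    # The payload from the quoted message decodes to nine "?" bytes.
    print(base64.b64decode("Pz8/Pz8/Pz8/"))
    print(looks_like_iso2022jp("Pz8/Pz8/Pz8/"))
    # A string converted via Unicode carries the "GyRC" signature.
    print(encoded_word(SAMPLE))
    print(looks_like_iso2022jp(iso2022jp_payload(SAMPLE)))
```

Run on the payload quoted above, the check fails because the decoded
bytes are all "?"; run on text that went through the Unicode to
ISO-2022-JP conversion, the payload starts with "GyRC" as described.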
I conclude you definitely need to convert the Japanese to some
internal encoding (some form of Unicode; you probably don't need to
know which), and then from that to ISO-2022-JP.  Doesn't Perl have an
iconv(3) wrapper?

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
- References:
- [tlug] Japanese in Perl on Linux
- From: Blomberg David