Mailing List Archive




Re: tlug: Java and Japanese e-mail



--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------
>>>>> "Craig" == Craig Oda <craig@example.com> writes:

    Craig> A couple of things, there seems to be a linux-i18n now.  I'm
    Craig> downloading it now and will give it a go.  Also, the dates
    Craig> indicate that it was made available only a few days ago.
    Craig> Fresh.

I definitely am waiting for your report on this.  Especially if it
makes HotJava stop swallowing memory.

    Craig> Back to my little e-mail problem.  I'm using servlets, not

What's a servlet?  I assume it's a server crippled in basically the
same way as an application is crippled to become an applet?  This
shouldn't matter.

    Craig> applets.  The servlet talks to an Apache HTTP server.  It
    Craig> receives input from the Apache server and outputs to the
    Craig> server.  The end-user is inputting the information in a form
    Craig> in something like netscape.  Let's say that they are on Win
    Craig> 95J.  They input ShiftJIS.  The server catches it and gives
    Craig> it to the Java servlet.  The servlet gets the parameters

What is Apache passing to the servlet?  SJIS or ISO-2022-JP or ASCII?
I thought your users were requesting files with ASCII names or
something, but the file content might be Japanese.  If the server is
passing munged nihongo in requests to the servlet, heaven only knows
what it might be doing....
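
One way to answer that empirically is to capture the raw bytes the servlet gets and look at them.  Here is a rough sketch in Java.  The class and method names are hypothetical, and it assumes you can get at the undecoded bytes and that your JDK includes the Japanese charsets (full JDKs normally do).  It is a heuristic, not a real charset detector:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: a rough guess at which family of encodings a
// byte array belongs to.  A heuristic only, not a charset detector.
final class EncodingSniffer {
    static String guess(byte[] data) {
        boolean highBit = false;
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;
            // ESC $ B or ESC $ @ switches ISO-2022-JP into JIS X 0208 (kanji) mode.
            if (b == 0x1B && i + 2 < data.length && data[i + 1] == '$'
                    && (data[i + 2] == 'B' || data[i + 2] == '@')) {
                return "ISO-2022-JP";
            }
            if (b >= 0x80) {
                highBit = true;
            }
        }
        // Any byte >= 0x80 means an 8-bit code such as Shift_JIS or EUC-JP;
        // telling those two apart needs more work than this sketch does.
        return highBit ? "8-bit (Shift_JIS or EUC-JP?)" : "US-ASCII";
    }

    public static void main(String[] args) {
        String nihongo = "\u65E5\u672C\u8A9E"; // "nihongo" written in kanji
        System.out.println(guess(nihongo.getBytes(Charset.forName("ISO-2022-JP"))));
        System.out.println(guess(nihongo.getBytes(Charset.forName("Shift_JIS"))));
        System.out.println(guess("index.html".getBytes(StandardCharsets.US_ASCII)));
    }
}
```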

    Craig> and tosses it to the SMTP server on port 25.  Sendmail then
    Craig> takes it and spits it to something like the POP server which
    Craig> spits it to something like Eudora or Netscape mail.

No.  MTAs like sendmail normally receive input on stdin or the like
from an MUA (Eudora), then pass it over the network to an MTA
(sendmail) which passes it to a delivery agent (aka MDA, eg, procmail
or deliver) or into a spool file.  Normally POP servers and imapds use
those spool files.  The interface going through the Internet is very
clean: pass an address and data to sendmail on stdin, get protocol
replies on stdout.  (See RFC 1123 and 821.)  The complexity is all on
the localhost side (composition on the sending side, delivery and
display for the receiver).  If your mailers are properly set up, using
sendmail should be as transparent as a named pipe.  It's not mail,
it's your program's output stream that's the problem.  Have you tried
writing a local file instead of passing the output to sendmail?
Betcha get the same mojibake.
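
That experiment is easy to run from Java.  A sketch, with hypothetical names: write the same String through a Writer with an explicitly named charset and see what lands on disk.  The encoder you choose (or the platform default, if you don't choose one) decides the bytes long before sendmail ever sees them.  It assumes a JDK that includes the Japanese charsets:

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: dump a message body to a local file with an explicit
// charset, instead of relying on the JVM's platform default encoding.
final class LocalDump {
    static byte[] dump(String body, Charset charset, Path file) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), charset)) {
            w.write(body);
        } // close() flushes the encoder, including any final shift sequence
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        String body = "\u65E5\u672C\u8A9E\n";
        Path tmp = Files.createTempFile("mail-body", ".txt");
        for (String name : new String[] {"Shift_JIS", "EUC-JP", "ISO-2022-JP"}) {
            byte[] bytes = dump(body, Charset.forName(name), tmp);
            System.out.println(name + ": " + bytes.length + " bytes");
        }
        Files.delete(tmp);
    }
}
```

If the file already looks wrong, the problem is in the servlet's output stream, not in mail.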

File --+
User --+--> MUA --+--> MTA --(SMTP)--> MTA --> spool file --+
etc. --+          |                                         |
                  |                        +-- POPserver <--+
stdin (usually) --+                 MUA <--+-- procmail <---+
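
Craig's servlet talks SMTP on port 25 rather than feeding sendmail's stdin, but the shape is the same: an envelope (sender and recipient) plus the data.  Below is a sketch of the client's half of a minimal RFC 821 transaction as plain lines.  The class name and addresses are placeholders, and a real client must read and check each reply code before sending the next command.  The only change SMTP makes to the data is "dot-stuffing", which the receiving side undoes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the client-side lines of a minimal SMTP transaction
// (RFC 821).  Each line would be sent followed by CRLF; server replies are
// not modelled here.
final class SmtpScript {
    static List<String> script(String heloHost, String from, String to, String message) {
        List<String> lines = new ArrayList<>();
        lines.add("HELO " + heloHost);
        lines.add("MAIL FROM:<" + from + ">");
        lines.add("RCPT TO:<" + to + ">");
        lines.add("DATA");
        // split() drops trailing empty strings, so trailing blank lines are
        // lost; acceptable for a sketch.
        for (String line : message.split("\r?\n")) {
            // Transparency: a data line starting with "." gets one more "." so
            // it cannot be mistaken for the end-of-data marker.  The receiver
            // strips it again, so the body arrives unchanged.
            lines.add(line.startsWith(".") ? "." + line : line);
        }
        lines.add(".");    // end of data
        lines.add("QUIT");
        return lines;
    }

    public static void main(String[] args) {
        String message = "Subject: test\r\n\r\nhello\r\n.signature line\r\n";
        for (String line : script("servlet.example.com", "craig@example.com",
                                  "someone@example.com", message)) {
            System.out.println(line);
        }
    }
}
```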

    Craig> Correct me if I'm wrong, but I've thought that normally the
    Craig> POP server spit out JIS and that the mail client converted
    Craig> it to ShiftJIS for the Mac/Windows world or to EUC for the
    Craig> Linux world.

No.  The POP/IMAP server is also an MTA (or MDA).  MTAs and MDAs are
expected to modify the headers (adding "Received" and "X-UIDL" and so
on), but are not allowed to touch the message body.  (Some do anyway;
in particular, they often do translations from ASCII to EBCDIC.  Thus
we need "Content-Transfer-Encoding: base64".)
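
For reference, the MIME header that does this is Content-Transfer-Encoding (RFC 2045).  A sketch of producing and reading back a base64 body with java.util.Base64 (Java 8 or later); the wrapper class name is hypothetical, and the JDK is assumed to include the Shift_JIS charset:

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical wrapper: base64-encode arbitrary 8-bit body bytes so that only
// a small set of ASCII characters crosses the wire, and decode them again.
final class Base64Body {
    static String encode(byte[] body) {
        // The MIME encoder wraps output at 76 characters with CRLF (RFC 2045).
        return Base64.getMimeEncoder().encodeToString(body);
    }

    static byte[] decode(String encoded) {
        // The MIME decoder ignores the line breaks the encoder inserted.
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] sjis = "\u65E5\u672C\u8A9E".getBytes(Charset.forName("Shift_JIS"));
        String wire = encode(sjis);
        System.out.println("Content-Transfer-Encoding: base64");
        System.out.println();
        System.out.println(wire);
        System.out.println("round trip ok: " + Arrays.equals(sjis, decode(wire)));
    }
}
```

Base64 only makes the bytes safe to carry; the reader still needs the charset parameter of Content-Type to know they are Shift_JIS.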

    Craig> This is what I thought:
 
    Craig> Windows '95 ==> Eudora =====> SMTP ====> Internet =====>
    Craig> ShiftJIS    Convert to JIS     JIS

    Craig> Something =======> Eudora ====> displays on Windows
    Craig> like POP JIS Converts to ShiftJIS     Monitor

    Craig> So, what I thought was that I had to convert into JIS
    Craig> before it got sucked in by the SMTP server.  What does
    Craig> everyone else think?

No, they're not required to do any such thing.  Once you've received
an "accepted for delivery" notification from the listener, you are
supposed to assume that the same series of bytes you sent will be
in the message body received by the remote user.  So you need to
coordinate with the remote user on how to interpret your bytes, NOT
with sendmail.  If the remote user can handle JIS, S-JIS, and EUC, it
doesn't matter to sendmail which you use.  In fact, sendmail doesn't
care if you send Russian to a user expecting SJIS.  It will faithfully 
pass it on....
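
To make that concrete: the MTAs carry a byte array, and the reader's choice of charset turns it back into text.  A small sketch (class name hypothetical, JDK assumed to include the Japanese charsets) that encodes with one charset and decodes with another, roughly what happens when the sender writes Shift_JIS and the receiving MUA assumes EUC-JP:

```java
import java.nio.charset.Charset;

// Hypothetical demo: encode with the sender's charset, decode with whatever
// charset the receiver assumes.  The byte array in between is all the MTAs see.
final class Mojibake {
    static String reinterpret(String text, Charset sentAs, Charset readAs) {
        byte[] wire = text.getBytes(sentAs);
        return new String(wire, readAs);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        Charset euc = Charset.forName("EUC-JP");
        String nihongo = "\u65E5\u672C\u8A9E";
        // Sender and reader agree on the charset: the text survives.
        System.out.println(reinterpret(nihongo, sjis, sjis).equals(nihongo));
        // Same bytes, different assumption at the reader's end: mojibake.
        System.out.println(reinterpret(nihongo, sjis, euc).equals(nihongo));
    }
}
```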

What the MUA (Eudora and your servlet) should do is add Content-Type
and Content-Transfer-Encoding MIME headers.  The mail or web server isn't
supposed to care.  Only the UA is involved in dealing with content
(formatting, MIME encoding, and the like).  Every other element of the
system should be thought of as allowed to add headers but not touch
the rest of the message at all.  (This is not strictly true,
especially for web servers with SHTML and so on, but it's close
enough.)
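
A sketch of what "add the MIME headers" might look like in the servlet, assuming an ASCII-only Subject (a Japanese Subject would need RFC 2047 encoded-words, not shown) and an ISO-2022-JP body.  The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: build a complete message (headers, blank line, body)
// whose MIME headers declare what the body bytes actually are.
final class MimeCompose {
    static byte[] compose(String from, String to, String asciiSubject, String body) {
        String head = "From: " + from + "\r\n"
                + "To: " + to + "\r\n"
                + "Subject: " + asciiSubject + "\r\n"
                + "MIME-Version: 1.0\r\n"
                + "Content-Type: text/plain; charset=ISO-2022-JP\r\n"
                + "Content-Transfer-Encoding: 7bit\r\n"
                + "\r\n"; // an empty line ends the header block
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        // Normalise line endings to CRLF, then encode the body in the charset
        // the Content-Type header promises.
        String crlfBody = body.replace("\r\n", "\n").replace("\n", "\r\n");
        byte[] bodyBytes = crlfBody.getBytes(Charset.forName("ISO-2022-JP"));
        byte[] message = Arrays.copyOf(headBytes, headBytes.length + bodyBytes.length);
        System.arraycopy(bodyBytes, 0, message, headBytes.length, bodyBytes.length);
        return message;
    }

    public static void main(String[] args) {
        byte[] msg = compose("craig@example.com", "someone@example.com", "test",
                             "\u65E5\u672C\u8A9E\n");
        // Every byte is 7-bit; show ESC visibly instead of sending it to the terminal.
        System.out.print(new String(msg, StandardCharsets.US_ASCII).replace("\u001B", "<ESC>"));
    }
}
```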

The reason for the specification of JIS (actually, ISO-2022 or 7-bit
EUC) is that in the distant past many serial links used an 11-bit
encoding with 1 start bit, two stop bits, and 8
bits of data.  With ASCII being 7 bits, and links being unreliable, 1
bit was often used for parity _at the hardware level_ so that passing
8 bit data was impossible.  So some software was "optimized" on the
assumption that anything it was asked to pass over the net would use
7-bit bytes.  Thus encodings were designed to fit into the 7-bit
framework for mail transport.
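
The consequence is easy to demonstrate: clear the top bit of every byte, as software that assumes 7-bit bytes effectively does, and see which encodings survive.  A sketch with hypothetical names, assuming a JDK that includes the Japanese charsets:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo: which Japanese encodings survive a 7-bit-only path?
final class SevenBit {
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    // Simulates a path that keeps only the low 7 bits of each byte.
    static byte[] stripHighBit(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] & 0x7F);
        }
        return out;
    }

    public static void main(String[] args) {
        String nihongo = "\u65E5\u672C\u8A9E";
        for (String name : new String[] {"ISO-2022-JP", "Shift_JIS", "EUC-JP"}) {
            byte[] bytes = nihongo.getBytes(Charset.forName(name));
            boolean survives = Arrays.equals(bytes, stripHighBit(bytes));
            System.out.println(name + ": 7-bit clean = " + isSevenBitClean(bytes)
                    + ", survives stripping = " + survives);
        }
    }
}
```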

However, as I understand it, there is nothing mandating the use of
7-bit codes in Internet mail as long as the proper MIME headers are
attached.  If you send Shift-JIS to someone with a MIME-compliant
mailer that does not understand Shift-JIS, it should say "this doc is
in Shift-JIS, I'm lost---tell me what to do".  However, all mailers
that claim to understand Japanese should implement ISO-2022-JP, since
this is a base implementation that 99.44% of systems on the Internet
and connected networks (Bitnet, UUCP) can pass without munging.
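
For Craig's case that suggests converting the form input to ISO-2022-JP before handing it to the MTA.  A sketch, assuming the servlet really does receive Shift_JIS bytes and that the JDK ships the Shift_JIS and ISO-2022-JP charsets; the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: transcode Shift_JIS bytes to ISO-2022-JP bytes.
final class SjisToJis {
    static byte[] convert(byte[] sjis) throws CharacterCodingException {
        // REPORT makes bad input throw instead of being silently replaced,
        // so a problem shows up as an exception rather than in someone's inbox.
        CharsetDecoder decoder = Charset.forName("Shift_JIS").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder encoder = Charset.forName("ISO-2022-JP").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer text = decoder.decode(ByteBuffer.wrap(sjis));
        // encode() also flushes, which appends ESC ( B if the text ended in kanji mode.
        ByteBuffer jis = encoder.encode(text);
        byte[] out = new byte[jis.remaining()];
        jis.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] fromForm = "\u65E5\u672C\u8A9E".getBytes(Charset.forName("Shift_JIS"));
        byte[] forMail = convert(fromForm);
        System.out.println(fromForm.length + " Shift_JIS bytes -> "
                + forMail.length + " ISO-2022-JP bytes");
    }
}
```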

The main mistake (from an RFC compliance point of view) in your
program is failure to prepend a set of MIME headers.  (Without them,
RFC 2045 says the body defaults to text/plain in US-ASCII.)  However, that
should not affect sendmail or popmail at all; they should deliver
exactly the set of headers and body they were passed, plus a few extra
headers that describe (Received) or aid (X-UIDL) their activity.

But without seeing more code for the file inclusion or examples of
munged nihongo, it's hard to guess what might be going wrong.

Steve

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849              turnbull@example.com
Next TLUG meeting is Saturday October 11, 1997

