Re: tlug: Java and Japanese e-mail
- To: tlug@example.com
- Subject: Re: tlug: Java and Japanese e-mail
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Sat, 23 Aug 1997 16:10:06 +0900
- In-reply-to: Your message of "Sat, 23 Aug 1997 01:08:57 +0900." <Pine.HPP.3.95.970823004932.17003B-100000@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug
--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------

>>>>> "Craig" == Craig Oda <craig@example.com> writes:

    Craig> A couple of things, there seems to be a linux-i8n now.  I'm
    Craig> downloading it now and will give it a go.  Also, the dates
    Craig> indicate that it was made available only a few days ago.
    Craig> Fresh.

I definitely am waiting for your report on this.  Especially if it
makes HotJava stop swallowing memory.

    Craig> Back to my little e-mail problem.  I'm using servlets, not

What's a servlet?  I assume it's a server crippled in basically the
same way as an application is crippled to become an applet?  This
shouldn't matter.

    Craig> applets.  The servlet talks to an Apache HTTP server.  It
    Craig> receives input from the Apache server and outputs to the
    Craig> server.  The end-user is inputing the information in a form
    Craig> in something like netscape.  Let's say that they are on Win
    Craig> 95J.  They input ShiftJIS.  The server catches it and gives
    Craig> it to the Java servlet.  The servlet gets the parameters

What is Apache passing to the servlet?  SJIS or ISO-2022-JP or ASCII?
I thought your users were like requesting files with ASCII names or
something, but the file content might be Japanese.  If the server is
passing munged nihongo in requests to the servlet, heaven only knows
what it might be doing....

    Craig> and tosses it to the SMTP server on port 25.  Sendmail then
    Craig> takes it and spits it something like the POP server which
    Craig> spits it to something like Eudora or Netscape mail.

No.  MTAs like sendmail normally receive input on stdin or the like
from an MUA (Eudora), then pass it over the network to an MTA
(sendmail) which passes it to a delivery agent (aka MDA, eg, procmail
or deliver) or into a spool file.  Normally POPservers and imapds use
those spool files.
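One way to answer the question above of what Apache is actually handing the servlet is to look at the raw request bytes before any conversion happens. The sketch below is in Python rather than Java purely for brevity. `classify` is a hypothetical helper, and the trial-decoding step is only a heuristic, since some byte sequences are valid in both Shift-JIS and EUC-JP.

```python
def classify(raw: bytes) -> str:
    """Guess which encoding a byte string is in (heuristic, not a detector)."""
    if all(b < 0x80 for b in raw):
        # ISO-2022-JP is 7-bit and switches to kanji with ESC $ B (or ESC $ @).
        if b"\x1b$B" in raw or b"\x1b$@" in raw:
            return "iso-2022-jp"
        return "ascii"
    # 8-bit data: see which of the common Japanese encodings decodes cleanly.
    candidates = []
    for name in ("shift_jis", "euc_jp"):
        try:
            raw.decode(name)
            candidates.append(name)
        except UnicodeDecodeError:
            pass
    return " or ".join(candidates) if candidates else "unknown 8-bit"


if __name__ == "__main__":
    for sample in (b"hello",
                   "日本語".encode("iso2022_jp"),
                   "日本語".encode("shift_jis"),
                   "日本語".encode("euc_jp")):
        print(sample, "->", classify(sample))
```

If the result is "unknown 8-bit" or the bytes are already garbage at this point, the damage happened before the servlet ever saw the data.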
The interface going through the Internet is very clean: pass an
address and data to sendmail on stdin, get protocol replies on stdout.
(See RFC 1123 and 821.)  The complexity is all on the localhost side
(composition on the sending side, delivery and display for the
receiver).  If your mailers are properly set up, using sendmail should
be as transparent as a named pipe.  It's not mail, it's your program's
output stream that's the problem.  Have you tried writing a local file
instead of passing the output to sendmail?  Betcha get the same
mojibake.

    File --+
    User --+--> MUA --+--> MTA --(SMTP)--> MTA --> spool file --+
    etc. --+          |                                         |
                      |                        +-- POPserver <--+
    stdin (usually) --+                 MUA <--+-- procmail <---+

    Craig> Correct me if I'm wrong, but I've thought that normally the
    Craig> POP server spit out JIS and that the mail client converted
    Craig> it to ShiftJIS for the Mac Windows world or to EUC for the
    Craig> Linux world.

No.  The POP/IMAP server is also an MTA (or MDA).  MTAs and MDAs are
expected to modify the headers (adding "Received" and "X-UIDL" and so
on), but are not allowed to touch the message body.  (Some do anyway;
in particular, they often do translations from ASCII to EBCDIC.  Thus
we need "Content-Transfer-Encoding: base64".)

    Craig> This is what I thought:

    Craig> Windows '95 ==> Eudora =====> SMTP ====> Internet =====>
    Craig> ShiftJIS        Convert to JIS           JIS

    Craig> Something =======> Eudora ====> displays on Windows
    Craig> like POP    JIS    Converts to ShiftJIS  Monitor

    Craig> So, what I thought was that I had to convert into JIS
    Craig> before it got sucked in by the SMTP server.  What does
    Craig> everyone else think?

No, they're not required to do any such thing.  Once you've received
an "accepted for delivery" notification from the listener, you are
supposed to assume that the same series of bytes you sent will be in
the message body received by the remote user.  So you need to
coordinate with the remote user on how to interpret your bytes, NOT
with sendmail.
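The claim that the transport is not where mojibake comes from can be checked without sendmail at all. In this sketch (Python for brevity), `transport` is a made-up stand-in for the MTA/POP path and is simply assumed to deliver the body bytes unchanged. The same bytes then read correctly or as garbage depending only on which charset the receiving side assumes.

```python
text = "日本語のメール"


def transport(body: bytes) -> bytes:
    """Hypothetical stand-in for sendmail + POP: passes the body through untouched."""
    return bytes(body)


sent = text.encode("shift_jis")      # what a Win95J sender would produce
received = transport(sent)

# The receiver interprets the identical bytes under two different assumptions.
as_sjis = received.decode("shift_jis")
as_eucjp = received.decode("euc_jp", errors="replace")

print("bytes unchanged:      ", received == sent)
print("read as Shift-JIS ok: ", as_sjis == text)
print("read as EUC-JP ok:    ", as_eucjp == text)
```

Writing `sent` to a local file and viewing it with the wrong charset setting reproduces the same garbage, which is the experiment suggested above.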
If the remote user can handle JIS, S-JIS, and EUC, it doesn't matter
to sendmail which you use.  In fact, sendmail doesn't care if you send
Russian to a user expecting SJIS.  It will faithfully pass it on....

What the MUA (Eudora and your servlet) should do is add Content-Type
and Content-Transfer-Encoding MIME headers.  The mail or web server
isn't supposed to care.  Only the UA is involved in dealing with
content (formatting, MIME encoding, and the like).  Every other
element of the system should be thought of as allowed to add headers
but not touch the rest of the message at all.  (This is not strictly
true, especially for web servers with SHTML and so on, but it's close
enough.)

The reason for the specification of JIS (actually, ISO-2022 or 7-bit
EUC) is that in the distant past many serial links used an 11-bit
encoding with one start and two stop bits (or maybe vice versa) and 8
bits of data.  With ASCII being 7 bits, and links being unreliable, 1
bit was often used for parity _at the hardware level_ so that passing
8 bit data was impossible.  So some software was "optimized" on the
assumption that anything it was asked to pass over the net would use
7-bit bytes.  Thus encodings were designed to fit into the 7-bit
framework for mail transport.

However, as I understand it, there is nothing mandating the use of
7-bit codes in Internet mail as long as the proper MIME headers are
attached.  If you send Shift-JIS to someone with a MIME-compliant
mailer that does not understand Shift-JIS, it should say "this doc is
in Shift-JIS, I'm lost---tell me what to do".

However, all mailers that claim to understand Japanese should
implement ISO-2022-JP, since this is a base implementation that 99.44%
of systems on the Internet and connected networks (Bitnet, UUCP) can
pass without munging.

The main mistake (from an RFC compliance point of view) in your
program is failure to prepend a set of MIME headers.  (I believe that
no MIME headers implies US-ASCII on the Internet.)
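A minimal sketch of the missing step, prepending MIME headers that declare the charset, might look like the following. It is written in Python for brevity, and `build_message` and the addresses are hypothetical. The headers are assembled by hand only to show what goes on the wire. A Subject containing Japanese would additionally need RFC 2047 encoding, which is omitted here.

```python
def build_message(sender: str, recipient: str, text: str) -> bytes:
    """Return header block + blank line + ISO-2022-JP body as raw bytes.

    Assumes `text` is a single line or already uses CRLF line endings.
    """
    body = text.encode("iso2022_jp")  # 7-bit clean: every byte is below 0x80
    headers = [
        f"From: {sender}",
        f"To: {recipient}",
        "MIME-Version: 1.0",
        "Content-Type: text/plain; charset=ISO-2022-JP",
        "Content-Transfer-Encoding: 7bit",
    ]
    head = "\r\n".join(headers).encode("ascii")
    return head + b"\r\n\r\n" + body + b"\r\n"


if __name__ == "__main__":
    msg = build_message("servlet@example.com", "user@example.com", "こんにちは")
    print(msg)
    print("7-bit clean:", all(b < 0x80 for b in msg))
```

Sending Shift-JIS instead would mean encoding with `"shift_jis"`, declaring `charset=Shift_JIS`, and no longer claiming `7bit`, since those bytes use the high bit.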
However, missing MIME headers should not affect sendmail or popmail at
all; they should deliver exactly the set of headers and body they were
passed, plus a few extra headers that describe (Received) or aid
(X-UIDL) their activity.  But without seeing more code for the file
inclusion or examples of munged nihongo, it's hard to guess what might
be going wrong.

Steve

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849                   turnbull@example.com
- Follow-Ups:
- Re: tlug: Java and Japanese e-mail
- From: Craig Oda <craig@example.com>
- References:
- Re: tlug: Java and Japanese e-mail
- From: Craig Oda <craig@example.com>