Re: tlug: Java and Japanese e-mail (trying again, key bounce on ^C yarrrgh)



--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------

>>>>> "Craig" == Craig Oda <craig@example.com> writes:

    Craig> I'm using a Java servlet on Linux and Java object on Linux
    Craig> to send the contents of a form to some people.  The
    Craig> contents of the form are in Japanese.  It could either be
    Craig> EUC or ShiftJIS.  I've never done this before, and assume
    Craig> that there is a simple answer.

Why?  Java is officially Unicode, but I haven't seen any real support
in the JDK or docs (but I haven't looked in about 3 months ;-)

My feeling back then was that basically there was a lot of work to be
done on Java i18n, in particular making translator objects.

    Craig> My object works fine with ASCII, but I think I need to
    Craig> convert to JIS before talking to sendmail.

RFC-wise, you should (or do something else about 8-bit munging
relays), but in practice this should not faze the mail transport
system (sendmail).
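
For illustration only, here is a minimal sketch of the conversion to JIS
being discussed.  It assumes a modern JDK whose runtime ships the
"ISO-2022-JP" charset (the java.nio.charset API is much newer than the
JDK this thread is about), and the class and method names are
hypothetical.  The point is just that the text lives in a String
internally and only becomes 7-bit bytes at the moment of encoding.

```java
import java.nio.charset.Charset;

public class JisEncodeSketch {
    // Assumption: the runtime provides ISO-2022-JP (true of full JDK builds,
    // possibly not of stripped-down runtimes).
    static final Charset JIS = Charset.forName("ISO-2022-JP");

    /** Encodes text held in a Java String into ISO-2022-JP bytes. */
    static byte[] toJis(String text) {
        return text.getBytes(JIS);
    }

    /** Returns true if every byte has its high bit clear. */
    static boolean isSevenBit(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "Nihongo" written with Unicode escapes so the source file stays ASCII.
        byte[] encoded = toJis("\u65e5\u672c\u8a9e");
        System.out.println("7-bit clean: " + isSevenBit(encoded));
        System.out.println("encoded length: " + encoded.length);
    }
}
```

Because the encoded form uses escape sequences rather than high-bit
bytes, it should survive relays that strip the eighth bit.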

    Craig> In order to send the mail I am connecting directly to the
    Craig> SMTP port on 25 and sending the greeting and data.  My
    Craig> assumption is that I need to convert the Japanese into JIS
    Craig> before sending.  Is this correct?  I'm doing it with
    Craig> ShiftJIS (not my machine :-) and getting bakemoji.

Are you sure the MTA (sendmail) is getting confused, and not the
receiving MUA?  I.e., do you have something as smart as Mule to look at
the alleged mojibake and see that it really is munged?

    Craig> I'm curious about a couple things, 1) is there an easier
    Craig> way to do this? 2) has anyone sent Japanese with Java
    Craig> before?

(1) There will be, of course :-) but I hadn't found one before the summer.
(2) Yes, but I remember it was tricky.  It wasn't "Japanese", it was a
    byte stream which happened to be EUC code.  In particular, unless
    you have a filter on the input stream which takes whatever codes
    your terminal generates and turns them into Unicode, and vice
    versa for the OutputStream, you need to do everything in bytes
    and you need to handle the widechar/multi-byte issues in your own
    code.
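
The kind of input filter described in (2) can be sketched roughly as
follows.  This assumes a modern JDK in which InputStreamReader accepts a
Charset and the "EUC-JP" and "Shift_JIS" charsets are installed; the
class and method names are hypothetical, and the caller is assumed to
already know which of the two encodings the bytes are in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class DecodeSketch {
    /** Reads every byte from the stream and decodes it with the named charset. */
    static String decode(InputStream in, String charsetName) {
        StringBuilder sb = new StringBuilder();
        // The reader is the "filter": bytes go in, Unicode chars come out.
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e";
        for (String name : new String[] {"EUC-JP", "Shift_JIS"}) {
            // Simulate form data arriving as raw bytes in a known encoding.
            byte[] raw = original.getBytes(Charset.forName(name));
            String text = decode(new ByteArrayInputStream(raw), name);
            System.out.println(name + " round trip ok: " + text.equals(original));
        }
    }
}
```

Without such a filter, the bytes have to be carried around as a byte[]
and never passed through a char-based API.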

Never got that far personally; the example in (2) above was done by
somebody else in a Java SIG here (I gave up on the SIG, there were
several members who didn't know how to use loops and arrays).  But...

I don't think this should matter for squirting a file in binary over a
TCP connection.  Not even for endianness issues.  However, you may have
to be careful about the fact that internally characters are not bytes
in Java.  I note that you very carefully deleted all the code that
dealt with actually handling the file!  The fact that the code works
with ASCII should clue you that your mail-related code is cool, it's
the file I/O that's whacking out.

(Caveat: you may need to use a byte stream rather than a character
stream in talking to the remote SMTP listener, although you probably
ought to be able to attach the file-squirter to the TCP connection
only for when it's needed, and use the current code for talking SMTP
protocol to the remote listener.)
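
A rough sketch of that arrangement, assuming a modern JDK: the protocol
lines are encoded as US-ASCII and the message body is written as raw
bytes to the same OutputStream.  The class, the method names, and the
host name client.example.com are hypothetical.  To stay short, the
sketch does not read the server's replies, which a real client must do
after each command, and it assumes the body already ends with CRLF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SmtpBytesSketch {
    /** Writes one protocol line as US-ASCII bytes followed by CRLF. */
    static void command(OutputStream out, String line) throws IOException {
        out.write((line + "\r\n").getBytes(StandardCharsets.US_ASCII));
    }

    /** Writes a whole SMTP dialogue; the body bytes pass through untouched. */
    static void sendMessage(OutputStream out, String from, String to, byte[] body)
            throws IOException {
        command(out, "HELO client.example.com"); // hypothetical host name
        command(out, "MAIL FROM:<" + from + ">");
        command(out, "RCPT TO:<" + to + ">");
        command(out, "DATA");
        out.write(body);   // raw bytes: no char conversion happens here
        command(out, "."); // end of data; assumes body ended with CRLF
        command(out, "QUIT");
        out.flush();
    }

    /** Runs the dialogue against an in-memory sink so the bytes can be inspected. */
    static byte[] capture(String from, String to, byte[] body) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            sendMessage(sink, from, to, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "Subject: test\r\n\r\nhello\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] wire = capture("from@example.com", "to@example.com", body);
        System.out.println("bytes written: " + wire.length);
    }
}
```

With a real connection, the OutputStream would come from the socket
instead of the in-memory sink used here.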

    Craig> My code fragment to follow.  It works with ASCII.

    Craig>     // the mail program requires a single dot on a line
    Craig>           // by itself

Technically the requirement is in the SMTP protocol....
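
As a sketch of what the protocol asks for: the data ends with a line
containing only a dot, so any body line that itself begins with a dot
gets one extra dot prepended by the sender.  The class and method names
below are hypothetical, and the sketch works on already-split text lines
for brevity rather than on raw bytes.

```java
import java.util.Arrays;
import java.util.List;

public class DotStuffSketch {
    /** Joins lines with CRLF, doubles leading dots, and appends the terminator. */
    static String frameBody(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(".")) {
                sb.append('.'); // extra dot so the line is not taken as end of data
            }
            sb.append(line).append("\r\n");
        }
        sb.append(".\r\n"); // the single dot on a line by itself
        return sb.toString();
    }

    public static void main(String[] args) {
        String framed = frameBody(Arrays.asList("hello", ".hidden", "bye"));
        // Show the line endings explicitly when printing.
        System.out.println(framed.replace("\r\n", "<CRLF>\n"));
    }
}
```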

hth

-- 
                            Stephen J. Turnbull
Institute of Policy and Planning Sciences                    Yaseppochi-Gumi
University of Tsukuba                      http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849              turnbull@example.com
Next TLUG meeting is Saturday October 11, 1997