Mailing List Archive



[tlug] Doing kanji and stuff in email headers



Brian Chandler writes:

 > o What's the neatest way to decide if it's necessary? OR

The most elegant way is to use a language whose standard library is at
least mildly intelligent about these things.  Are you sure PHP doesn't
have a function to RFC-2047-encode (and decode) mail headers?  The
algorithm is standard and pretty simple.
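
For comparison, here is roughly what such a library function looks like
in Python rather than PHP. This is only a minimal sketch using the standard
email.header module; the helper names encode_subject and decode_subject
are made up for this example, and UTF-8 is assumed as the charset.

```python
from email.header import Header, decode_header, make_header


def encode_subject(text):
    """Return a header value that is safe to put on the wire.

    Pure ASCII is passed through untouched; anything else becomes an
    RFC 2047 encoded-word in UTF-8 (an assumption made for this sketch).
    """
    try:
        text.encode("ascii")
        return text
    except UnicodeEncodeError:
        # Header picks the transfer encoding for the charset by itself.
        return Header(text, "utf-8").encode()


def decode_subject(raw):
    """Turn a possibly RFC 2047 encoded header value back into text."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    wire = encode_subject("漢字のテスト")
    print(wire)
    print(decode_subject(wire))
```

The ASCII check is done by hand because Header would otherwise wrap
plain ASCII in an encoded-word too when it is given a UTF-8 charset.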

 > o Can I after all just use UTF-8 because it's ok nowadays?

Well, that depends on your RFC-2047-encode function.  What *should*
happen is that pure ASCII isn't transfer-encoded at all, languages
that are mostly Latin characters (e.g., Spanish or German) are
quoted-printable-encoded[1] (RFC 2047's "Q" encoding), and non-Latin
languages (both single-byte ones such as Russian and multibyte ones
like Japanese) are base64-encoded (the "B" encoding).  If your
encoding function isn't that smart, you should try to detect each
case yourself.  (ASCII is easy, and there are reasonable heuristics
for distinguishing text appropriate for quoted-printable from text
that should use base64.)
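
One possible heuristic, sketched in Python: count the non-ASCII
characters and switch from Q to B once they make up too much of the text.
The one-third threshold and the names choose_encoding and
encode_header_value are assumptions for illustration, not anything
RFC 2047 specifies.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header

# Assumed cut-off: below this share of non-ASCII characters, Q encoding
# keeps most of the header readable; above it, base64 is more compact.
QP_THRESHOLD = 1 / 3


def choose_encoding(text):
    """Return None for pure ASCII, "q" for mostly Latin text, else "b"."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    if non_ascii == 0:
        return None
    if non_ascii / len(text) < QP_THRESHOLD:
        return "q"
    return "b"


def encode_header_value(text):
    """Encode text as a UTF-8 header value using the chosen encoding."""
    choice = choose_encoding(text)
    if choice is None:
        return text  # pure ASCII needs no encoded-word at all
    charset = Charset("utf-8")
    charset.header_encoding = QP if choice == "q" else BASE64
    return Header(text, charset).encode()


if __name__ == "__main__":
    for sample in ("Hello", "Señor Muñoz", "Привет мир", "日本語のメール"):
        print(choose_encoding(sample), encode_header_value(sample))
```

Counting characters rather than bytes is a deliberate simplification; a
byte-based ratio would push accented Latin text toward base64 a little
sooner.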


Footnotes: 
[1]  Bonus points here for using NFD Unicode so that base Latin
characters are readable.
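
To see why NFD helps, here is a small Python sketch (the helper name
q_encode is made up, and UTF-8 is assumed).  In NFD an accented letter
is stored as the plain base letter followed by a combining mark, so only
the mark gets escaped in the Q encoding.

```python
import unicodedata
from email.charset import QP, Charset
from email.header import Header


def q_encode(text, form):
    """Normalize text to the given Unicode form, then Q-encode it as UTF-8."""
    normalized = unicodedata.normalize(form, text)
    charset = Charset("utf-8")
    charset.header_encoding = QP  # force Q rather than base64
    return Header(normalized, charset).encode()


if __name__ == "__main__":
    # NFC escapes the whole precomposed letter; NFD leaves the base
    # letter as plain ASCII and escapes only the combining mark.
    print(q_encode("Señor", "NFC"))
    print(q_encode("Señor", "NFD"))
```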


