Mailing List Archive


[tlug] Re: Hotmail mail encoding



From: Stuart Luppescu <slu@example.com>
On 日, 2007-04-22 at 14:40 +1000, Jim Breen wrote:
> Wonder how they got Hotmail to do that? Whenever I try Hotmail in Japanese
> it just uses those horrible entity codes.

Well, when I forwarded the message from my wife's computer (with
Outlook) the original headers were not preserved (and I'm too lazy to go
upstairs and copy the info from her computer), but I sent myself a
message from Hotmail, and, mirabile dictu, Gmail displays it correctly,
but Evolution just shows this:

&#12371;&#12428;&#12399;&#35430;&#39443;&#12391;&#12377;&#12290;
&#35501;&#12417;&#12414;&#12377;&#12363;&#12290;

Here are the headers from the message I sent to myself:

Mime-Version: 1.0
Content-Type: text/plain; format=flowed

No info on encoding or charset.
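
As a rough illustration, the missing charset can be confirmed with Python's
standard email package. This is only a sketch: the two header lines are
copied from above, and the body line is a made-up placeholder rather than
the real message.

```python
from email import message_from_string

# Headers as quoted above; the body is a placeholder, not the actual mail.
raw = (
    "Mime-Version: 1.0\n"
    "Content-Type: text/plain; format=flowed\n"
    "\n"
    "placeholder body\n"
)

msg = message_from_string(raw)

content_type = msg.get_content_type()   # media type without parameters
flowed = msg.get_param("format")        # value of the format= parameter
charset = msg.get_content_charset()     # None when no charset= was sent

print(content_type, flowed, charset)
```

With no charset parameter, a receiving client has to guess or fall back to
a default, which is why it cannot know how to treat non-ASCII text.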

Could the fact that I set my default encoding in Firefox to UTF-8 be
related to this?

No, those &#35501;&#12417; ... are HTML/SGML numeric character
references: Unicode code points written as decimal numbers. They are a
lazy programmer's hack to avoid proper multilingual handling. Hotmail is
a prime culprit, but there are others.
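
To make the point concrete, here is a small Python sketch that decodes the
two lines Evolution displayed. It uses html.unescape() from the standard
library, which treats the input as HTML text; that is exactly the
assumption a plain-text mail client has no reason to make.

```python
import html

# The two lines Evolution displayed, copied from the message above.
raw = (
    "&#12371;&#12428;&#12399;&#35430;&#39443;&#12391;&#12377;&#12290;\n"
    "&#35501;&#12417;&#12414;&#12377;&#12363;&#12290;"
)

# html.unescape() converts numeric (and named) character references to text.
decoded = html.unescape(raw)
print(decoded)

# Each decimal number is just a Unicode code point, so chr() gives the
# same character as the corresponding reference.
first_char = chr(12371)
```

The decoded text is これは試験です。 followed by 読めますか。 ("This is a
test. Can you read it?").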

They presume that everyone is reading mail via a client that can/will
treat everything as HTML. And stuff anyone who uses text-based
clients. Since IE and Outlook will display things OK, why would
they change?

From: Patrick Kellaher <kalmite@example.com>

It sounds to me like Evolution wasn't compiled with multi-language
support (--enable-nls), although it is hard to believe that this is
still the case nowadays. What distro are you using? On the Windows side,
I have had problems with Shift_JIS messages; installing all the Asian
language support files usually fixes them, although I am going to guess
that this has already been done.

Nothing to do with multi-language support; those entity codes are a
sort of non-language support. Yes, the Evolution coders could decide to
assume that a string of digits between "&#" and ";" represents a Unicode
code point, but that would be throwing standards to the winds. I've been
asked to support them in WWWJDIC's input interface, and I have refused
on principle.
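
For what it's worth, the heuristic described above could be sketched in
Python roughly as follows. The function name decode_decimal_refs is
hypothetical, and this is not anything Evolution or WWWJDIC actually does;
it only shows how little code the assumption takes, and why it is a guess
rather than a standard.

```python
import re

# Matches "&#", one or more decimal digits, then ";".
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def decode_decimal_refs(text):
    """Replace each &#NNNN; with the character at that code point.

    Hypothetical helper: it applies the heuristic blindly to plain text,
    so a literal "&#65;" the sender meant to show as-is is also rewritten.
    """
    def repl(match):
        code_point = int(match.group(1))
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: leave the text untouched.
            return match.group(0)

    return NUMERIC_REF.sub(repl, text)


print(decode_decimal_refs("&#35501;&#12417; ..."))
```

Note that plain text such as "&#65;" becomes "A" whether or not the sender
wanted that, which is the objection to doing this outside HTML.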

Cheers

Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

