[tlug] Re: Hotmail mail encoding

Jim Breen writes:

 > "Stephen J. Turnbull" <> dashed off:

 > > the latter, it's compatible with UTF-8 encoded with quoted-printable,
 > > decoding to gibberish (as far as I can tell).
 > 'T'ain't Japanese in UTF-8. UTF8ed Japanese is always in 3-byte sequences
 > starting with En (apart from some of the rare kanji that turned up in
 > JIS213).

True, but it is nonetheless compatible with UTF-8, and decodes to
the right number of characters.
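
For anyone who wants to repeat the "is it compatible with UTF-8?" check, here is a minimal sketch in Python.  The helper name try_qp_utf8 and both sample payloads are made up for illustration; they are not the bytes from the message under discussion.

```python
import quopri


def try_qp_utf8(payload: bytes):
    """Undo quoted-printable, then try UTF-8.

    Returns the decoded text, or None if the bytes are not valid UTF-8.
    """
    raw = quopri.decodestring(payload)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical samples: the first is UTF-8 Japanese plus an ASCII "B",
# the second is the Shift JIS byte sequence 93 FA 96 7B.
print(try_qp_utf8(b"=E6=97=A5=E6=9C=AC=E8=AA=9EB") is not None)
print(try_qp_utf8(b"=93=FA=96=7B") is not None)
```

Passing this test only says the bytes are well-formed UTF-8, not that the result is sensible text.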

 > There are 28 bytes in that sequence, and the last is a plain old
 > "B" which can't be part of UTF8.

Since when isn't ASCII a subset of UTF-8?  ;-)  (Not for Japanese,
though, you're right about that.)
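
Both points are easy to see by dumping the bytes.  A small sketch (utf8_profile is a hypothetical helper, and the sample string is mine, not from the mail in question):

```python
def utf8_profile(text):
    """Return (code point, UTF-8 bytes in hex) for each character."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex()) for ch in text]


# Three common kanji, an ASCII "B", and one kanji outside the BMP
# (U+20B9F, given as an escape so the source stays readable).
sample = "\u65e5\u672c\u8a9eB\U00020B9F"
for code_point, hex_bytes in utf8_profile(sample):
    print(code_point, hex_bytes)
```

The three common kanji come out as 3-byte sequences with a lead byte of the form En, the "B" is the single byte 42 exactly as in ASCII, and the supplementary-plane kanji takes 4 bytes with an F0 lead byte.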

 > Smells like Shit-JIS.

Assuming that, it decodes to JIS X 0208 gibberish, except for a few stray
halfwit kana and one buck-naked 0x80.  Nor does EUC-JP come close to

I wonder if some 8-bit encoding (probably Shit-JIS) got somehow
reencoded via the UTF-8 algorithm?
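
To make that guess concrete, here is a sketch of what such a re-encoding would look like.  It assumes each byte of the 8-bit text is treated as a Latin-1 code point before UTF-8 encoding; that is my assumption for illustration, not something I have confirmed Hotmail does, and the function names are hypothetical.

```python
def reencode_as_utf8(text, legacy="shift_jis"):
    """Encode text in a legacy encoding, then UTF-8 encode each byte as if it
    were a Latin-1 character (the suspected accident)."""
    raw = text.encode(legacy)
    return raw.decode("latin-1").encode("utf-8")


def undo_reencoding(data, legacy="shift_jis"):
    """Reverse the accident: UTF-8 -> Latin-1 bytes -> legacy text."""
    return data.decode("utf-8").encode("latin-1").decode(legacy)


original = "\u65e5\u672c\u8a9e"
mangled = reencode_as_utf8(original)
print(len(original.encode("shift_jis")), len(mangled))
print(undo_reencoding(mangled) == original)
```

The mangled bytes are perfectly valid UTF-8, which would explain output that is "compatible with UTF-8" yet is not Japanese in UTF-8.  Every byte of 0x80 or above turns into a two-byte sequence starting with C2 or C3.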

 > Gmail is pretty good at conforming to mail standards.

Sure, but I think the Evolution display is required by the mail
standards.  So Gmail producing sanity looks mildly broken :-), but
probably by design (i.e., Hotmail bug-compatibility).

 > > Yes, but "how" is a good question.  Hotmail is clearly incredibly
 > > broken, but webmail is such a complex application it's hard to guess
 > > exactly where the breakage is occurring.
 > I think the Firefox setting is irrelevant when using Hotmail.

That depends on content negotiation (or lack thereof) for the POST or
GET query, I should think.  I could imagine Hotmail requesting binary,
and Firefox using UTF-8 and shipping it to Hotmail as requested.  Of
course that raises the question of why Hotmail turned it into SGML
entities of the given values, etc.
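
By "SGML entities of the given values" I mean one numeric character reference per byte.  A rough sketch of that transformation and its inverse follows; the function names are hypothetical and the sample is mine.  Note that I parse the references by hand: an HTML5-style unescape would remap values in the 0x80-0x9F range to Windows-1252 characters and so lose the original bytes.

```python
import re


def bytes_to_entities(data: bytes) -> str:
    """Keep ASCII bytes as-is; write every other byte as &#NNN;."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in data)


def entities_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_entities for references in the range 0-255."""
    out = bytearray()
    pos = 0
    for m in re.finditer(r"&#(\d+);", text):
        out += text[pos:m.start()].encode("ascii")
        out.append(int(m.group(1)))
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


utf8_bytes = "\u65e5\u672c\u8a9eB".encode("utf-8")
print(bytes_to_entities(utf8_bytes))
```

Getting back from such entities to text then takes two steps: recover the bytes, and then guess the encoding those bytes were in.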

