Re: font/char set question: Chinese Amazon charset [tlug]



Josh Glover writes:

 > That does strike me as odd. Could it be that there is one Chinese
 > encoding that is assumed to be used?

Not by the standard.  From RFC 2616, Section 3.7.1:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.
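To make the quoted rule concrete, here is a minimal Python sketch of how a receiver might decide which charset applies to a response. The function name `effective_charset` is hypothetical, and the standard library's `email.message.Message` is used only as a convenient Content-Type parser.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver should assume under RFC 2616, section 3.7.1.

    An explicit charset parameter always wins. Without one, "text/*" media
    types default to ISO-8859-1; other media types get no default (None).
    """
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib MIME header parser
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.lower()
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None


print(effective_charset("text/html; charset=GB2312"))  # gb2312
print(effective_charset("text/html"))                  # iso-8859-1
```

Note that RFC 7231, which later replaced this part of RFC 2616, dropped the ISO-8859-1 default, so the `text/*` fallback here reflects RFC 2616 specifically.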

In China, you can in practice assume GB 2312, which the Chinese
government is somewhat fanatical about.  They even have their own
16-bit version of "Unicode" which grandfathers GB 2312 in a similar
way to ASCII in UTF-8.  (Unicode in quotes because the mapping
obviously can't be fully preserved.)
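The "grandfathering" relationship can be checked with Python's codecs. The post does not name the newer standard; this sketch assumes it refers to GBK or its successor GB 18030, which Python ships as the `gbk` and `gb18030` codecs. `grandfathers` is a hypothetical helper name. The last lines show a character GB 2312 cannot encode but `gb18030` can, using a 4-byte sequence.

```python
def grandfathers(old: str, new: str, text: str) -> bool:
    """True if `text` has identical bytes under the old and new encodings.

    This is the ASCII/UTF-8 relationship: data written in the old encoding
    is already valid in the new one and decodes to the same characters.
    """
    old_bytes = text.encode(old)
    return text.encode(new) == old_bytes and old_bytes.decode(new) == text


# ASCII text is unchanged by UTF-8.
print(grandfathers("ascii", "utf-8", "plain ASCII"))    # True
# GB 2312 text (EUC-CN bytes in Python) is unchanged by GBK and GB 18030.
print(grandfathers("gb2312", "gbk", "中文字符集"))      # True
print(grandfathers("gb2312", "gb18030", "中文字符集"))  # True

# A character outside GB 2312 can still be encoded in GB 18030.
emoji = "\U0001F600"
try:
    emoji.encode("gb2312")
except UnicodeEncodeError:
    print("U+1F600 is not in GB 2312")
print(len(emoji.encode("gb18030")))  # 4
```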

 > Seems like it attempts to auto-detect charset on a per-UserAgent
 > basis, would you concur? Is this in an RFC somewhere?

The Vary header is an instruction to caching proxies (it lists the
request headers a cached response depends on), and is not relevant
here.  Op. cit., section 14.44.
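A rough sketch of what section 14.44 asks of a cache may make the distinction clearer: Vary names the request headers that must match before a stored response can be reused, and "Vary: *" means the request headers alone can never justify reuse. The helper `vary_key` and its key format are hypothetical and simplified (header values are not normalized). Nothing here selects or detects a charset; that remains the origin server's decision.

```python
from typing import Mapping, Optional, Tuple


def vary_key(url: str, vary: str, request_headers: Mapping[str, str]) -> Optional[Tuple]:
    """Return a cache lookup key for a response carrying the given Vary value.

    Two requests may share a stored response only if their keys are equal.
    Returns None for "Vary: *", meaning the stored response cannot be
    matched from request headers and must be revalidated.
    """
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None
    # Header names are case-insensitive, so compare them in lower case.
    headers = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((n, headers.get(n)) for n in names))


a = vary_key("http://example.com/", "User-Agent", {"User-Agent": "Mozilla/4.0"})
b = vary_key("http://example.com/", "User-Agent", {"User-Agent": "Lynx/2.8"})
print(a == b)  # False: each agent gets its own cached copy
```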
