
Re: [tlug] Why am I not seeing Japanese in my web page on my Android? [NOT SOLVED]
Darren Cook writes:
> A meta-tag should override the http header.
It maybe *should*, but it *don't*. "IETF 1, web developers 0".<wink/>
This is a major gotcha of HTTP/1.1: if there is a charset parameter in
the Content-Type header of text/* content, it MUST be respected. META
elements are not allowed to override it.
From the definition of HTML 4
(http://www.w3.org/TR/html4/charset.html#idx-HTTP):
    To sum up, conforming user agents must observe the following
    priorities when determining a document's character encoding (from
    highest priority to lowest):

    1. An HTTP "charset" parameter in a "Content-Type" field.
    2. A META declaration with "http-equiv" set to "Content-Type" and a
       value set for "charset".
    3. The charset attribute set on an element that designates an
       external resource.
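
As a rough illustration of that priority order, here is a minimal Python
sketch. The function names (charset_from_content_type, charset_from_meta,
pick_encoding) are made up for this example, and the regular expressions
are deliberately naive; a real user agent parses headers and markup far
more carefully than this.

```python
import re

# Naive pattern for a charset parameter, e.g. "text/html; charset=UTF-8".
CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"\'>]+)', re.IGNORECASE)

# Naive pattern for <META HTTP-EQUIV="Content-Type" ...> in raw bytes.
META_RE = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE)


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def charset_from_meta(body):
    """Return the charset declared by a META http-equiv element, or None."""
    match = META_RE.search(body)
    if not match:
        return None
    tag = match.group(0).decode("ascii", "replace")
    return charset_from_content_type(tag)


def pick_encoding(http_content_type, body, link_charset=None):
    """Apply the HTML 4 priorities: HTTP header, then META, then link charset."""
    return (charset_from_content_type(http_content_type)
            or charset_from_meta(body)
            or (link_charset.lower() if link_charset else None))
```

Note that pick_encoding returns None when all three sources are silent,
which matches HTML 4's position that there is no default in that case.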
This is due to (what is arguably) a screwup in RFC 2616, which
mandates that the charset of text/* media, if specified in
Content-Type, MUST be respected on first rendering even if the user
says otherwise. (There are good reasons for this; the argument is
whether they're good *enough* to justify such an unintuitive
precedence. :-)
It's possible that HTML 5 has changed this; I don't know HTML 5 yet,
so I can't say. I imagine you'll need a DOCTYPE declaration to get
HTML 5, though.
> So, at the top of your PHP script that processes the ajax request, try
> adding:
> header('Content-Type: application/json; charset=UTF-8');
I'd be surprised if that works, because the application/json media
type doesn't define the charset parameter. RFC 4627 doesn't even
mention the word "charset", and specifies that JSON content is always
encoded in Unicode, using one of 5 UTFs (UTF-8 and the endian variants
of UTF-16 and UTF-32). Which one is in use can always be deduced from
conforming JSON content, so no charset parameter *or* signature/BOM is
needed.
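
The deduction works because, under RFC 4627, a JSON text always starts
with two ASCII characters, so the pattern of zero octets in the first
four bytes identifies the UTF (section 3 of the RFC). A minimal Python
sketch of that check follows; the helper names detect_json_encoding and
load_json are hypothetical, chosen only for this example.

```python
import json


def detect_json_encoding(data):
    """Guess the UTF of an RFC 4627 JSON text from its first four octets."""
    head = data[:4]
    if len(head) < 4:
        # A conforming JSON text has at least two characters, so anything
        # shorter than four octets can only be UTF-8.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    if nulls == (True, True, True, False):    # 00 00 00 xx
        return "utf-32-be"
    if nulls == (True, False, True, False):   # 00 xx 00 xx
        return "utf-16-be"
    if nulls == (False, True, True, True):    # xx 00 00 00
        return "utf-32-le"
    if nulls == (False, True, False, True):   # xx 00 xx 00
        return "utf-16-le"
    return "utf-8"                            # xx xx xx xx


def load_json(data):
    """Decode raw JSON bytes using the detected UTF, then parse them."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

This assumes the input has no BOM and really is a conforming JSON text
(an object or array); it makes no attempt to handle anything else.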
*****
I suspect that the underlying problem is either that the HTTP header
of the main HTML document has a bogus charset parameter (such as
"shift_jis" or "ISO-8859-1"), or that *both* the HTTP header *and* the
<META HTTP-EQUIV="Content-Type"> element are missing, so that RFC 2616
requires that the document's charset be set to ISO-8859-1. HTML 4
recommends that this requirement be *ignored*, so that there is *no*
default character set if both are missing. In other words, absent an
explicit setting for the charset parameter either in Content-Type or
in a META element, for an HTML 4 document, the browser can do whatever
it wants until the user tells it what to do (good idea, that one, W3C!
or should I say, "W3C 1, web developers 0"?<wink/>)
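
To see what the ISO-8859-1 fallback does to Japanese text, here is a
small Python sketch. It assumes the page is actually served as UTF-8
bytes (my guess at the situation, not something established in this
thread) and decodes those bytes the way a client applying the RFC 2616
default would.

```python
text = "日本語"

# What the server actually sends, assuming the page is stored as UTF-8.
wire_bytes = text.encode("utf-8")

# What a client shows if it applies the ISO-8859-1 default: every byte
# becomes one Latin-1 character, so the Japanese turns into mojibake.
misread = wire_bytes.decode("iso-8859-1")

# The bytes themselves are undamaged, so reversing the wrong decoding
# and applying the right one recovers the original text.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(repr(misread))
print(recovered)
```

The point is that nothing is lost on the wire; the page only looks
broken because the wrong charset label (or no label at all) was applied.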