Mailing List Archive



Re: [tlug] Why am I not seeing Japanese in my web page on my Android? [NOT SOLVED]



Darren Cook writes:

 > A meta-tag should override the http header.

Maybe it *should*, but it *don't*.  "IETF 1, web developers 0".<wink/>
This is a major gotcha of HTTP/1.1: if there is a charset parameter in
the Content-Type header of text/* content, it MUST be respected.  META
elements are not allowed to override it.

From the definition of HTML 4
(http://www.w3.org/TR/html4/charset.html#idx-HTTP):

  To sum up, conforming user agents must observe the following
  priorities when determining a document's character encoding (from
  highest priority to lowest):

    1. An HTTP "charset" parameter in a "Content-Type" field.
    2. A META declaration with "http-equiv" set to "Content-Type" and a
       value set for "charset".
    3. The charset attribute set on an element that designates an
       external resource.

This is due to (what is arguably) a screwup in RFC 2616, which
mandates that the charset of text/* media, if specified in
Content-Type, MUST be respected on first rendering even if the user
says otherwise.  (There are good reasons for this; the argument is
whether they're good *enough* to justify such an unintuitive
precedence. :-)
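A minimal Python sketch of the HTML 4 priority list quoted above, under simplifying assumptions: the helper names are hypothetical, the META regex is not a real HTML parser (it assumes http-equiv appears before content), and the third-priority charset attribute is simply passed in by the caller. Note that the HTTP header wins even when the META element disagrees, which is exactly the gotcha.

```python
import re
from email.message import Message
from typing import Optional

# Simplified pattern (an assumption, not a real HTML parser): expects
# http-equiv to appear before content inside the META element.
META_RE = re.compile(
    r"""<meta\s+http-equiv=["']?content-type["']?\s+content=["']([^"']+)["']""",
    re.IGNORECASE,
)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def resolve_charset(http_content_type: Optional[str],
                    html_head: str,
                    referrer_charset: Optional[str] = None) -> Optional[str]:
    """Apply the HTML 4 priority order, highest first (hypothetical helper)."""
    # 1. charset parameter of the HTTP Content-Type header
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset
    # 2. <META http-equiv="Content-Type" content="...; charset=..."> in the page
    match = META_RE.search(html_head)
    if match:
        charset = charset_from_content_type(match.group(1))
        if charset:
            return charset
    # 3. charset attribute on the element that pointed at this resource
    if referrer_charset:
        return referrer_charset.lower()
    # HTML 4: no default; the user agent (or the user) has to decide
    return None


head = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
print(resolve_charset("text/html; charset=ISO-8859-1", head))  # iso-8859-1: META loses
print(resolve_charset("text/html", head))                       # utf-8: META applies
```

The header parsing is delegated to the standard library's email.message.Message.get_content_charset(), which handles quoting and lower-cases the result, so only the META lookup is hand-rolled.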

It's possible that HTML 5 has changed this; I don't know HTML 5 yet,
so I can't say.  I imagine you'd need an HTML 5 DOCTYPE declaration
to get HTML 5 behavior, though.

 > So, at the top of your PHP script that processes the ajax request, try
 > adding:
 >   header('Content-Type: application/json; charset=UTF-8');

I'd be surprised if that works, because the application/json media
type doesn't define the charset parameter.  RFC 4627 doesn't even
mention the word "charset", and specifies that JSON content is always
encoded in one of five Unicode encoding forms (UTF-8 and the big- and
little-endian variants of UTF-16 and UTF-32).  Which one is in use can
always be deduced from conforming JSON content: its first two
characters are always ASCII, so the pattern of NUL octets in the first
four bytes identifies the encoding.  So no charset parameter *or*
signature/BOM is needed.
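The deduction RFC 4627 (section 3) describes fits in a few lines.  A Python sketch of that table, with a hypothetical function name; it assumes the input is a conforming JSON text (so it starts with two ASCII characters) and does no BOM handling, since none is needed:

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON text from the NUL pattern of its first 4 octets."""
    if len(data) < 4:
        # The shortest texts ("[]", "{}") are 4+ bytes in UTF-16/32, so UTF-8.
        return "utf-8"
    nulls = tuple(b == 0 for b in data[:4])
    if nulls == (True, True, True, False):     # 00 00 00 xx
        return "utf-32-be"
    if nulls == (True, False, True, False):    # 00 xx 00 xx
        return "utf-16-be"
    if nulls == (False, True, True, True):     # xx 00 00 00
        return "utf-32-le"
    if nulls == (False, True, False, True):    # xx 00 xx 00
        return "utf-16-le"
    return "utf-8"                             # xx xx xx xx


raw = '{"lang": "日本語"}'.encode("utf-16-le")
print(detect_json_encoding(raw))
```

For what it's worth, Python's own json.loads() has accepted bytes and done this detection itself since Python 3.6, so in practice you'd rarely write it by hand.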

*****

I suspect that the underlying problem is either that the HTTP header
of the main HTML document has a bogus charset parameter (one that
doesn't match the page's real encoding, such as "shift_jis" or
"ISO-8859-1"), or that *both* the HTTP header *and* the
<META HTTP-EQUIV="Content-Type"> element are missing, in which case
RFC 2616 requires that the document's charset be ISO-8859-1.  HTML 4
says user agents must *not* assume any default value for the charset
parameter (i.e., that requirement is to be *ignored*), so there is
*no* default character set if both are missing.  In other words, absent an
explicit setting for the charset parameter either in Content-Type or
in a META element, for an HTML 4 document, the browser can do whatever
it wants until the user tells it what to do (good idea, that one, W3C! 
or should I say, "W3C 1, web developers 0"?<wink/>)
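To see what that failure looks like, here is a small Python illustration (the sample string is made up, and it assumes the page is really stored as UTF-8).  Decoding UTF-8 Japanese as ISO-8859-1 never raises an error, because every byte maps to some Latin-1 character; you just get mojibake.  If that assumption holds, the cure is on the server: make the main document's Content-Type header say what the bytes really are, e.g. text/html; charset=UTF-8.

```python
# Hypothetical page text; assume the server stores it as UTF-8.
page_text = "日本語のページ"
body = page_text.encode("utf-8")

# Browser falls back to ISO-8859-1 (RFC 2616 default, or a bogus header):
# every byte maps to a Latin-1 character, so no error, only mojibake.
as_latin1 = body.decode("iso-8859-1")

# Browser trusts a bogus "charset=shift_jis" header: bytes that are not
# valid Shift_JIS are replaced with U+FFFD instead of raising.
as_sjis = body.decode("shift_jis", errors="replace")

# Browser is told the truth ("Content-Type: text/html; charset=UTF-8").
as_utf8 = body.decode("utf-8")

for label, shown in (("ISO-8859-1", as_latin1),
                     ("Shift_JIS", as_sjis),
                     ("UTF-8", as_utf8)):
    # repr() keeps C1 control characters from confusing the terminal
    print(f"{label:>10}: {shown!r}")
```

Since ISO-8859-1 maps bytes one-to-one, the Latin-1 mojibake is still recoverable (re-encode as ISO-8859-1, decode as UTF-8), which is a handy way to confirm this diagnosis from a saved page.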



