Mailing List Archive

Re: [tlug] Firefox 3.0.1 doesn't respect <meta http-equiv="content-type">

Curt Sampson writes:
 > On 2008-09-14 23:49 +0900 (Sun), Stephen J. Turnbull wrote:

 > > Puh-lease.  I'm talking about calling the content of the HEAD element
 > > a "header".  That's just encapsulation, just like having a TCP header
 > > inside (as part of the content of) an IP packet.  That TCP packet
 > > itself might contain another TCP packet or even IP.  No?
 > The one very important difference you ignore here is that in the TCP/IP
 > protocol suite, aside from perhaps a few things related to bootstrapping
 > protocols, there are no cases of encapsulated data changing the meaning
 > of data outside the encapsulation,

This is your brain fart as far as I can see.

How is the meaning of data outside the encapsulation changed?  The
HTTP header if it exists means what it means.  It can be true or
false, but the server has no way to enforce truth on the content
*unless it groks the content*, which it doesn't, and probably cannot
if it is to be reasonably efficient.

What the META element changes is the meaning of the HTML document data
*inside* the encapsulation.  This may falsify the HTTP header, but it
doesn't change its meaning, which is merely a guess in any case (as
the very existence of junk like AddDefaultCharset demonstrates).

 > or providing information for layers of encapsulation outside of
 > itself.

What information are you referring to?  The HTTP protocol transmits an
ASCII header followed by a binary message body; AFAIK it makes no use
of Content-Type, which is purely for the use of the inner protocol.
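
To make the layering concrete, here is a minimal sketch of how a
client might peel the outer layer off: an ASCII header block, a blank
line, then an opaque body. The function name split_http_message is
hypothetical, and the sketch assumes a single complete message with
CRLF line endings and no folded header lines. The Content-Type value
is carried along as a plain string and is not used to find the body.

```python
def split_http_message(raw: bytes):
    """Split one raw HTTP message into (start_line, headers, body).

    Assumes CRLF line endings and no folded (continued) header lines.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating header and body")
    lines = head.decode("ascii").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body stays as bytes: the outer layer never interprets it.
    return start_line, headers, body
```

The body comes back as raw bytes; deciding how to decode it is left
to whatever agent handles the inner document.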

 > It doesn't make sense, since encapsulation is no longer a
 > transparent, reversible operation, and causes problems like the
 > ones we're seeing here.

No, what causes the problem is the fact that the outer protocol is
authoritative about the data contained in the inner protocol, but
implementations make no attempt to verify that what they claim is
true.  There's a good reason for that: it's hard to do.

That's one reason among several why the MIME RFCs and RFC 1036 (USENET
articles) are based on RFC 822 (Internet message format), not on
RFC 821 (SMTP), RFC 977 (NNTP), etc.

The problem here is that HTTP is analogous to SMTP, not to RFC 2822
messages.  A web server is not in any better position to verify
content than a mail server.  Mail works quite well with content
delegated to MUAs; similarly, I strongly suspect the web would work
better with content delegated to some other agent than the httpd.

 > > In fact, I don't see how the inner protocol can modify the
 > > interpretation of the outer protocol, since it doesn't know what
 > > that protocol is.
 > Well, that's an issue, but without question META http-equiv tags
 > mess about with the interpretation of the HTTP headers, if present.

No, they don't.  The HTML 4.01 standard is quite specific about that.
Servers MAY grep for HTTP-EQUIV to inform the headers they send.
Clients MAY use HTTP-EQUIV to determine the charset ONLY IF there is
no Content-Type HTTP header field.
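
A rough sketch of that client-side precedence, using only the Python
standard library. The names charset_param, MetaContentType and
effective_charset are hypothetical. One assumption beyond the text
above: an HTTP Content-Type field that is present but carries no
charset parameter is treated here like a missing field, so the META
declaration is consulted next. That is my reading of the priority
list in the HTML 4.01 spec, not something stated above.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_param(content_type_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_charset()


class MetaContentType(HTMLParser):
    """Record the content of the first <META http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta" and self.value is None
                and (attrs.get("http-equiv") or "").lower() == "content-type"):
            self.value = attrs.get("content")


def effective_charset(http_content_type, html_text):
    """HTTP charset wins if present; META is only a fallback."""
    if http_content_type is not None:
        charset = charset_param(http_content_type)
        if charset:
            return charset
    finder = MetaContentType()
    finder.feed(html_text)
    if finder.value:
        return charset_param(finder.value)
    return None
```

Note that the META value never alters what the HTTP field means; it
is simply not reached when the HTTP field already supplies a charset.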

Even if we discuss the implicit proposal to change the precedence of
HTTP-EQUIV and the HTTP header, that doesn't change the interpretation
of the HTTP header.  It simply says that the HTTP header will not be
consulted in certain cases where "better" information is available.
However, if the HTTP header is consulted, the interpretation will be
the same.

 > > By the time the inner protocol gets interpreted,
 > > the outer protocol's meta information is gone.
 > Nope. The browser, when interpreting the HTML, remembers and uses
 > the information (such as the charset) from the headers.

OK, as written *in English* and *out of context* that's a sane
interpretation.  But it would be helpful if you remember that I know
that, and try to figure out if there's something else I might mean
that doesn't require assuming I'm totally clueless.

To clarify, I meant "stripped off and not available to the inner
protocol, which therefore can't be *modifying* the outer protocol,
but at most *superseding* it".

 > >  > Is actually *not* an attempt to reset the content type, but is
 > >  > specifying (in a terribly unobvious way) the default character
 > >  > encoding for the document.
 > > 
 > > I don't understand what you mean. It's not an attempt to reset the
 > > content type, nor does it have anything to do with "default" character
 > > sets.
 > Well, sorry, you may say it doesn't, but the spec says it does:
 >     The META element may be used to specify the default information for
 >     a document in the following instances:
 > 	* The default scripting language.
 > 	* The default style sheet language.
 > 	* The document character encoding.

The fact that the scripting language and style sheet language are
explicitly described as "default", and document character encoding is
not, clearly indicates that the "default" in the first paragraph
really means something like "document scope", not "default".  There is
only one "document scope" charset; it is not a "default"; cf. RFC
2046.  The standard clearly intends that "the document character
encoding" refer to this document-scope explicit charset, not a
"default" of some kind.

And the standard doesn't say "reset" anywhere I can see, nor is
"reset" a sensible interpretation in light of the priorities mandated
in Ch.5.

 > >  > Presumably this:
 > >  > 
 > >  >  <META http-equiv="Content-Type" content="image/jpeg; charset=ISO-8859-5"> 
 > >  > 
 > >  > is also a valid way of doing it;
 > > 
 > > No, because image/* content types don't have a charset parameter.
 > Right, but there's no indication in the spec that it should not be
 > interpreted that way.

There doesn't need to be.  That's what RFC 2046 is for.

 >     A META declaration with "http-equiv" set to "Content-Type" and a
 >     value set for "charset".

But this MUST be ignored (cf. RFC 2046) if it is not known by the
implementation to be a valid parameter for the content-type/subtype in
question (i.e., separate namespaces).  And since the HTML spec doesn't
define image/jpeg, it can't add a charset parameter ad hoc.
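
As an illustration of "separate namespaces", here is a small sketch
in which a charset parameter is honoured only for media types the
implementation knows to define one. The function name
charset_if_defined is hypothetical, and limiting the known set to
text/* is a simplifying assumption for the example, not a claim about
what any particular browser does.

```python
from email.message import Message


def charset_if_defined(content_type_value):
    """Return the charset only for types assumed to define that parameter."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # Assumption for this sketch: only text/* types define "charset".
    if msg.get_content_maintype() != "text":
        return None  # the parameter is unknown for this type, so ignore it
    return msg.get_content_charset()
```

With this, "image/jpeg; charset=ISO-8859-5" yields no charset at all,
while "text/html; charset=ISO-8859-5" yields "iso-8859-5".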

I'm sorry, Curt, but you must read RFCs as if they were written in
Japanese, with other RFCs constituting the relevant "jôshiki".
