Mailing List Archive



Re: [tlug] Firefox 3.0.1 doesn't respect <meta http-equiv="content-type">



Curt Sampson writes:

 > On 2008-09-16 13:48 +0900 (Tue), Stephen J. Turnbull wrote:
 > 
 > > How is the meaning of data outside the encapsulation changed?  The
 > > HTTP header if it exists means what it means.
 > 
 > In other words, something possibly encapsulated inside an HTTP message
 > unit is attempting to change things about the HTTP message unit that may
 > or may not be encapsulating it. This is wrong.

What do you claim it is attempting to change?  The encapsulating data
is not available to the encapsulated object; it cannot change that.
The encapsulated data is what it is, nothing changes there.  What else
is left to change?

You yourself have argued (and Edward has echoed) that asking the httpd
to verify the claims in its Content-Type is too burdensome.  So the
HTTP Content-Type header field is (in actual practice, and supported
by theory) at best a hint.

What's wrong is making hints authoritative.

 > > It can be true or false, but the server has no way to enforce truth
 > > on the content *unless it groks the content*, which it doesn't, and
 > > probably cannot if it is to be reasonably efficient.
 > 
 > Fair enough; you can argue that the content-type header should not be
 > part of the HTTP standard.

Of course I don't; it's useful in many cases, including binary content
that doesn't support MIME headers.  But with its current authoritative
definition, it should only be used when the server or its management
is prepared to enforce it, including ensuring that any inner protocol
is adjusted to reality (remember, the rationale for making the HTTP
header authoritative is that the httpd might be changing the content!).
If you don't want to deal with the inner protocol, don't use the
Content-Type header.

I think it's much more sensible to *define* the outer Content-Type as
a hint, and if the management wants to enforce it, they'll have to
either control their content carefully or get a server capable of
rewriting the inner Content-Type (which would make sense in case of a
server capable of transcoding documents for the client's convenience).
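
To make the "rewriting the inner Content-Type" idea concrete, here is
a minimal Python sketch of what a transcoding server would have to do.
The function name transcode_html and the regular expression are my own
illustrative assumptions, not taken from any real httpd; the pattern
only handles a META whose http-equiv attribute precedes its content
attribute.

```python
import re

# Hypothetical pattern: finds the charset value inside
# <meta http-equiv="Content-Type" content="...; charset=XXX">.
# Assumes http-equiv appears before content in the tag.
_META_CHARSET = re.compile(
    r'(<meta[^>]*http-equiv\s*=\s*["\']?content-type["\']?'
    r'[^>]*charset\s*=\s*)([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def transcode_html(body: bytes, src: str, dst: str):
    """Re-encode an HTML body and keep inner and outer labels in agreement.

    Returns (new_body, http_content_type_value).
    """
    text = body.decode(src)
    # Rewrite the inner (META) declaration so it matches the new encoding.
    text = _META_CHARSET.sub(lambda m: m.group(1) + dst, text, count=1)
    # The outer (HTTP) declaration is produced from the same value.
    return text.encode(dst), f"text/html; charset={dst}"
```

The point of the sketch is only that a server which changes the bytes
is also in a position to fix both labels at once.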

 > > What the META element changes is the meaning of the HTML document data
 > > *inside* the encapsulation.
 > 
 > Rubbish. What "meaning of HTML document data" is this changing?
 > 
 >     <META http-equiv="Age" content="12">

Nobody's talking about any header other than Content-Type, here.  No
general claim has been made that META elements should be considered
authoritative.  But if you want to talk about the Age header, sure.
Obviously, since it refers to server internals (e.g., caches and
generated content), the server should be authoritative.

 > In the end, having a Content-type delivered by HTTP for whatever
 > it's encapsulating makes as much sense as a Windows box interpreting
 > any file ending in ".jpg" as a JPEG file,

Yeah, and we know how much havoc that has caused. :-(

 > and a Mac interpreting any file whose resource fork says it's a
 > JPEG file as a JPEG file. (Well strictly, JFIF files, but
 > whatever.) If you're railing against all file formats not
 > being self-identifying, well, you may have a point.  But insisting
 > that it's reasonable for some data to attempt to modify information
 > specified by a protocol unit encapsulating it is a path towards
 > insanity,

It's not modifying that information, and cannot, as you yourself
pointed out when you noted that the browser will store the content of
the HTTP header Content-Type field.  (What, are you now claiming that
it *must* forget that information just because the HTML HEAD contains
a META with HTTP-EQUIV="Content-Type"?  Surely you can devise a data
type to handle both!)  It's providing additional information.
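
For instance, a data type that keeps both declarations side by side
might look like the following Python sketch.  The names CharsetInfo,
charset_of and collect are hypothetical, and the META scan assumes an
ASCII-compatible encoding and looks only at the first 1024 bytes; it
is not how any particular browser does it.

```python
import re
from dataclasses import dataclass
from typing import Optional

_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
_META_RE = re.compile(
    rb'<meta[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE,
)

@dataclass(frozen=True)
class CharsetInfo:
    """Both declarations are kept; neither overwrites the other."""
    http_charset: Optional[str]   # from the HTTP Content-Type header
    meta_charset: Optional[str]   # from <META HTTP-EQUIV="Content-Type">

    @property
    def conflict(self) -> bool:
        return (self.http_charset is not None
                and self.meta_charset is not None
                and self.http_charset.lower() != self.meta_charset.lower())

def charset_of(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    if not content_type:
        return None
    m = _CHARSET_RE.search(content_type)
    return m.group(1) if m else None

def collect(http_content_type: Optional[str], body: bytes) -> CharsetInfo:
    """Record what the header says and what the document says."""
    meta = _META_RE.search(body[:1024])
    meta_charset = (charset_of(meta.group(0).decode("ascii", "replace"))
                    if meta else None)
    return CharsetInfo(charset_of(http_content_type), meta_charset)
```

Nothing is forgotten here: the header value is still available after
the META has been read, and the client can see when the two disagree.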

If you (i.e., the client program) have multiple sources of information,
you have to decide which are most reliable and how to compromise among
them.  I know that such programs are hard to write, and annoying to
programmers.  So?  It's also hard to manage content.  If the content
managers are likely to make mistakes, it may be useful for the content
delivery system to double-check.  That's something that needs to be
determined empirically; it could go either way in principle.
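
One cheap form of double-checking is trial decoding: take the declared
charsets in whatever order of trust you prefer and use the first one
that actually decodes the bytes.  The Python sketch below is only an
illustration (the function name is hypothetical), and it has a known
weakness: single-byte charsets such as ISO-8859-1 decode any byte
sequence, so this test can never refute them.

```python
from typing import Iterable, Optional, Tuple

def first_charset_that_decodes(
    body: bytes, candidates: Iterable[Optional[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (charset, text) for the first candidate that decodes body.

    candidates are tried in the order given, so the caller's ordering
    expresses which source it trusts more.  Unknown or missing charset
    names are skipped.  Returns (None, None) if nothing works.
    """
    for name in candidates:
        if name is None:
            continue
        try:
            return name, body.decode(name)
        except (LookupError, UnicodeDecodeError):
            # LookupError: Python does not know this charset name.
            # UnicodeDecodeError: the bytes are not valid in this charset.
            continue
    return None, None
```

Whether the document's declaration or the header's goes first in the
candidate list is exactly the policy question under discussion; the
code takes no position on it.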

Obviously I believe that the right thing would be to amend the HTML
standard to give precedence to the document.  However, I could be
convinced by evidence showing that erroneous Content-Type is more
likely/harder to detect and correct in that case than in the current
situation with AddDefaultCharset ready to produce mojibake at a
moment's notice.

