Mailing List Archive


Re: [tlug] Firefox 3.0.1 doesn't respect <meta http-equiv="content-type">



Curt Sampson writes:
 > On 2008-09-17 11:41 +0900 (Wed), Stephen J. Turnbull wrote:

 > >  > Rubbish. What "meaning of HTML document data" is this changing?
 > >  > 
 > >  >     <META http-equiv="Age" content="12">
 > > 
 > > Nobody's talking about any header other than Content-Type, here.
 > 
 > Excuse me, but that is *exactly* what I am talking about.

If this is what you call "exact" ... pulling random stuff out of the
air ... raaaiiight.

 > > You yourself have argued (and Edward has echoed) that asking the httpd
 > > to verify the claims in its Content-Type is too burdensome.  So the
 > > HTTP Content-Type header field is (in actual practice, and supported
 > > by theory) at best a hint.
 > 
 > So your claim is that, if I send an HTTP page as type "image/jpeg", the
 > browser should, rather than saying "this is a broken JPEG", dig around
 > and try to determine the content-type automatically?

Where do you get that dope you're smoking?  Of course that's not my
claim.

No, the standard says we look at the Content-Type.  JPEG is a
medium that provides no mechanism for labelling what it is if it isn't
JPEG, so if the server says "this is image/jpeg" and JPEG doesn't
work, the browser should say "I don't know what this is, what do you
want me to do?  I advise you to think carefully, because the server is
either lying or crazy."  And that's what the standard *should* say,
for image/* and similar, IMO.
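
As a rough illustration of the behavior described above (a minimal
sketch, not anything the standard or any browser specifies): the
client trusts the declared image/jpeg, and if the data cannot be JPEG
it asks the user instead of sniffing for some other type.  The function
name and the return strings are hypothetical, and checking only the
opening Start Of Image marker is a simplified stand-in for "JPEG
doesn't work".

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker that opens every JPEG stream


def handle_declared_jpeg(body):
    """Decide what to do with a body the server labelled image/jpeg.

    Hypothetical helper: returns "render" when the data at least starts
    like a JPEG, and "ask-user" otherwise.  It never tries to guess
    another media type from the content.
    """
    if body.startswith(JPEG_SOI):
        return "render"
    # The server is "either lying or crazy": hand the decision to the user.
    return "ask-user"


if __name__ == "__main__":
    print(handle_declared_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
    print(handle_declared_jpeg(b"MZ\x90\x00"))
```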

And furthermore, *given the standard as it is*, we should do exactly
the same thing with text/html.  However, my claim is that that's a bad
standard, because servers are known to be unreliable and to have
common features that decrease reliability, and because they are in a
bad position to try to enforce accuracy.  Rather, something that
actually looks at the content should do this.  Composing tools,
offline website constructors, dynamic content generators, etc.  Not
the httpd.

 > This is probably in violation of the standard,

If you don't already know that it is not in conformance to the
standards, at the very least for the case of "text/html;
charset=utf-8", then you haven't read either the standards or my
recent posts with any care for "exactness" whatsoever.

 > Basically, program your HTTP server to return correct content types, or
 > live with the consequences.

If that could be done, I'd be happy.  So tell me, how do I stop Apache
from lying about charsets?

But what you really mean is that I should go through my entire website
and check all the files to make sure that none of them will confuse
the HTTP server.  That's what I will do eventually, but in the
meantime you cannot help me stop the httpd from lying, can you?

 > > If you don't want to deal with the inner protocol, don't use the
 > > Content-Type header.
 > 
 > And this is allowable in the standard? Just leave out "Content-type" and
 > have the client guess?

Sure.  The standard says you *should* provide a Content-Type header.
The only times you *must* provide the Content-Type are when the
response is actually part of the HTTP protocol (or an extension such
as WebDAV).

 > > I think it's much more sensible to *define* the outer Content-Type as
 > > a hint....
 > 
 > I think that's a very, very bad thing unless you're very specific about
 > exactly what may be encapsulated, and it's self-identifying.

First, by "hint" I mean "non-authoritative, may be superseded by
'better' information".  If there's no better information, it's a very
bad idea to disregard the hint.

Second, we're talking about text/html here.  You may have decided it's
something else, but that's what I've been talking about the whole
time.  If the server says "text/*", then the client has a huge amount
of latitude in any case, because of the imprecise nature of text.

 > Again, guessing at content interpretation on the part of the client
 > has been a source of security problems and exploits many times in
 > the past.

Who's talking about "guessing"?  I'm definitely not!  I'm talking
about using additional information *designed for that purpose* that
may be available.

For example, one could take the HTTP Content-Type as primary, and if
the document looks "odd" for that media type, you could then take the
HTTP-EQUIV Content-Type, and if that is well-formed, you could check
if that fits better.
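
The procedure in the preceding paragraph could be sketched roughly as
follows.  This is an illustration of the proposal only, not what RFC
2616 or Firefox does.  All names are hypothetical, "looks odd" is
reduced to "does not decode in the declared charset", and "well-formed"
is reduced to "names a codec Python knows".

```python
import codecs
import re

# Hypothetical, deliberately narrow pattern for
# <meta http-equiv="content-type" content="...">.
META_RE = re.compile(
    rb'<meta\s+http-equiv=["\']?content-type["\']?\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)
CHARSET_RE = re.compile(r'charset\s*=\s*"?([\w.:-]+)', re.IGNORECASE)


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    match = CHARSET_RE.search(content_type or "")
    return match.group(1).lower() if match else None


def decodes_cleanly(body, charset):
    """True if charset names a known text codec and body decodes in it."""
    try:
        codecs.lookup(charset)
        body.decode(charset)
        return True
    except (LookupError, UnicodeDecodeError):
        return False


def choose_charset(http_content_type, body):
    """Take the HTTP header as primary; consult HTTP-EQUIV only if it fits badly."""
    primary = charset_of(http_content_type)
    if primary and decodes_cleanly(body, primary):
        return primary  # the header fits, nothing supersedes it
    meta = META_RE.search(body)
    if meta:
        candidate = charset_of(meta.group(1).decode("ascii", "replace"))
        if candidate and decodes_cleanly(body, candidate):
            return candidate  # the in-document label fits better
    return primary  # no better information: keep the hint


if __name__ == "__main__":
    page = (
        '<meta http-equiv="content-type" '
        'content="text/html; charset=euc-jp">日本語'
    ).encode("euc-jp")
    print(choose_charset("text/html; charset=utf-8", page))
```

Note that a single-byte charset such as ISO-8859-1 decodes any byte
sequence, so this simple test would never find such a header "odd"; a
real client would need a stronger plausibility check.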

 > >  > In the end, having a Content-type delivered by HTTP for whatever
 > >  > it's encapsulating makes as much sense as a Windows box interpreting
 > >  > any file ending in ".jpg" as a JPEG file,
 > > 
 > > Yeah, and we know how much havoc that has caused. :-(
 > 
 > Just what havoc, compared to not doing so?

Hm?  Assuming that a .jpg is a JPEG file, so that Outhouse will pass
it to Exploder, which then looks at the magic and determines it's an
.exe, which it then runs because Outhouse vouches for it, is a classic
vector for exploiting Windows.  Isn't that what you were talking
about?

 > It is, indeed. If the encapsulation says, "this chunk of data is
 > image/jpeg", and you chose to ignore that based on guessing at content,

You are seriously confused here.  Nobody is talking about guessing,

 > then it is indeed modifying the interpretation of the document from
 > what the encapsulating protocol requested, which is what I meant.

and interpretation is not protocol data.  Interpretation is the
prerogative of the client (or should be; RFC 2616 seems to think
otherwise).  Would you claim that providing a View Source function is
a violation of the standard because it disregards the "text/html"
Content-Type?

 > > Obviously I believe that the right thing would be to amend the HTML
 > > standard to give precedence to the document.
 > 
 > That would also involve amending the HTTP standard.

Unfortunately, it would appear so, as RFC 2616 specifies the behavior
of the recipient while failing to demand that the server give accurate
information.

 > That's far different from the general case of the situation now.

Well, sure, I've never claimed anything different, and have asserted
three or four times now that Firefox's behavior *is* conformant to the
standards and that it is *desirable* (correct, whatever) behavior given
the standards as they are.


