Mailing List Archive


Re: [tlug] Firefox 3.0.1 doesn't respect <meta http-equiv="content-type">

Curt Sampson writes:
 > On 2008-09-12 18:10 +0900 (Fri), Stephen J. Turnbull wrote:
 > > Curt Sampson writes:
 > > 
 > >  > On 2008-09-12 15:43 +0900 (Fri), Stephen J. Turnbull wrote:
 > >  > 
 > >  > > But the META element *is* a header!!
 > >  > 
 > >  > It is patently not. It is part of the document,
 > > 
 > > Well, yes, but that document does contain a header used for supplying
 > > various kinds of metadata.  That's why the elements are called HEAD,
 > > and BODY!  This is just encapsulation of one protocol in another.
 > 
 > This is not "just encapsulation";

Puh-lease.  I'm talking about calling the content of the HEAD element
a "header".  That's just encapsulation, just like having a TCP header
inside (as part of the content of) an IP packet.  That TCP packet
itself might contain another TCP packet or even IP.  No?

 > if the inner protocol is modifying the interpretation of the outer
 > protocol, you have a mess on your hands.

Well, yes.  That's what MIME is all about, isn't it?  Ie, "I don't care
what it looks like, that's not ASCII text, that's a JPEG image!"  And
indeed, MIME is a mess.

However, don't you think you have it backwards?  Ie, "if the outer
protocol is modifying the interpretation of the inner protocol, you
have a mess".  In fact, I don't see how the inner protocol can modify
the interpretation of the outer protocol, since it doesn't know what
that protocol is.  By the time the inner protocol gets interpreted,
the outer protocol's meta information is gone.

 > Is actually *not* an attempt to reset the content type, but is
 > specifying (in a terribly unobvious way) the default character
 > encoding for the document.

I don't understand what you mean.  It's not an attempt to reset the
content type, nor does it have anything to do with "default" character
sets.  (The spec lists it alongside a couple of other metadata items
that are defaults, but the charset is not a "default", it is *the*
charset; it
can hardly be anything else!)  It is a specification of the MIME type,
which for text/* includes the character set.  Ch. 5 of the HTML
standard is quite clear about how this is to be interpreted:

1. if a Content-Type field is present in the actual HTTP header, it is
   to be obeyed, otherwise
2. if a META element with attribute HTTP-EQUIV="Content-Type" is
   present, it is to be obeyed, otherwise
3. if it's an external object with a charset parameter, that is
   obeyed, otherwise
4. something implementation-dependent is to be done, as the older
   recommendation of defaulting to ISO 8859/1 proved useless in
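
To make the precedence order in the list above concrete, here is a
minimal sketch in Python.  It is not taken from the standard or from
any browser; the names pick_charset, charset_from_content_type and
link_charset are hypothetical, the regular expressions are a crude
stand-in for a real HTML parser, and it assumes that an HTTP
Content-Type field without a charset parameter counts as "not
specified" so that the META element gets consulted.

```python
import re

# Hypothetical helpers for illustration only; a real client would use a
# proper HTML parser rather than regular expressions.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                         re.IGNORECASE)
_META_RE = re.compile(
    r'<meta\s+[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>',
    re.IGNORECASE,
)


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1) if match else None


def pick_charset(http_content_type, html_prefix, link_charset=None,
                 fallback=None):
    """Apply the four-step precedence order to choose a charset.

    http_content_type: value of the real HTTP Content-Type field, or None
    html_prefix:       the start of the document, decoded as ASCII-ish text
    link_charset:      charset given by whatever referred to the document
                       (assumed parameter name), or None
    fallback:          the implementation-dependent last resort
    """
    # 1. The real HTTP header wins (assumption: only if it names a charset).
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset
    # 2. Otherwise a META element with HTTP-EQUIV="Content-Type".
    meta = _META_RE.search(html_prefix)
    if meta:
        charset = charset_from_content_type(meta.group(0))
        if charset:
            return charset
    # 3. Otherwise a charset supplied by the referring element.
    if link_charset:
        return link_charset
    # 4. Otherwise something implementation-dependent.
    return fallback
```

Called with a real header of "text/html" and a document that starts
with the META element quoted later in this thread, this sketch falls
through step 1 and takes the charset from step 2.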

The logic for "real" HTTP fields taking precedence over a field
specified by HTTP-EQUIV is that the server may have done a charset
transcoding.  However, I think this is bogus.  It is in general not
safe to do a charset transcoding unless you know a fair amount about
the media type.  (Eg, you can't transcode things that are digitally
signed.)  If you know that much about the content (eg, that it's safe
to do this for text/html), then you know enough to go looking for an
HTTP-EQUIV header, and fix it up.
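
As a rough illustration of "go looking for an HTTP-EQUIV header, and
fix it up", a transcoding step could rewrite the charset in the META
element at the same time as it recodes the body.  This is only a
sketch under stated assumptions: transcode_html is a hypothetical
name, the regular expression is a simplification rather than a real
HTML parser, and it only works when the target encoding is
ASCII-compatible (so the markup itself is still plain ASCII bytes).

```python
import re

# Matches the part of the META element up to the charset value, and the
# charset value itself.  Simplified; not a real HTML parser.
_META_CHARSET_RE = re.compile(
    rb'(<meta\s[^>]*http-equiv\s*=\s*["\']?content-type["\']?'
    rb'[^>]*charset\s*=\s*)([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def transcode_html(body, source, target):
    """Recode an HTML document and update its META charset to match.

    body:   the document as bytes in the `source` encoding
    source: name of the current encoding, e.g. "iso-8859-5"
    target: name of the new encoding; assumed to be ASCII-compatible
    """
    recoded = body.decode(source).encode(target)
    # Replace only the first declared charset, keeping the rest of the tag.
    return _META_CHARSET_RE.sub(
        lambda m: m.group(1) + target.encode("ascii"),
        recoded,
        count=1,
    )
```

A server that cannot do something like this safely for a given media
type presumably should not be transcoding that type in the first place.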

 > Presumably this:
 >  <META http-equiv="Content-Type" content="image/jpeg; charset=ISO-8859-5"> 
 > is also a valid way of doing it;

No, because image/* content types don't have a charset parameter.

 > it's hard to tell, since the semantics one would naively expect
 > from looking at this are not what it's actually specified to do.
 > That it's so easy to be confused by this stuff should be a good
 > indication that someone screwed up something really bad in the standard,
 > here.

No, it's no indication that someone screwed up *here* at all.
Something is screwed up *some*where, probably in several places.  But
it's hard to say that it's *right here*.  Standardizing this stuff is
really hard, not least because it's so political.

 > >  > Your example also is incorrect in this way; the HTML spec
 > >  > says nothing about MIME via SMTP
 > > 
 > > What's incorrect about an example of something that clearly is quite
 > > prevalent, if bloody annoying, as HTML email?
 > Well, first of all what's incorrect is you mentioning SMTP, as if it had
 > anything at all to do with this. It has nothing to do with this.

That was exactly my point.  Neither does HTTP, or at least IMO it
shouldn't.  It's just an outer protocol, and where the inner protocol
has (can have) a facility for doing so, it should be allowed to
specify its own content type IMO.

 > Second, now that I'm aware that <META http-equiv="Content-Type"> tags
 > are sometimes *not* anything to do with HTTP headers (or are they? who
 > really knows?) your example makes a bit more sense, except of course you
 > didn't even include any META tags. What on earth were you trying to say,
 > anyway?
 > 
 > >  > The client, it appears, should do nothing, and entirely ignore the META
 > >  > tags.
 > > 
 > > According to the standard, yes.
 > 
 > Nope. I was wrong. From the spec:
 >     META and default information
 >     The META element may be used to specify the default information for a
 >     document in the following instances:
 > 	* The default scripting language.
 > 	* The default style sheet language.
 > 	* The <<document character encoding>>.
 >     The following example specifies the <<character encoding>> for a
 >     document as being ISO-8859-5
 >     <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5"> 
 > It's not stated what the relationship is between this use and the HTTP
 > header modification use. What a crock.

Not there.  The two places in the quoted spec where I have added << >>
brackets are links in the HTML standard; follow them and be enlightened.
I also provided the relevant link in at least one previous message;
what the hell, I'll give it again:
