Mailing List Archive


Re: [tlug] Firefox 3.0.1 doesn't respect <meta http-equiv="content-type">

Stephen J. Turnbull wrote:
Edward Middleton writes:
> Stephen J. Turnbull wrote:
> > Edward Middleton writes:
> >
> > > What I presume Curt is saying is that adding descriptive metadata
> > > necessary to reliably read the contents of a file, in an encoding
> > > specific to the file, is inherently stupid because you have to already
> > > have this information in order to read the encoded metadata.
> > > that's factually incorrect in several ways, and generally wrong-headed.
> > Care to explain the several things you see as being factually incorrect,
> > and why you think this is wrong-headed?

The most obviously contradicted fact is that the standard specifies
that when the META element is used, the file's encoding should be
ASCII-compatible. That's all you need. Note that the XML standard
does exactly the same thing, except for making a hard requirement of a
much more useful ASCII-compatible default (ie, UTF-8). It's
interesting that somebody is claiming that this is "inherently stupid"
when a couple of decades of experience with similar systems has led to
continual refinement and strengthening of this specification.

Well, it works except in corner cases like text files on a website, or source code, where there isn't an in-band (i.e. in-file) means of indicating the encoding. Out-of-band methods (e.g. HTTP headers) support all the previously mentioned cases.

In your above example you already know the file is ASCII-compatible HTML, which is sufficient to read the encoding information. That information is provided out-of-band, not in the file.

For XML, the spec covering this situation is here[1], and it states:

"XML encoding declaration allows reasonably reliable in-band labeling of character encodings"

Not exactly a ringing endorsement, but as long as no one comes up with an encoding that breaks it, things should mostly work.

Second fact, whose contradiction is implied, neither I nor the
standards claimed that this metadata is *necessary* to "reliably" read
the contents of a file. What we claim is that the META element *may*
be *useful* to the server in analyzing the contents of the file. In
particular, on my server, it is inherently more reliable than an
Apache AddDefaultCharset (because multiple encodings are known to be
in use, any default must be incorrect for some files).

By setting AddDefaultCharset to UTF-8, your web administrator (presumably you) explicitly told your Apache web server to assume all HTML files are encoded in UTF-8 unless told otherwise (and it apparently doesn't consider META tags). This is an Apache configuration issue and has nothing to do with the merits of in-band versus out-of-band metadata.
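For reference, the directive in question is a single line of Apache configuration, and it applies one charset to every text response served:

```apache
# httpd.conf -- stamps a default charset on text/plain and text/html
# responses, regardless of what any <meta> element inside the file says
AddDefaultCharset UTF-8
```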

Even in Apache it is possible to set per-file charset information for all files; the problem is where to store this extra data. Subversion over WebDAV will let you do this, and I believe there is a module that lets you set it using file extensions.
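The extension-based approach is Apache's own AddCharset directive; something like the following (the extension choices here are just illustrative) serves a mixed tree correctly without touching the files:

```apache
# mod_mime: map file extensions to charsets, so files in different
# encodings can coexist and each gets the right Content-Type header
AddCharset UTF-8       .utf8
AddCharset ISO-2022-JP .jis
AddCharset EUC-JP      .euc
```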

Writing an Apache module that used your HTML META data to set your headers correctly would also have resolved your problem.

What is "inherently dumb" is unnecessarily coupling the web server to the type of files it serves.

