Mailing List Archive


Re: Web pages & Jp. text -THANKS ALL



On Mon, 28 Aug 2000 you wrote:
> Michael Schubart (michael@example.com) wrote:
>
> Michael> Well, this is funny, then: When I check my page at www.schubart.net 
> Michael> with validator.w3.org, I get
> Michael>
> Michael>   Congratulations, this document validates as HTML 4.0 Strict!
>
> Is your http server sending the charset?  It may follow the standard on
> that and send it as the default.  This is from the very useful page at
> http://www.w3c.org/TR/html4/charset.html:
>
> The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a
> default character encoding when the "charset" parameter is absent from the
> "Content-Type" header field. In practice, this recommendation has proved
> useless because some servers don't allow a "charset" parameter to be sent,
> and others may not be configured to send the parameter. Therefore, user
> agents must not assume any default value for the "charset" parameter.

The web server sends only

  Content-Type: text/html

I get the "Congratulations" message from the validator regardless of
whether I enter the URL or upload the file. Maybe I should tell
Gerald Oskoboiny at validator.w3.org about this, then.
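
For anyone who wants to check their own server's header, here is a
minimal sketch in Python. It only parses a Content-Type value that you
have already obtained; the function name declared_charset is made up
for illustration, and the header values are examples, not output from
any real server.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value,
    or None when the header carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None if "charset" is absent.
    return msg.get_content_charset()

# Example header values (assumed for illustration).
for value in ("text/html", "text/html; charset=EUC-JP"):
    print(repr(value), "->", declared_charset(value))
```

A result of None corresponds to the bare "Content-Type: text/html"
case above, where the user agent is told nothing about the encoding.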

> Michael> even though there is no char set info in it. Is validator's
> Michael> output misleading? And I thought I'd done everything right...
>
> In further digging, I ran across another interesting passage.  I find
> it particularly interesting because I distinctly remember seeing
> documents get flagged on encoding circa 1997/1998 when HTML 3.2 was
> in wide use and most people (myself included) were using the HTML 4.0
> Transitional DTD.  However, neither the one you cited nor the one at
> http://www.htmlhelp.com seems to do this now.  Anyway, on to the
> interesting passage:
>
> The optional parameter "charset" refers to the character encoding used
> to represent the HTML document as a sequence of bytes. Legal values
> for this parameter are defined in the section on character encodings.
> Although this parameter is optional, we recommend that it always be
> present.
>
> This is from: http://www.w3.org/TR/html4/conform.html
>
> So here we find the important statement "... we recommend that it always
> be present," and validators are not squawking about charset.  Well, they
> squawk a little.  They'll mention "charset unknown" at the top.  Then
> they barf on non-ASCII characters, popping errors on all the kanji they
> run across  :-)  For a good demo of this, run a validator over
> http://www.goo.ne.jp/ and take a look.  There's nothing special about
> that site; I just chose it at random, figuring it would probably have no
> charset encoding (you'd think that wouldn't be such a problem in Japan,
> wouldn't you?) and I was right.
>
> Adrian, if you're lurking out there, how about weighing in on the
> charset issue?
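
The problem with kanji in a page of unknown charset can be sketched in
a few lines of Python. This is only an illustration of why the bytes
are ambiguous, not how any validator actually works; the sample string,
the choice of EUC-JP, and the helper try_decode are all assumptions.

```python
SAMPLE = "日本語"  # three kanji, chosen only as an example

def try_decode(raw, charset):
    """Decode raw bytes with the given charset; return None on failure."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None

# Assume the page was saved as EUC-JP but served without a charset.
raw = SAMPLE.encode("euc_jp")

print(try_decode(raw, "ascii"))       # fails: every byte is above 0x7F
print(try_decode(raw, "iso-8859-1"))  # "succeeds", but yields mojibake
print(try_decode(raw, "euc_jp"))      # correct only when the charset is known
```

Treating the bytes as ASCII fails outright, and treating them as
ISO-8859-1 silently produces the wrong characters, which is why a
declared charset matters for Japanese pages.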

You sent your message only to me, not to the list. I hope it's OK
to reply on the list.

> Jonathan

-----------------------------------------------------------------
Michael Schubart                             michael@example.com
-----------------------------------------------------------------

