Mailing List Archive


Re: Web pages & Jp. text -THANKS ALL



>>>>> "Michael" == Michael Schubart <michael@example.com> writes:

    Michael> On Sun, 27 Aug 2000, Jonathan Q wrote:
    >> Stephen Lee (sl@example.com) wrote:

    Stephen> You should put in the metatag anyway.  People like me
    Stephen> switch between

    >> Yes, you certainly should.  Charset information is not some
    >> optional nicety, it is *mandatory* in the HTML specification
    >> and has been for a very long time. It is mandatory for any
    >> charset, including iso8859-1.

Yeah, since 26 January 2000, when XHTML 1.0 became a W3C
Recommendation.  Even in Web years, 2000 - 2000 = "0" is a small
number, Jon-Jon.

I can't resist noting that there is a small out from "mandatory" even
for XHTML (see below).

Be that as it may, Jonathan is right.  My own reaction to failure to
recognize character encoding on the part of any of the more sensible
browsers is "echo $URL >> ~/.junkbusterrc".  There is no excuse for
this on a well-run[1] Apache-based server capable of active pages;
autorecognition for most languages is quite cheap, even on-line, and
can be done offline if you really need the cycles.
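
To give an idea of how cheap "autorecognition" can be, here is a minimal
sketch in Python that guesses among the usual Japanese encodings by trial
decoding.  The function name guess_charset and the candidate order are my
own assumptions for illustration, not anything Apache or a browser actually
does; a real detector would use statistics instead of stopping at the first
codec that decodes cleanly.

```python
def guess_charset(data: bytes) -> str:
    """Guess an encoding by trial decoding (naive sketch, hypothetical helper)."""
    # ISO-2022-JP is a 7-bit encoding that announces itself with ESC sequences,
    # so only try it when an ESC byte is actually present.
    if b"\x1b" in data:
        try:
            data.decode("iso2022_jp")
            return "iso2022_jp"
        except UnicodeDecodeError:
            pass
    # Pure 7-bit data with no usable escape sequences is treated as ASCII.
    if all(byte < 0x80 for byte in data):
        return "us-ascii"
    # Try the strictest format first: random 8-bit text is rarely valid UTF-8.
    # Assumption: EUC-JP is tried before Shift_JIS; ambiguous input can fool this.
    for codec in ("utf_8", "euc_jp", "shift_jis"):
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return "unknown"
```

The result is only a guess: short or ambiguous input can decode cleanly
under the wrong codec, which is one more reason to label the charset
explicitly rather than leave it to the other end.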

To put it more strongly, I recommend that all webmasters (and sys
admins for systems hosting web servers, even if you're not webmaster
qua webmaster) read the HTTP, HTML 4, and XHTML specifications at
http://www.w3.org/.  Soon:  the face you save will be your own.  It
doesn't take that long to browse through them; you can ignore anything
you don't understand yet.  It will be there in the back of your mind,
and it will "omoideru" just in time to save your butt some day.

Think about it this way:  if you don't have enough time to do it
today, then someday, perhaps soon, your boss will give you a nice pink
"invitation to advanced study."  :-|

    Michael> Well, this is funny, then: When I check my page at
    Michael> www.schubart.net with validator.w3.org, I get

    Michael>   Congratulations, this document validates as HTML 4.0
    Michael> Strict!

    Michael> even though there is no char set info in it. Is
    Michael> validator's output misleading? And I thought I'd done
    Michael> everything right...

Charset information is not mandatory in HTML 4.01.  The word "should"
is used liberally in the standard, and although it calls the
specification of ISO 8859-1 as default "useless", it doesn't prohibit
servers from assuming it, only user agents ;-).  HTML 4 really cannot
mandate charset information, as it maintains the conflation of
charset information provided in the HTTP headers with that of the
document itself.  I imagine it is exactly this kind of design flaw
that led the W3C to conclude that a Great Leap Forward to XHTML was
necessary.
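
To make the two sources of charset information concrete, here is a rough
Python sketch of how a user agent might reconcile them, following the
priority order in HTML 4.01 (HTTP Content-Type parameter first, then a META
declaration in the document).  The helper names and the simple regular
expression are assumptions for illustration; a real browser parses the
document far more carefully.

```python
import re
from email.message import Message

# Loose pattern for <META HTTP-EQUIV="Content-Type" CONTENT="...; charset=X">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def charset_from_meta(body: bytes):
    """Return the charset named in a META tag near the top of the page, or None."""
    match = _META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body: bytes):
    """The HTTP header wins; the document's own META tag is only a fallback."""
    return charset_from_header(content_type) or charset_from_meta(body)
```

When both sources are silent the function returns None, which is exactly
the case where the user agent is left to guess.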

Even XHTML does not _require_ charset information.  You are allowed to
default to UTF-8 or UTF-16 by omitting the XML declaration.  (But
Jon-Jon is right about ISO 8859-1.)
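
A simplified sketch of that defaulting rule, in Python: a byte-order mark
means UTF-16, an explicit encoding in the XML declaration is honored, and
everything else is taken as UTF-8.  The function name xml_encoding is
hypothetical, and the sketch deliberately ignores the finer cases in the
XML specification (UTF-16 without a BOM, UCS-4, EBCDIC).

```python
import re

# Matches the encoding pseudo-attribute of a leading <?xml ... ?> declaration.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_encoding(data: bytes) -> str:
    """Return the encoding an XML processor would assume (simplified sketch)."""
    # A UTF-16 byte-order mark settles the question immediately.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Skip a UTF-8 byte-order mark, if any, before looking for the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration and no BOM: the default is UTF-8.
    return "utf-8"
```

So a document in ISO 8859-1 has to say so explicitly, while one in UTF-8
may stay silent.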

The <META HTTP-EQUIV=...> tag is deprecated, and in fact is not
available when a non-ASCII-compatible charset is used (e.g., wide-char
Unicode, taking the standard literally: most ASCII NULs in a Unicode
byte stream don't stand for themselves).  It is preferable
that the server itself emit appropriate Content-Type headers.
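
For an active page, having the server emit the header takes very little
code.  Here is a minimal sketch as a Python WSGI application; the names
CHARSET, PAGE and app, and the choice of EUC-JP, are assumptions for
illustration only.  On a static Apache setup you would configure the server
instead of writing code.

```python
# Assumed page encoding; use whatever the page is really stored or generated in.
CHARSET = "EUC-JP"
PAGE = "<html><body><p>日本語のページ</p></body></html>"


def app(environ, start_response):
    """WSGI application that labels its output with an explicit charset."""
    body = PAGE.encode(CHARSET)
    headers = [
        # The charset parameter rides on the HTTP Content-Type header,
        # so the user agent knows the encoding before it reads a single byte.
        ("Content-Type", "text/html; charset=" + CHARSET),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server
can serve app on a port of your choice.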



Footnotes: 
[1]  My own is not, but then, for practical purposes I am my own boss.

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
_________________  _________________  _________________  _________________
What are those straight lines for?  "XEmacs rules."

