[tlug] Re: WWW page charsets (was: font encoding question)
- Date: Mon, 18 Jun 2007 18:22:46 +1000
- From: "Jim Breen" <jimbreen@example.com>
- Subject: [tlug] Re: WWW page charsets (was: font encoding question)
[Changed the Subject, as this has nothing to do with *font* encoding (whatever that may be...)]
steve smith <sjs@example.com> wrote: Brian Chandler wrote: > steven smith wrote:
> Actually I believe it is simpler than this. If you have a webpage encoded in UTF-8, you can (*) assume that the browser will return form input values in the same encoding.
Are you sure?
It's the default, and works almost all the time. The only times I have encountered problems with that assumption were (a) in the ancient Mac version of Netscape 2, where it was assumed that all Japanese pages were only in Shift_JIS, and mojibaked everything else, and (b) the lite browser in DoCoMo keitais, which also only allows Shift_JIS.
From the discussion that's been going on in the "WWWJDIC backdoor issue" thread, I'm not sure it's that simple.
That discussion was actually about the innards of browser add-ons. The problems that triggered that thread don't really involve the charsets used in regular forms.
This isn't quite the same since I'll be serving a form, but... somehow assuming always seems to get me in trouble. If someone can verify that a page will return text in the font it has been encoded in, I'd be delighted.
It's the usual behaviour. WWWJDIC works that way, and gets several million uses a week. I've never had a complaint on that score.
I'm still trying to wrap my mind around font-encoding and the issues involved.
You'll get your head a bit straighter by not calling them fonts. They are characters, and we are talking about character sets (e.g. JIS X 0208 or Unicode) and character encoding/encapsulation systems (Shift_JIS, UTF8, etc.). In WWW/Internet-speak, these are often conflated into "charset".
Fonts, e.g. Mincho, Arial, Helvetica, etc. are different things.
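To make the distinction concrete, here is a minimal sketch in Python (not part of the original message; the sample character and the list of encodings are arbitrary choices for illustration). One character has one code point in the Unicode character set, but a different byte sequence under each encoding. No font is involved at any step.

```python
# One character, one Unicode code point, several byte encodings.
text = "語"

# The code point is a property of the character set (here Unicode).
code_point = f"U+{ord(text):04X}"
print(code_point)

# The bytes depend on the encoding that was asked for, not on any font.
encodings = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # bytes.hex(sep) needs Python 3.8 or later
    print(f"{name:12} {raw.hex(' ')}")
```

Decoding each byte sequence with the matching codec gives back the same character, which is why the receiver has to know which encoding was used.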
In any event, in the "backdoor" thread Stephen Turnbull pointed out this link: http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset so assuming that the browser pays attention, I plan on using both your suggestion, Stephen's, and then praying for the best :)
By all means use the "accept-charset" in the <FORM ...>, but be prepared for some browsers ignoring it. Be safe, and assume the text is being sent to the server in the coding/charset of the page in which the form is embedded.
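As a rough server-side sketch of that advice in Python: decode the submitted form body with the charset the page itself was served in, and fail loudly if the bytes do not fit. The names PAGE_CHARSET and decode_form, and the field name "q", are hypothetical and not taken from WWWJDIC or any real server code.

```python
from urllib.parse import parse_qs, quote

# Assumption: the page containing the form was served as UTF-8.
PAGE_CHARSET = "utf-8"

def decode_form(body: str, page_charset: str = PAGE_CHARSET) -> dict:
    """Decode a urlencoded form body using the page's own charset."""
    # errors="strict" makes a charset mismatch raise UnicodeDecodeError
    # instead of silently producing replacement characters.
    return parse_qs(body, encoding=page_charset, errors="strict")

# Simulate what a browser would typically send for a field "q" holding 辞書.
body = "q=" + quote("辞書", encoding=PAGE_CHARSET)
print(decode_form(body))
```

If the same field arrives encoded in Shift_JIS while the server expects UTF-8, the strict decode raises an error, which is a visible signal that the browser did not follow the page charset.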
Like they said in that thread, one standard would be nice (though the context was a bit different). I have to make sure everything I send to friends in Japan is in ISO-2022 or they see 文字化け,
Correct.
but most of the rest of the world (and I think this list usually) is utf-8.
Not so, although Unicode/UTF8 is getting more common. I email in ISO-2022-JP (or more correctly, I ask Gmail to use the default charset for the text I am sending, and the email default for Japanese is ISO-2022-JP.)
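For readers who have not looked at ISO-2022-JP bytes, a small Python sketch (the helper name to_iso2022jp and the sample text are made up for illustration) shows its main property: every byte is 7-bit, with escape sequences switching between ASCII and JIS X 0208.

```python
def to_iso2022jp(text: str) -> bytes:
    """Encode text with Python's built-in ISO-2022-JP codec."""
    return text.encode("iso2022_jp")

raw = to_iso2022jp("日本語 mail")

# Every byte stays below 0x80; escape sequences (starting with 0x1b)
# mark the switches between JIS X 0208 and ASCII.
is_seven_bit = all(byte < 0x80 for byte in raw)
print(raw)
print(is_seven_bit)
```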
And then there's ISO-8859.
Yes, ISO-8859-1 is the default for WWW pages.
For me, font encoding often seems to do the unexpected. Augh...
The character coding is exactly what you ask it to be. If something unexpected pops up, you probably asked for the wrong thing.
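A short Python sketch of "asking for the wrong thing" (the sample text and the choice of Shift_JIS as the wrong codec are assumptions for illustration): the bytes are valid UTF-8, and mojibake appears only because the receiver decodes them as something else.

```python
original = "文字化け"

# The sender encodes the text as UTF-8.
sent_bytes = original.encode("utf-8")

# A receiver that assumes Shift_JIS gets garbage (mojibake);
# errors="replace" stands in for bytes that do not fit the codec.
wrong = sent_bytes.decode("shift_jis", errors="replace")

# A receiver that decodes with the encoding actually used gets the text back.
right = sent_bytes.decode("utf-8")

print(wrong)
print(right)
```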
Cheers
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/