tlug.jp Mailing List Archive
Re: [tlug] Re: font/char set question
- Date: Mon, 30 Jul 2007 13:43:45 +0900 (JST)
- From: Curt Sampson <cjs@example.com>
- Subject: Re: [tlug] Re: font/char set question
- References: <5634e9210707282051g6d4ac8b9l1ba725231bdff464@mail.gmail.com> <d8fcc0800707290802x2c9798dj411fc5400e8b8d6f@mail.gmail.com> <46AD1954.1080209@dcook.org> <d8fcc0800707291946s531f3353y8e0124d8e12cb071@mail.gmail.com>
On Mon, 30 Jul 2007, Josh Glover wrote:
> I would imagine that most Japanese keitai browsers these days can handle displaying UTF-8 on web pages (but *not* in email!)
In fact, for DoCoMo you have that backwards. They will handle UTF-8 e-mail just fine, but will not handle anything but Shift_JIS for web pages.
AU and Softbank support both Shift_JIS and UTF-8 for e-mail and web pages.
There may be certain models of phone for which the above is not true, but in general, conversion seems to be done at the gateway.
> ...but I am less confident in their ability to input it in forms and such.
Every browser I've seen a) does not tell you in what encoding it's submitting a form, and b) submits forms in the encoding of the web page from which the form was taken. (Note that I've not thoroughly investigated point a); the mere presence of browsers that don't tell you is enough that I might as well assume that they're all like that.)
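To illustrate why the server has to be told the form's encoding: the same text produces different bytes in Shift_JIS and UTF-8, and nothing in the bytes says which one was used. A minimal Python sketch, where the sample string and the helper name `decode_or_none` are assumptions made up for illustration:

```python
# Hypothetical form value; any Japanese text would do.
text = "日本語"

utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")


def decode_or_none(raw: bytes, encoding: str):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Shift_JIS bytes read as UTF-8: rejected as invalid here.
wrong = decode_or_none(sjis_bytes, "utf-8")

# UTF-8 bytes read as Shift_JIS: decodes "successfully" into mojibake.
garbled = utf8_bytes.decode("shift_jis", errors="replace")

print(utf8_bytes != sjis_bytes, wrong is None, garbled != text)
```

A wrong guess does not always fail loudly, which is why guessing from the bytes alone is unreliable.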
Here is the method I've developed over the past seven years or so for dealing with I18N on web sites.
1. Unless you know otherwise, generate the page in UTF-8, since that will deal with situations such as a Chinese comment on a Japanese page, or a Chinese person typing a search query into a Japanese page.
2. Convert to a more restrictive encoding for those browsers where you know you need to do this. For example, convert to Shift_JIS for DoCoMo phones. Ideally, you'll use a converter that will deal in a nice way with code points that don't exist in the target encoding. But to do this in a truly user-friendly way is a heck of a lot more work than just letting iconv put in question marks for the characters.
3. Set a Content-Type HTTP header and include a matching meta http-equiv tag in the page's head section, both specifying the encoding. If the HTTP header is present and the two differ, the header overrides the value in the meta tag. The header will generally not be present if someone's using a page they'd previously saved to disk, which is why you need the meta tag as well.
4. When generating a page with forms, include the encoding as a (hidden) parameter in the form. When you receive a form, do no conversion until you check this parameter, and then convert all parameters from that encoding to UTF-8 before proceeding further.
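Steps 2 and 3 can be sketched roughly as follows. This is a minimal Python illustration, not the code behind the method described above: the function name `render_page` is hypothetical, and Python's built-in codecs stand in for iconv, with `errors="replace"` playing the role of the question-mark fallback.

```python
from html import escape


def render_page(body_html: str, encoding: str = "utf-8"):
    """Return (headers, payload) for an HTML page served in `encoding`."""
    content_type = f"text/html; charset={encoding}"
    # Step 3: the same declaration goes into the meta tag and the HTTP header.
    document = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="{escape(content_type)}">'
        "</head><body>"
        f"{body_html}"
        "</body></html>"
    )
    # Step 2: errors="replace" substitutes "?" for characters the target
    # encoding cannot represent (the cheap fallback, not the friendly one).
    payload = document.encode(encoding, errors="replace")
    headers = [("Content-Type", content_type)]
    return headers, payload


# Korean text has no Shift_JIS representation, so it degrades to "?".
headers, payload = render_page("<p>日本語 한국어</p>", "Shift_JIS")
```

Because the page is built as Unicode text and encoded only at the last moment, the default UTF-8 path and the restricted Shift_JIS path share the same code.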
This combination of putting the encoding into both the content-type headers and the form is the only way I know of to reliably determine the encoding of a submitted form on a wide array of browsers.
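A rough Python sketch of step 4, under some loudly stated assumptions: the hidden field name `page_encoding`, the set of accepted encodings, and all function names are invented for illustration, and field names are assumed to be plain ASCII. The point is that values stay as raw bytes until the hidden parameter has been read.

```python
from html import escape
from urllib.parse import quote_plus, unquote_to_bytes

# Hypothetical hidden-field name and the encodings this site is assumed to serve.
ENCODING_FIELD = "page_encoding"
KNOWN_ENCODINGS = {"utf-8", "shift_jis"}


def hidden_encoding_field(encoding: str) -> str:
    """HTML for the hidden input that records the page's encoding."""
    return f'<input type="hidden" name="{ENCODING_FIELD}" value="{escape(encoding)}">'


def parse_raw_form(body: str) -> dict[str, bytes]:
    """Split a urlencoded body into undecoded byte values (field names assumed ASCII)."""
    fields = {}
    for pair in body.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        key = unquote_to_bytes(name.replace("+", " ")).decode("ascii")
        fields[key] = unquote_to_bytes(value.replace("+", " "))
    return fields


def decode_form(body: str) -> dict[str, str]:
    """Decode every field using the encoding named in the hidden parameter."""
    raw = parse_raw_form(body)
    if ENCODING_FIELD not in raw:
        raise ValueError("form did not include its encoding")
    declared = raw[ENCODING_FIELD].decode("ascii", errors="replace").lower()
    if declared not in KNOWN_ENCODINGS:
        raise ValueError(f"unexpected form encoding: {declared!r}")
    # The results are Python str objects, which can be written out as UTF-8.
    return {name: value.decode(declared) for name, value in raw.items()}


# Simulate a browser submitting a search from a Shift_JIS page.
body = f"{ENCODING_FIELD}=Shift_JIS&q=" + quote_plus("日本語", encoding="shift_jis")
form = decode_form(body)
```

Checking the declared value against a fixed set keeps a client from naming an arbitrary codec.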
You may also need to use this technique on URLs if you have multi-lingual ones, but I generally try to avoid those, because I like to include only information relevant to the user in a URL, if I can manage that.
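For URLs, one way to apply the same idea is to carry the encoding in the URL itself. The sketch below is only an assumption about how that could look: the layout (encoding as the first path segment), the function names, and the accepted encodings are all hypothetical.

```python
from urllib.parse import quote, unquote

# Encodings this hypothetical site is assumed to accept in URLs.
URL_ENCODINGS = {"utf-8", "shift_jis"}


def build_path(segment: str, encoding: str) -> str:
    """Percent-encode `segment` in `encoding` and record the encoding in the path."""
    return f"/{encoding}/{quote(segment, safe='', encoding=encoding)}"


def read_path(path: str) -> str:
    """Recover the original text from a path produced by build_path."""
    _, encoding, escaped = path.split("/", 2)
    if encoding.lower() not in URL_ENCODINGS:
        raise ValueError(f"unexpected URL encoding: {encoding!r}")
    return unquote(escaped, encoding=encoding, errors="strict")


path = build_path("東京", "shift_jis")
print(path, read_path(path))
```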
> Japanese keitai browsers are pretty damned primitive. Here are some of the things that they do not support...
You're rather behind the times here. :-) As of two years ago, many AU and Softbank phones supported cookies, and modern DoCoMo phones support CSS and tables, though I've not yet investigated the extent of that support or which models support it. I'm not sure what you mean by "animations except for Flash," but all phones have supported animated GIFs and/or PNGs for ages.
Personally, I think that the Flash stuff is extremely cool, since it offers the closest thing to "Web 2.0" that you're going to get on a phone, but then again, that I sell a product to help you do this may bias me somewhat.
> That is probably true, but think of all the embedded devices in Japan with web browsers. In order for our site to *never* display mojibake to 99.999999999999% of our Japanese customers, it is in Shift_JIS.
Well, you could do the reverse of what I do, and display UTF-8 where you know it will work, and Shift_JIS otherwise.
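That reverse policy amounts to an allowlist: UTF-8 only for browsers known to cope, Shift_JIS for everything else. A minimal Python sketch, in which the User-Agent markers are placeholders rather than a tested device list and `pick_encoding` is a hypothetical name:

```python
# Hypothetical User-Agent substrings for browsers assumed to handle UTF-8;
# a real list would have to come from testing actual devices.
UTF8_CAPABLE_MARKERS = ("Firefox", "Safari", "MSIE")


def pick_encoding(user_agent: str) -> str:
    """Return UTF-8 for allowlisted browsers and Shift_JIS for everything else."""
    if any(marker in user_agent for marker in UTF8_CAPABLE_MARKERS):
        return "utf-8"
    return "Shift_JIS"


print(pick_encoding("Mozilla/5.0 (X11; Linux) Gecko/20070725 Firefox/2.0"))
print(pick_encoding("DoCoMo/2.0"))
```

Unknown or empty User-Agent strings fall through to Shift_JIS, which matches the "otherwise" in the suggestion.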
Personally, I was somewhat surprised when I first came to Japan that all of the Amazons didn't Just Work in all of the languages and encodings. But then again, rewriting your way into that sort of thing can be extremely difficult if you're not working from a very well refactored code base with a lot of automated testing. I find that usually by far the biggest and most expensive job in any legacy system is merging and generalizing duplicated functionality. Rewriting from scratch, especially in systems not designed for automated testing, is often cheaper. The big problem with the rewrite is that there's often a lot of experience embedded in the legacy system that's very, very difficult to extract.
cjs
--
Curt Sampson <cjs@example.com> +81 90 7737 2974
Mobile sites and software consulting: http://www.starling-software.com