Re: [tlug] Re: font/char set question

Date: Mon, 30 Jul 2007 13:43:45 +0900 (JST)
From: Curt Sampson <cjs@example.com>
Subject: Re: [tlug] Re: font/char set question
References: <5634e9210707282051g6d4ac8b9l1ba725231bdff464@mail.gmail.com> <d8fcc0800707290802x2c9798dj411fc5400e8b8d6f@mail.gmail.com> <46AD1954.1080209@dcook.org> <d8fcc0800707291946s531f3353y8e0124d8e12cb071@mail.gmail.com>

On Mon, 30 Jul 2007, Josh Glover wrote:

> I would imagine that most Japanese keitai browsers these days can
> handle displaying UTF-8 on web pages (but *not* in email!)


In fact, for DoCoMo you have that backwards. They will handle UTF-8
e-mail just fine, but will not handle anything but Shift_JIS for web
pages.

AU and Softbank support both Shift_JIS and UTF-8 for e-mail and web
pages.

There may be certain models of phone for which the above is not true,
but in general, conversion seems to be done at the gateway.

> ...but I am
> less confident in their ability to input it in forms and such.


Every browser I've seen a) does not tell you in what encoding it's
submitting a form, and b) submits forms in the encoding of the web
page from which the form was taken. (Note that I've not thoroughly
investigated point a); the mere presence of browsers that don't tell you
is enough that I might as well assume that they're all like that.)
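To illustrate why the server cannot simply infer the encoding from a submission: form fields arrive as percent-escaped bytes, and the same characters produce different bytes depending on the encoding of the page the form came from. A small sketch using only Python's standard library:

```python
from urllib.parse import quote, unquote

text = "日本"

# What a browser would send for the same field from a UTF-8 page versus
# a Shift_JIS page. Nothing in the escapes themselves names the encoding.
as_utf8 = quote(text, encoding="utf-8")
as_sjis = quote(text, encoding="shift_jis")
print(as_utf8)  # %E6%97%A5%E6%9C%AC
print(as_sjis)  # %93%FA%96%7B

# Decoding with the wrong guess does not raise by default; it silently
# yields U+FFFD replacement characters, i.e. mojibake.
wrong = unquote(as_sjis, encoding="utf-8")
print(wrong == text)  # False
```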

Here is the method I've developed over the past seven years or so for
dealing with I18N on web sites.

1. Unless you know otherwise, generate the page in UTF-8, since that
will deal with situations such as a Chinese comment on a Japanese page,
or a Chinese person typing a search query into a Japanese page.

2. Convert to a more restrictive encoding for those browsers where you
know you need to do this. For example, convert to Shift_JIS for DoCoMo
phones. Ideally, you'll use a converter that will deal in a nice way
with code points that don't exist in the target encoding. But doing this
in a truly user-friendly way is a heck of a lot more work than just
letting iconv substitute question marks for the unconvertible characters.

3. Send a Content-Type HTTP header and also include an identical meta
http-equiv tag in the document head, both specifying the encoding. The HTTP
header will override the value in the meta tag, if they're different
and the header is present. The header will generally not be present if
someone's using a page they'd previously saved to disk, which is why you
need the meta tag as well.

4. When generating a page with forms, include the encoding as a (hidden)
parameter in the form. When you receive a form, do no conversion until
you check this parameter, and then convert all parameters from that
encoding to UTF-8 before proceeding further.
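Steps 1 to 3 can be sketched in Python roughly as follows. This is a minimal illustration, not a complete page generator: `render_page` is a hypothetical helper, the page template is deliberately bare, and the conversion uses Python's `errors="replace"` handler, which substitutes "?" for unencodable characters (the crude route from step 2, not the user-friendly one).

```python
def render_page(body_html: str, encoding: str = "UTF-8") -> tuple[dict[str, str], bytes]:
    """Build HTTP headers and an encoded body for an HTML page.

    Step 1: default to UTF-8. Step 2: the caller passes a more restrictive
    encoding such as "Shift_JIS" for browsers known to need it. Step 3: the
    same charset goes into both the HTTP header and the meta tag.
    """
    content_type = f"text/html; charset={encoding}"
    page = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="{content_type}">'
        "</head><body>"
        f"{body_html}"
        "</body></html>"
    )
    headers = {"Content-Type": content_type}
    # Characters missing from the target encoding become "?".
    body = page.encode(encoding, errors="replace")
    return headers, body


headers, body = render_page("日本語 한국어", "Shift_JIS")
print(headers["Content-Type"])  # text/html; charset=Shift_JIS
print(body.count(b"?"))  # 3
```

The Korean text stands in for the "Chinese comment on a Japanese page" case: it survives intact in the default UTF-8 output but degrades to three question marks when converted to Shift_JIS.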

This combination of putting the encoding into both the content-type
headers and the form is the only way I know of to reliably determine the
encoding of a submitted form on a wide array of browsers.
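A minimal Python sketch of the form side of step 4, assuming the query string (or urlencoded POST body) is available as a str. The field name `_enc`, the whitelist of encodings, and the UTF-8 fallback for submissions without the field are all assumptions for illustration. The trick is a first parsing pass with Latin-1, which can decode any byte and so never fails, used only to read the ASCII-valued encoding field before anything else is decoded.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field name; any name not used by real form fields will do.
ENCODING_FIELD = "_enc"
# Only accept encodings we actually serve pages in (compared lowercased).
ALLOWED_ENCODINGS = {"utf-8", "shift_jis"}


def hidden_encoding_input(encoding: str) -> str:
    """Hidden input to place inside every <form> on a page sent in `encoding`."""
    return f'<input type="hidden" name="{ENCODING_FIELD}" value="{encoding}">'


def decode_form(raw_query: str) -> dict[str, list[str]]:
    """Decode an application/x-www-form-urlencoded string into Unicode."""
    # First pass: Latin-1 maps every byte to a character, so this cannot
    # fail; it is used only to read the encoding field.
    probe = parse_qs(raw_query, encoding="latin-1")
    # Assumption: a submission without the field came from a UTF-8 page.
    declared = probe.get(ENCODING_FIELD, ["UTF-8"])[0]
    if declared.lower() not in ALLOWED_ENCODINGS:
        raise ValueError(f"unexpected form encoding: {declared!r}")
    # Second pass: decode every parameter from the declared encoding.
    fields = parse_qs(raw_query, encoding=declared, errors="strict")
    fields.pop(ENCODING_FIELD, None)
    return fields


# Simulate a browser submitting a form taken from a Shift_JIS page.
submitted = urlencode({"q": "日本", ENCODING_FIELD: "Shift_JIS"}, encoding="shift_jis")
print(decode_form(submitted))  # {'q': ['日本']}
```

The whitelist matters because the hidden field comes back from the client like any other parameter and should not be trusted to name an arbitrary codec.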

You may also need to use this technique on URLs if you have multi-lingual
ones, but I generally try to avoid those, because I like to include only
information relevant to the user in a URL, if I can manage that.

> Japanese keitai browsers are pretty damned primitive. Here are some of
> the things that they do not support...


You're rather behind the times here. :-) As of two years ago, many
AU and Softbank phones supported cookies, and modern DoCoMo phones
support CSS and tables, though I've not yet investigated the extent of
that support or which models support it. I'm not sure what you mean by
"animations except for Flash," but all phones have supported animated
GIFs and/or PNGs for ages.

Personally, I think that the Flash stuff is extremely cool, since it
offers the closest thing to "Web 2.0" that you're going to get on a
phone, but then again, the fact that I sell a product to help you do this
may bias me somewhat.

> That is probably true, but think of all the embedded devices in Japan
> with web browsers. In order for our site to *never* display mojibake
> to 99.999999999999% of our Japanese customers, it is in Shift_JIS.


Well, you could do the reverse of what I do, and display UTF-8 where you
know it will work, and Shift_JIS otherwise.
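The two policies differ only in which list of browsers is treated as the exception. A Python sketch that picks the page encoding from the User-Agent header; the prefixes below are illustrative assumptions, not a complete or authoritative list of carrier User-Agent formats, and the function names are hypothetical.

```python
# Illustrative, incomplete User-Agent prefixes (assumptions; verify against
# current carrier documentation before relying on them).
SHIFT_JIS_ONLY = ("DoCoMo/",)              # DoCoMo: web pages must be Shift_JIS
UTF8_KNOWN_GOOD = ("KDDI-", "SoftBank/")   # AU and Softbank handsets


def utf8_unless_known_bad(user_agent: str) -> str:
    """Steps 1 and 2 above: UTF-8 by default, Shift_JIS where required."""
    return "Shift_JIS" if user_agent.startswith(SHIFT_JIS_ONLY) else "UTF-8"


def shift_jis_unless_known_good(user_agent: str) -> str:
    """The reverse: Shift_JIS by default, UTF-8 only where known to work."""
    return "UTF-8" if user_agent.startswith(UTF8_KNOWN_GOOD) else "Shift_JIS"


for ua in ("DoCoMo/2.0 N905i", "KDDI-SA31 UP.Browser/6.2", "SomeEmbeddedBrowser/1.0"):
    print(ua, utf8_unless_known_bad(ua), shift_jis_unless_known_good(ua))
```

The trade-off is where mistakes land: with the first policy an unrecognized Shift_JIS-only device gets mojibake, while with the second an unrecognized UTF-8-capable browser simply loses characters outside Shift_JIS.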

Personally, I was somewhat surprised when I first came to Japan that all
of the Amazons didn't Just Work in all of the languages and encodings.
But then again, rewriting your way into that sort of thing can be
extremely difficult if you're not working from a very well refactored
code base with a lot of automated testing. I find that usually by far
the biggest and most expensive job in any legacy system is merging
and generalizing duplicated functionality. Rewriting from scratch,
especially in systems not designed for automated testing, is often
cheaper. The big problem with the rewrite is that there's often a lot of
experience embedded in the legacy system that's very, very difficult to
extract.

cjs
--
Curt Sampson       <cjs@example.com>        +81 90 7737 2974
Mobile sites and software consulting: http://www.starling-software.com
