
Re: [tlug] Re: font/char set question
On 29/07/07, Jim Breen <jimbreen@example.com> wrote:
> That's more or less the story. Amazon.jp's pages, and presumably
> their search system, are all in Shift_JIS, i.e. the JIS X 0208
> character set. Most of those hanzi above are not in that set.
Our[1] pages display in Shit_JIS; sorry 'bout that. It is still the
only encoding *guaranteed* to display on every Japanese web browser,
period.
> They (Amazon) don't use UTF-8 at all, AFAICT.
You cannot tell further than the presentation layer, ブリーン先生。In fact,
Amazon uses nothing but UTF-8 internally. Japanese pages get
Shit_JIS'd by Gurupa[2], British and US pages get ASCII'd, European
pages get Latin-1'd, and Chinese pages get... er, encoded. (Josh knows
not of the Joyo Amazon stuff.)
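
To make the "UTF-8 inside, per-site encoding at the edge" idea concrete, here is a minimal Python sketch. It is not Gurupa's code: the locale keys, the OUTPUT_CHARSETS table, and the transcode_page function are all hypothetical names made up for illustration, and the fallback to numeric character references is just one plausible choice.

```python
# Hypothetical locale-to-charset table; illustrative only, not taken from
# Gurupa or any real Amazon configuration.
OUTPUT_CHARSETS = {
    "jp": "shift_jis",
    "us": "ascii",
    "uk": "ascii",
    "de": "latin-1",
    "fr": "latin-1",
}


def transcode_page(utf8_body: bytes, locale: str) -> bytes:
    """Re-encode a UTF-8 page body into the output charset for `locale`."""
    text = utf8_body.decode("utf-8")
    charset = OUTPUT_CHARSETS[locale]
    # Assumption: characters the target charset cannot represent are emitted
    # as numeric character references (e.g. &#233;) rather than raising.
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    body = "日本語".encode("utf-8")
    print(transcode_page(body, "jp"))
    print(transcode_page("café".encode("utf-8"), "us"))
```

The point is only that the encoding a browser sees says nothing about what the backend stores; the conversion happens in the last step before the bytes leave.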
> If you try to paste a UTF-8 string into a WWW form set for
> Shift_JIS, a conversion will be done (what software actually
> does it depends on OS, etc.). If matches can't be made, some
> substitution, e.g. blanks, will be done.
I'm pretty sure that our search system honours the encoding you input.
Tragically, the output will be in Shit_JIS, so you won't be able to
read it. But I *know* that if you enter UTF-8, it handles it
correctly, because I do that all the time. I have to experiment with
characters outside of Shit_JIS, but I'm pretty sure I've input
Bulgarian / Russian in UTF-8 on Amazon.jp and gotten sane search
results.
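
For anyone who wants to check which characters would survive the trip back out as Shift_JIS, a rough Python sketch follows. It relies on Python's own shift_jis codec, which may not match whatever converter a given browser or server uses, and the function names are mine, not anything from the Amazon code.

```python
def outside_shift_jis(text):
    """Return the characters in `text` that Python's shift_jis codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("shift_jis")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def as_shift_jis_page(text):
    """Encode for a Shift_JIS page, substituting '?' for unencodable characters."""
    return text.encode("shift_jis", errors="replace")


if __name__ == "__main__":
    query = "日本語 한글"
    print(outside_shift_jis(query))   # characters that would be lost
    print(as_shift_jis_page(query))   # bytes a Shift_JIS page would carry
```

This only shows the output side: a search can match correctly on UTF-8 input and still come back with substituted characters once the results are rendered in Shift_JIS.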
I'll poke at the code tomorrow and see if I can find out for sure.
Cheers,
Josh
[1] I also work for Amazon, and I *do* work on the website platform,
though most of my work is mobile-centric.
[2] http://en.wikipedia.org/wiki/Gurupa