Mailing List Archive



Re: [tlug] Re: font/char set question



On 29/07/07, Jim Breen <jimbreen@example.com> wrote:

> That's more or less the story. Amazon.jp's pages, and presumably
> their search system, are all in Shift_JIS, i.e. the JIS X 0208
> character set. Most of those hanzi above are not in that set.

Our[1] pages display in Shit_JIS; sorry 'bout that. It is still the
only encoding *guaranteed* to display on every Japanese web browser,
period.

> They (Amazon) don't use UTF-8 at all, AFAICT.

You can't see past the presentation layer, Breen-sensei. In fact,
Amazon uses nothing but UTF-8 internally. Japanese pages get
Shit_JIS'd by Gurupa[2], British and US pages get ASCII'd, European
pages get Latin-1'd, and Chinese pages get... er, encoded. (Josh knows
not of the Joyo Amazon stuff.)
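A minimal sketch of "UTF-8 inside, per-locale charset at the presentation layer", assuming a hypothetical locale table (`OUTPUT_CHARSETS`) and a hypothetical `render_for_locale` function. It is not Gurupa's actual code. Python's `str` stands in for the internal UTF-8 strings, and the choice of HTML numeric character references for characters the target charset lacks is an assumption, not something the post states.

```python
# Hypothetical locale -> output charset table, following the post's description.
# (Chinese is left out because the post doesn't say what it uses.)
OUTPUT_CHARSETS = {
    "jp": "shift_jis",
    "uk": "ascii",
    "us": "ascii",
    "de": "latin-1",
    "fr": "latin-1",
}


def render_for_locale(text: str, locale: str) -> bytes:
    """Encode internally held Unicode text for one locale's pages.

    Characters the target charset cannot represent become HTML numeric
    character references such as &#233;, so a browser can still show them.
    Unknown locales fall back to UTF-8 (an assumption of this sketch).
    """
    charset = OUTPUT_CHARSETS.get(locale, "utf-8")
    return text.encode(charset, errors="xmlcharrefreplace")
```

For example, `render_for_locale("Café", "us")` yields `b"Caf&#233;"`, while the same call with `"de"` yields the single Latin-1 byte `b"Caf\xe9"`.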

> If you try to paste a UTF-8 string into a WWW form set for
> Shift_JIS, a conversion will be done (what software actually
> does it depends on OS, etc.). If matches can't be made, some
> substitution, e.g. blanks, will be done.
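The substitution Jim describes can be reproduced with Python's built-in `shift_jis` codec. This is a minimal sketch with hypothetical helper names; a browser or OS may substitute differently (blanks, `?`, or nothing), while Python's `errors="replace"` always inserts `?`.

```python
def fits_shift_jis(text: str) -> bool:
    """Return True if every character of text is representable in Shift_JIS."""
    try:
        text.encode("shift_jis")
    except UnicodeEncodeError:
        return False
    return True


def to_shift_jis_form(text: str) -> bytes:
    """Encode text as a Shift_JIS form would receive it.

    errors="replace" swaps each character missing from Shift_JIS for b"?".
    """
    return text.encode("shift_jis", errors="replace")
```

Kanji such as 漢字 survive the round trip, but Hangul does not: `to_shift_jis_form("漢字한").decode("shift_jis")` gives `"漢字?"`.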

I'm pretty sure that our search system honours the encoding you input.
Tragically, the output will be in Shit_JIS, so you won't be able to
read it. But I *know* that if you enter UTF-8, it handles it
correctly, because I do that all the time. I still have to experiment
with characters outside of Shit_JIS, but I'm pretty sure I've input
Bulgarian / Russian in UTF-8 on Amazon.jp and gotten sane search
results.
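"Honouring the encoding you input" could work roughly as below: decode the submitted query bytes with the declared charset, then fall back to other candidates before normalizing to the internal form. This is a hypothetical sketch (`normalize_query` and the fallback order are assumptions), not a description of Amazon's search code.

```python
from typing import Optional


def normalize_query(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode raw query bytes into a Unicode string for internal use.

    Tries the charset the client declared first, then UTF-8, then Shift_JIS.
    UTF-8 goes before Shift_JIS because Shift_JIS bytes rarely form valid
    UTF-8, so a wrong guess usually fails loudly instead of silently.
    """
    for charset in (declared_charset, "utf-8", "shift_jis"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep what we can and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

With this approach a Russian query sent as UTF-8 and a Japanese query sent as Shift_JIS both arrive intact, even when the client declares no charset.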

I'll poke at the code tomorrow and see if I can find out for sure.

Cheers,
Josh

[1] I also work for Amazon, and I *do* work on the website platform,
though most of my work is mobile-centric.
[2] http://en.wikipedia.org/wiki/Gurupa
