Mailing List Archive


Re: [tlug] Re: font/char set question



On 30/07/07, Josh Glover <jmglov@example.com> wrote:
> On 29/07/07, Jim Breen <jimbreen@example.com> wrote:

> > They (Amazon) don't use UTF-8 at all, AFAICT.
>
> You cannot tell further than the presentation layer, ブリーン先生。 In fact,
> Amazon uses nothing but UTF-8 internally. Japanese pages get
> Shit_JIS'd by Gurupa[2], British and US pages get ASCII'd, European
> pages get Latin-1'd, and Chinese pages get... er, encoded. (Josh knows
> not of the Joyo Amazon stuff.)

Thanks for this insight into Amazon's internals.

> > If you try to paste a UTF-8 string into a WWW form set for
> > Shift_JIS, a conversion will be done (what software actually
> > does it depends on OS, etc.). If matches can't be made, some
> > substitution, e.g. blanks, will be done.
>
> I'm pretty sure that our search system honours the encoding you input.
> Tragically, the output will be in Shit_JIS, so you won't be able to
> read it.

Great pity.

> But I *know* that if you enter UTF-8, it handles it
> correctly, because I do that all the time. I have to experiment with
> characters outside of Shit_JIS, but I'm pretty sure I've input
> Bulgarian / Russian in UTF-8 on Amazon.jp and gotten sane search
> results.

However, since the WWW form is in Shift_JIS, browsers are unlikely to send
field contents that fall outside ISO 646 (ASCII) and JIS X 0208.
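
For anyone who wants to check which characters a Shift_JIS form can carry,
here is a minimal Python sketch. It assumes Python's built-in shift_jis
codec is a fair stand-in for whatever conversion the browser or OS does,
which may not hold everywhere, and the function names are made up for
illustration. Note that JIS X 0208 has a Cyrillic row, so Russian and
Bulgarian text fits in Shift_JIS, whereas Hangul, for example, does not.

```python
def encodable_in(text, codec="shift_jis"):
    """Return True if every character of text can be encoded in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def as_form_field(text, codec="shift_jis"):
    """Encode text for a form in the given codec (hypothetical helper).

    Characters the codec cannot represent are replaced with numeric
    character references such as &#54620; instead of raising an error.
    """
    return text.encode(codec, errors="xmlcharrefreplace")


if __name__ == "__main__":
    for sample in ["日本語", "Привет", "한"]:
        print(sample, encodable_in(sample), as_form_field(sample))
```

The substitution used here (numeric character references) is one
possibility; other software may substitute blanks or question marks
instead, so treat the output as indicative only.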

> [1] I also work for Amazon, and I *do* work on the website platform,
> though most of my work is mobile-centric
> [2] http://en.wikipedia.org/wiki/Gurupa

Gurupa sounds interesting. I wish they'd change their Japanese output
over to UTF-8.

Jim


-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

