Mailing List Archive



[tlug] Re: font/char set question



steven smith <sjs@example.com> wrote:
> I just had something interesting happen.  The string below
> came across in another list:
> 现代汉语词典
> It's the name of a Chinese Kanji dictionary.
>
> I searched for it on amazon.jp.  The search dialog looked ok
> when I pasted but the return was:
> "Your search " 代   典" did not match any products."
> and the dialog looked like :  代   典
> I don't know what this will look like on other browsers.
>
> What am I hitting here?  Is the font used in the
> Amazon.co.jp non utf-8 (iso2022-jp maybe)?

That's more or less the story. Amazon.co.jp's pages, and presumably
their search system, are all in Shift_JIS, i.e. the JIS X 0208
character set. Most of those hanzi above are not in that set.
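To see which of those hanzi actually fall outside the set, here is a minimal sketch. It assumes Python's built-in "shift_jis" codec is a fair stand-in for what Amazon's side accepts (Amazon may really use a vendor variant such as cp932). It simply tries to encode each character on its own. The helper name representable_in is made up for this example.

```python
# Sketch: report which characters of a string can be encoded in Shift_JIS.
# Assumption: Python's "shift_jis" codec (JIS X 0208 repertoire) approximates
# the character set used by the site's forms.


def representable_in(text: str, encoding: str = "shift_jis") -> list[tuple[str, bool]]:
    """Return (character, encodable?) pairs for each character in text."""
    result = []
    for ch in text:
        try:
            ch.encode(encoding)
            result.append((ch, True))
        except UnicodeEncodeError:
            result.append((ch, False))
    return result


title = "现代汉语词典"  # the dictionary title quoted above
for ch, ok in representable_in(title):
    print(f"{ch} U+{ord(ch):04X} {'in set' if ok else 'NOT in set'}")
```

Only 代 and 典 should come out as "in set". The other four are simplified Chinese forms (Japanese uses 現, 漢, 語 and 詞 instead), and this matches the " 代   典" that came back from the search.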

(BTW, font is not the issue here. It's all to do with character
sets.)

> Do they use a
> utf-8 that doesn't support the (I'm assuming) Chinese-only
> kanji in this string?

They (Amazon) don't use UTF-8 at all, AFAICT.

> Is there a translation between
> character sets going on somewhere that's dropping the
> Chinese characters?

If you try to paste a UTF-8 string into a WWW form set for
Shift_JIS, a conversion will be done (what software actually
does it depends on OS, etc.). Characters that have no equivalent
in the target set can't be converted, so some substitution,
e.g. blanks, is made instead.
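The conversion-with-substitution step can be simulated in a rough sketch. It rests on a few assumptions. Blanks are used only to mirror the result in the original post, and real browsers or OS layers may substitute "?" or something else. The error-handler name "blank" and the helper to_form_encoding are hypothetical names made up for the example. Python's codecs.register_error is used to plug in the substitution rule.

```python
import codecs
from urllib.parse import quote_plus


def _blank_out(exc: UnicodeError):
    """Encoding error handler: replace each unencodable character with a space."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.start:exc.end spans the character(s) the codec could not encode;
    # emit one space per character and resume encoding after them.
    return " " * (exc.end - exc.start), exc.end


# "blank" is a hypothetical handler name registered only for this sketch.
codecs.register_error("blank", _blank_out)


def to_form_encoding(text: str, encoding: str = "shift_jis") -> bytes:
    """Convert text as a form declared in `encoding` might, blanking unmappable chars."""
    return text.encode(encoding, errors="blank")


raw = to_form_encoding("现代汉语词典")
print(raw.decode("shift_jis"))  # roughly what the search back end ends up seeing
print(quote_plus(raw))          # the same bytes as they might appear in a GET query
```

Decoding the result gives a blank, 代, three blanks, then 典, which is the same pattern shown in the failed search.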

Jim
-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

