Mailing List Archive



Re: [tlug] Re: Japanese in URLs?



On 07/02/2008, Stephen J. Turnbull <stephen@example.com> wrote:
> Nguyen Vu Hung writes:
>  > 2008/2/6, Jim Breen <jimbreen@example.com>:

> > > and I don't want the browser to play it back to me as an
> > > expletive in Klingon because it decided it was something in UTF-8.
> > > It's different, of course, if the field has an ACE prefix such as
> > > "xn--".

> > RFC2718[1] says the URL *should* be encoded after the character sequence
> > is translated to UTF-8.

> No, it doesn't.

Thanks, Stephen. You saved me writing something very similar.

> > As far as I know, browsers which display anything but the hex-encoded
> > path are strictly speaking in violation of RFC 3987:

Browsers can produce very confusing results if they attempt anything else.
Consider the following URI:

http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1W%B6%D0%A4%E1%A4%EB_v1

It is asking for the verb inflection table for 勤める, and since it is generated
by a link within WWWJDIC, it is using WWWJDIC's internal coding (EUC-JP). I
actually use WWWJDIC in UTF-8 (a cookie setting), so I get that table displayed
in UTF-8. If Firefox attempted to display the URI by treating the
%B6%D0%A4%E1%A4%EB as UTF-8, it would simply get garbage.
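To make that concrete, here is a minimal Python sketch of the two readings of
the escaped bytes. It is only an illustration using the standard library's
urllib.parse, not a description of how Firefox or WWWJDIC actually process the
string; the variable names are my own.

```python
from urllib.parse import unquote, unquote_to_bytes

# The percent-escaped part of the query string in the URI above.
escaped = "%B6%D0%A4%E1%A4%EB"

# Undo the percent-escaping only; this yields raw bytes, not text.
raw = unquote_to_bytes(escaped)

# Read with the encoding the server actually used (EUC-JP).
as_euc_jp = raw.decode("euc_jp")

# Read as UTF-8: a strict decode fails because 0xB6 cannot start a
# UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient UTF-8 decode does not fail, but substitutes U+FFFD
# replacement characters, which is the "garbage" a browser would show.
lenient = unquote(escaped, encoding="utf-8", errors="replace")

print(as_euc_jp)      # the intended Japanese text
print(utf8_ok)        # False
print(repr(lenient))  # contains '\ufffd'
```

The same bytes are perfectly good text under one encoding and invalid under
the other, and nothing in the URI itself says which one was meant.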

>  > What Firefox is doing is not wrong, but personally, I think the browser
>  > should be able to display actual Japanese for better readability.

It would indeed be nice to get Japanese, etc, in URIs or IRIs displaying
correctly, but there is no way a browser can be sure of the coding used.
You could imagine a browser perhaps having an option for suggesting a
URL (de)coding, but in fact the coding of strings such as 勤める above
is usually entirely a matter for the server developer.
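As a rough sketch of why guessing is unreliable, the following Python tries a
list of candidate encodings on the escaped bytes and keeps every one that
decodes without error. The function name plausible_decodings and the candidate
list are hypothetical, chosen only for illustration; no browser is claimed to
work this way.

```python
from urllib.parse import unquote_to_bytes

def plausible_decodings(escaped, candidates):
    """Return {encoding: text} for every candidate that decodes cleanly."""
    raw = unquote_to_bytes(escaped)
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; skip it.
            pass
    return results

found = plausible_decodings(
    "%B6%D0%A4%E1%A4%EB",
    ["utf-8", "euc_jp", "latin-1"],
)
for name, text in found.items():
    print(name, text)
```

UTF-8 is rejected here, but both EUC-JP and Latin-1 decode without complaint,
and only one of them is what the server meant. A successful decode therefore
does not tell the browser which encoding was intended.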

Maybe in some rosy future when the whole universe uses Unicode for
everything, and the specs for URIs and IRIs allow for raw UTF-8, we
might see browser specs being relaxed, but for now, I think Firefox
is doing the Right Thing.

Cheers

Jim

PS: I tried the above URL in Opera (9.25). It didn't attempt to decode
the %xx%xx string.

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

