Mailing List Archive
Re: [tlug] OT-Japanese in PHP
- Date: Wed, 25 May 2005 17:39:01 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] OT-Japanese in PHP
- References: <87u0ktqinc.fsf@example.com><EX-MAIL-SHI-01A0VYq0000014e@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (cilantro, linux)
>>>>> "Yoshihiro" == Yoshihiro Sato <y_satou@example.com> writes:

    >> If the end user has a browser that can enter the character, she
    >> probably has a browser that can display it.

    Yoshihiro> I think we need to add condition: "with specific user
    Yoshihiro> interface"

I don't understand this.  Except for the blind and a few people whose
preferred browser is libcurl, everyone is going to have a GUI browser.
But audio browsing and plaintext browsing are hard problems anyway.

    >> Anyway, few servers hesitate to enforce browser upgrades in
    >> order to create funkier displays.  "Best viewed with next
    >> year's Internet Exploder; CANNOT be viewed with last year's
    >> anything!" pages are all over the place, yet they can't handle
    >> users' names?

    Yoshihiro> If we can limit end user, yes, we can ask user to
    Yoshihiro> upgrade / change their software.

My point is that plenty of organizations do so regardless of what the
users want.  (1) If we push Unicode now, in a couple of years there
will not be serious compatibility problems, and (2) the input
verification subsystems we'll need in the interim are just good design
to start with, right?

    Yoshihiro> But it seems that you're considering that the service
    Yoshihiro> can be restricted to be run on specific environment
    Yoshihiro> (i.e. specify OS, specify UI, etc.)

I don't think that's the difference.  You are saying that we should
_always_ corrupt user data in a known way.  I'm saying it's worth
trying to preserve the user data, and risk unknown corruption in many
cases.  And for important data, you want to verify anyway.  Like this:

    Yoshihiro> We still have problem in the process to transcode to
    Yoshihiro> Unicode.  For example:

    Yoshihiro> * If received data 0x8740 - is it CIRCLED DIGIT ONE
    Yoshihiro> (U+2460) (=Windows-31J) or PARENTHESIZED IDEOGRAPH SUN
    Yoshihiro> (U+3230) (=Mac) ?  Which character was inputted on
    Yoshihiro> user's side ?

But the problem is even worse for Europeans!  In a single octet code,
is 0xA4 U+00A4 CURRENCY SIGN, or is it U+20AC EURO SIGN?  Or maybe
it's CYRILLIC CAPITAL LETTER UKRAINIAN IE?

It's simple: just ask the user.  Specifically, keep a table of
questionable stuff.  In the case that you hit something questionable,
and care about it, do

    Warning

    The character you typed is ambiguous.  Did you mean
    <img src="U2460.png">, <img src="U3230.png">, or something else?

This only works on graphics-capable terminals, of course.  You can
imagine even more slick UIs, e.g. an imagemap which is actually a
screen dump of the output that the server's GUI would produce, with
the questionable characters highlighted and all characters clickable
for editing.

You'd only do this for data that is critical and that a human being
would be unable to correct confidently, like proper names and so on.

If you want to be fancy, you keep track of all the user information
you can get your hands on from the browser, host, etc., and learn from
that to improve guesses in the future.

(By the way, I did misunderstand what you meant by "JIS X 0208 only";
I thought you meant "not JIS X 0212 etc", but I guess you meant "not
'corporate standard' extensions to national standards"?)

    Yoshihiro> Typically this kind of approach is taken:

    Yoshihiro> Respond to user with displaying geta-mark, with
    Yoshihiro> annotation:

Again, if we know the geta mark is going to show up, we should help
them out by telling them what we know how to handle.  I was thinking
of the case where you input a character and it gets converted to a
Unicode code point that my browser doesn't know.

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
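
The "table of questionable stuff" idea in the message above can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the names `SINGLE_OCTET_ENCODINGS`, `QUESTIONABLE`, `readings`, `is_ambiguous`, and `warning` are hypothetical, the choice of three ISO 8859 encodings is an assumption, and the two readings of 0x8740 are copied from the message rather than looked up in a vendor mapping table.

```python
import unicodedata

# Hypothetical list of single-octet encodings the server must guess between.
SINGLE_OCTET_ENCODINGS = ("iso8859-1", "iso8859-15", "iso8859-5")

# Multi-byte readings copied from the message above, keyed by raw bytes.
# The labels and the table layout are illustrative assumptions.
QUESTIONABLE = {
    b"\x87\x40": {"Windows-31J": "\u2460", "Mac": "\u3230"},
}


def readings(raw):
    """Return {label: character} for every plausible reading of raw."""
    if raw in QUESTIONABLE:
        return dict(QUESTIONABLE[raw])
    found = {}
    if len(raw) == 1:
        for enc in SINGLE_OCTET_ENCODINGS:
            try:
                found[enc] = raw.decode(enc)
            except UnicodeDecodeError:
                pass  # this encoding has no character at that octet
    return found


def is_ambiguous(raw):
    """True when the plausible readings disagree with each other."""
    return len(set(readings(raw).values())) > 1


def warning(raw):
    """Build the text of the 'did you mean' prompt, or None if unambiguous."""
    if not is_ambiguous(raw):
        return None
    # dict.fromkeys drops duplicate characters while keeping their order.
    options = [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in dict.fromkeys(readings(raw).values())
    ]
    return ("The character you typed is ambiguous. Did you mean "
            + ", ".join(options) + ", or something else?")
```

Calling `warning(b"\xa4")` lists the three readings discussed above (currency sign, euro sign, Ukrainian IE), while `warning(b"A")` returns `None` because every encoding agrees. A real service would presumably render the options as images or clickable glyphs, as the message suggests, and would only run this check on fields it cares about, such as proper names.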