Mailing List Archive



Re: [tlug] OT-Japanese in PHP



>>>>> "Yoshihiro" == Yoshihiro Sato <y_satou@example.com> writes:

    >> If the end user has a browser that can enter the character, she
    >> probably has a browser that can display it.

    Yoshihiro> I think we need to add a condition: "with a specific
    Yoshihiro> user interface"

I don't understand this.  Except for the blind and a few people whose
preferred browser is libcurl, everyone is going to have a GUI browser.
But audio browsing and plaintext browsing are hard problems anyway.

    >> Anyway, few servers hesitate to enforce browser upgrades in
    >> order to create funkier displays.  "Best viewed with next
    >> year's Internet Exploder; CANNOT be viewed with last year's
    >> anything!" pages are all over the place, yet they can't handle
    >> users' names?

    Yoshihiro> If we can limit end user, yes, we can ask user to
    Yoshihiro> upgrade / change their software.

My point is that plenty of organizations do so regardless of what the
users want.  (1) If we push Unicode now, in a couple of years there
will not be serious compatibility problems, and (2) the input
verification subsystems we'll need in the interim are just good design
to start with, right?

    Yoshihiro> But it seems that you're considering that the service
    Yoshihiro> can be restricted to run in a specific environment
    Yoshihiro> (i.e. specific OS, specific UI, etc.)

I don't think that's the difference.  You are saying that we should
_always_ corrupt user data in a known way.  I'm saying it's worth
trying to preserve the user data, and risk unknown corruption in many
cases.  And for important data, you want to verify anyway.  Like this:

    Yoshihiro> We still have a problem in the process of transcoding
    Yoshihiro> to Unicode. For example:

    Yoshihiro> * If received data 0x8740 - is it CIRCLED DIGIT ONE
    Yoshihiro> (U+2460) (=Windows-31J) or PARENTHESIZED IDEOGRAPH SUN
    Yoshihiro> (U+3230) (=Mac) ? Which character was inputted on
    Yoshihiro> user's side ?

But the problem is even worse for Europeans!  In a single octet code,
is 0xA4 U+00A4 CURRENCY SIGN, or is it U+20AC EURO SIGN?  Or maybe
it's U+0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE?
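
To make the single-octet ambiguity concrete, here is a minimal Python
sketch.  The choice of ISO 8859-1, ISO 8859-15 and ISO 8859-5 as the
three codes is an assumption for illustration; the paragraph above
does not name them.  The function name "candidates" is hypothetical.

```python
import unicodedata

# The same octet, decoded under three single-octet charsets.
# Which charsets to try is an assumption made for this sketch.
OCTET = b"\xa4"
CHARSETS = ["iso8859_1", "iso8859_15", "iso8859_5"]


def candidates(octets, charsets):
    """Return (charset, code point label, Unicode name) per decoding."""
    result = []
    for cs in charsets:
        ch = octets.decode(cs)
        result.append((cs, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return result


for row in candidates(OCTET, CHARSETS):
    print(*row)
```

Without out-of-band information about the charset, all three readings
are equally valid, which is why the server cannot decide on its own.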

It's simple: just ask the user.  Specifically, keep a table of
questionable stuff.  In the case that you hit something questionable,
and care about it, do

     Warning

     The character you typed is ambiguous.  Did you mean
     <img src="U2460.png">, <img src="U3230.png">, or something else?

This only works on graphics-capable terminals, of course.  You can
imagine even slicker UIs, e.g. an imagemap which is actually a screen
dump of the output that the server's GUI would produce, with the
questionable characters highlighted and all characters clickable for
editing.  You'd only do this for data that is critical and that a
human being would be unable to correct confidently, like proper names
and so on.
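
The "table of questionable stuff" plus warning could be sketched
roughly as follows, in Python.  Everything here is hypothetical: the
table layout, the names QUESTIONABLE, check and warning_html, and the
U<hex>.png image naming (taken from the mock warning above).  The one
table entry is the 0x8740 example from this thread.

```python
import unicodedata

# Hypothetical table: octet sequence -> candidate character by platform.
QUESTIONABLE = {
    b"\x87\x40": {"Windows-31J": "\u2460", "Mac": "\u3230"},
}


def check(octets):
    """Return the sorted candidate characters if octets are ambiguous,
    or None if the table has no entry for them."""
    entry = QUESTIONABLE.get(octets)
    if entry is None:
        return None
    return sorted(set(entry.values()))


def warning_html(chars):
    """Build the warning text, with one image per candidate."""
    imgs = [
        f'<img src="U{ord(c):04X}.png" alt="{unicodedata.name(c)}">'
        for c in chars
    ]
    return ("The character you typed is ambiguous.  Did you mean "
            + ", ".join(imgs) + ", or something else?")


found = check(b"\x87\x40")
if found is not None:
    print(warning_html(found))
```

A real table would be much larger and would only be consulted for
fields you care about.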

If you want to be fancy, you keep track of all the user information
you can get your hands on from the browser, host, etc, and learn from
that to improve guesses in the future.
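
One very simple way to "learn" is just to count what users confirmed.
This is a sketch under assumptions, not a worked-out design: the class
GuessLearner and the idea of keying on a platform hint (say, derived
from the User-Agent header) are hypothetical.

```python
from collections import Counter, defaultdict


class GuessLearner:
    """Count which character users confirmed, per (platform, octets)."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, platform, octets, confirmed):
        """Remember that a user on this platform confirmed a character."""
        self._counts[(platform, octets)][confirmed] += 1

    def best_guess(self, platform, octets):
        """Return the most often confirmed character, or None if unseen."""
        counts = self._counts.get((platform, octets))
        if not counts:
            return None
        return counts.most_common(1)[0][0]


learner = GuessLearner()
learner.record("Mac", b"\x87\x40", "\u3230")
learner.record("Mac", b"\x87\x40", "\u3230")
learner.record("Mac", b"\x87\x40", "\u2460")
print(learner.best_guess("Mac", b"\x87\x40") == "\u3230")
```

The guess would only be used to order or preselect the choices offered
to the user; for critical data you would still ask.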

(By the way, I did misunderstand what you meant by "JIS X 0208 only";
I thought you meant "not JIS X 0212 etc", but I guess you meant "not
'corporate standard' extensions to national standards"?)

    Yoshihiro> Typically this kind of approach is taken:

    Yoshihiro> Respond to user with displaying geta-mark, with
    Yoshihiro> annotation:

Again, if we know the geta mark is going to show up, we should help
them out by telling them what we know how to handle.  I was thinking
of the case where you input a character and it gets converted to a
Unicode code point that my browser doesn't know.


-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

