Mailing List Archive


Re: [tlug] OT-Japanese in PHP



>>>>> "Yoshihiro" == Yoshihiro Sato <y_satou@example.com> writes:

    Yoshihiro> On server's side, especially if it's web application, I
    Yoshihiro> recommend to handle data like this: reject all
    Yoshihiro> characters which are not in JISX0208, and reject all
    Yoshihiro> half-width katakana.

That's not acceptable if you're a client-oriented operation.  And it's
quite unnecessary.  There are no disagreements about full-width <->
half-width mappings, and there's no good reason to reject anything in
JIS X 0212 (or JIS X 0213).
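
To illustrate folding rather than rejecting, here is a minimal Python
sketch.  It assumes Unicode NFKC normalization is acceptable for the
field in question (NFKC also folds other compatibility forms, such as
full-width Latin letters, so it may be too aggressive for some data).
The function name is hypothetical.

```python
import unicodedata

def fold_halfwidth(text: str) -> str:
    """Fold half-width katakana (and other compatibility forms) via NFKC."""
    return unicodedata.normalize("NFKC", text)

# Half-width KA followed by the half-width voiced sound mark
# composes into the single full-width character GA (U+30AC).
ga = fold_halfwidth("\uff76\uff9e")
print(f"U+{ord(ga):04X}")

# Half-width TA, prolonged sound mark, N become their full-width forms.
print(fold_halfwidth("\uff80\uff70\uff9d") == "\u30bf\u30fc\u30f3")
```

A form handler could run something like this on input before
validation, so the user never sees a rejection for typing half-width
kana.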

The really shameful thing is that my wife, who is Japanese,
regularly has to request that companies and organizations which should
know better (e.g., Nomura Shoken and the Japan Association of
Financial Planners) hand-enter her information because the web page
won't accept a katakana surname (full-width or half-width)!

    Yoshihiro> 3. Unicode CJK characters are unified.

This is not a problem unless you're doing multilingual work (multiple
languages in the same document).  Mere I18N/L10N is complex, of
course, but Han unification is not the problem there.

    Yoshihiro> This issue is typically happened when entering people's
    Yoshihiro> name and/or location name.

But this isn't a problem of Unicode, which fully handles the entire
JIS X 0208 and JIS X 0212 character sets.  The problem is that the
Japanese standards bodies have spent at least 100 years prescribing
rather than describing the language, and so a welter of non-conforming
industry standards has grown up.

    Yoshihiro> Even if end user has method to input correct character
    Yoshihiro> on their UI in legacy character set, but there's a case
    Yoshihiro> it's mapped to different character on server's side.

So fix the server!  It's not like correcting the mapping tables is
hard.  E.g., in XEmacs 21.5 you just wget the tables from the Unicode
Consortium or another registry into etc/unicode, and type M-x
load-unicode-tables RET.  XEmacs has _other_ _serious_ problems in
Unicode handling, but the mapping tables have been available since
2002 or so, and they only took that long because there wasn't really a
use for them before updating the Windows port to use Windows NT
Unicode APIs.
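
For a concrete case of the kind of mismatch being discussed, the
following Python sketch decodes the same two Shift_JIS bytes with two
of the mapping tables that ship with CPython.  This is only an
illustration of a table disagreement, not a claim about which table
any particular server uses.

```python
# The same Shift_JIS byte pair decoded with two different mapping tables.
raw = b"\x81\x60"  # the JIS X 0208 "wave dash" position

as_jis = raw.decode("shift_jis")  # JIS-derived table
as_ms = raw.decode("cp932")       # Microsoft's code page 932 table

print(f"shift_jis -> U+{ord(as_jis):04X}")  # U+301C WAVE DASH
print(f"cp932     -> U+{ord(as_ms):04X}")   # U+FF5E FULLWIDTH TILDE

# The two results are different code points, so text decoded with one
# table and compared against text decoded with the other will not match.
print(as_jis == as_ms)
```

If the client and the server disagree on which table applies, the
character arrives "wrong" even though the user typed it correctly,
and the fix is to make the server's table match.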

    Yoshihiro> But actual problem is, most of the case end user does
    Yoshihiro> not have proper way to input such special characters.

I simply don't believe that, except maybe for keitai platforms.  Both
Windows and the Mac provide palette-based input methods, and such are
available for any free software OS.  Sure, you have to find the
character the first time, but after that you record it in your
dictionary.  This is a user education problem, not a Unicode issue.

    Yoshihiro> And users input "simplified character" or "similar
    Yoshihiro> character" as compromised solution when they meet
    Yoshihiro> restriction.

Shameful.  The first thing that should be done with technology is to
allow people to write their own names and addresses correctly!

I'll grant that in practice, fixing an existing installation can be
difficult, because you may have to rebuild from the ground up with new
server software, add-on modules, and the like.  But new installations
should take advantage of Unicode technology which allows a unified
treatment of all these problems, and software (including font)
sharing.  And this should be a criterion (not necessarily overriding,
of course) for any upgrade.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
