Mailing List Archive



Re: [tlug] OT-Japanese in PHP



>>>>> "David" == David E <dave@?om> writes:

    >> The generally accepted idea is that since Shift_JIS was created
    >> by Japanese people for Japanese people, then it handles the
    >> Japanese language better than UTF-8, which is not true (^_^)

No, and in fact up until the most recent revision of the JIS standard,
there were a few people in Hokkaido who could only type their
addresses in UTF-8.

    David> I've heard the "UTF-8 messes up some characters" objection
    David> from Japanese developers several times, though I've never
    David> been able to get an actual example of it. Urban myth, perhaps?

Yes and no.  Of course technically for all national standard
characters Unicode must round-trip; in that sense it is a myth.
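
A minimal sketch of what "round-trip" means here, assuming Python's
built-in "shift_jis" codec and a sample string made only of standard
JIS X 0208 characters (the function name and sample are illustrative,
not from any particular library):

```python
def round_trips(text: str, legacy: str = "shift_jis") -> bool:
    """Return True if text survives legacy -> Unicode -> UTF-8 -> legacy."""
    # Start from the legacy byte sequence, as a Shift JIS file would.
    legacy_bytes = text.encode(legacy)
    # Decode to Unicode and store as UTF-8.
    utf8_bytes = legacy_bytes.decode(legacy).encode("utf-8")
    # Come back from UTF-8 and re-encode in the legacy charset.
    restored = utf8_bytes.decode("utf-8").encode(legacy)
    return restored == legacy_bytes


if __name__ == "__main__":
    print(round_trips("日本語のテキスト"))
```

Vendor-specific or private-use characters are outside what this check
covers; for those the codec may raise an error instead.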

However, Unicode cannot include characters that are not standardized
by certain recognized bodies (don't ask me; all I know is that the
above-mentioned northerners "lucked out" because the characters they
need are originally Han, not Yamato characters, while Ukrainian
Cyrillic users were not so lucky -- until they got a country, they
couldn't properly view their traditional literature in Unicode).

A lot of such characters are present in many Shift JIS encoded fonts,
especially on platforms like Fujitsu, NEC, and IBM.  Of course,
they're in JIS private space, so why you couldn't do the same with
UTF-8, I don't know.

The other problem is that UTF-8 doesn't give a clue about which
language is being used, while lots of naive users and even some
programmers don't realize that charset and language are distinct
concepts.  This means that font selection does require some smarts
(but not all that much; kana are unique to Japanese and ubiquitous in
that language, ditto Hangul for Korean, and the simplified and
traditional forms of Chinese hanzi have been deemed different, so
Taiwanese and Mandarin can be distinguished quite reliably, too).  So,
for example, if you bring up XEmacs 21.5 in a POSIX environment, it
prefers Chinese fonts (even for kana!).
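
The kind of "smarts" meant above can be sketched roughly as follows.
This is only an illustration: the function name, the return labels,
and the choice of code point ranges (the Hiragana/Katakana blocks,
Hangul Jamo and Syllables, and the basic CJK Unified Ideographs block)
are assumptions of the sketch, and it does not try to tell simplified
from traditional Chinese.

```python
def guess_language(text: str) -> str:
    """Guess a language tag for font selection from the scripts in text."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:
            # Hiragana and Katakana blocks: kana are unique to Japanese.
            return "ja"
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
            # Hangul syllables or jamo: Korean.
            return "ko"
        if 0x4E00 <= cp <= 0x9FFF:
            # Han ideographs alone do not identify the language.
            has_han = True
    return "han" if has_han else "unknown"


if __name__ == "__main__":
    for sample in ("これは日本語", "한국어", "中文", "hello"):
        print(sample, guess_language(sample))
```

The first kana or Hangul character decides the answer, which is crude
for mixed-language text but good enough to avoid picking a Chinese
font for a Japanese page.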

My personal feeling is that bloody-minded nationalism is responsible
for much of this; everybody has ASCII envy (the Chinese went so far as
to create a Unicode derivative in which GB2312 plays the same role as
ISO 8859/1 does for standard Unicode, i.e., a subset with the same code
points as in the non-Unicode standard).  And some influential Japanese
have this crazy idea that some 0s and 1s have more "Yamato damashii"
than other 0s and 1s.

Of course, looking at my own university's home pages, maybe it's not
nationalism.  Maybe it's just plain economic protectionism.  Any
third-rate San Francisco designer in combination with a gaggle of
programmers from Bangalore could do a much more attractive job at half
the price, but their systems would choke on Shift JIS.

    David> Anyway, the reason I suggested setting the output encoding
    David> in php.ini to SJIS for a beginner is that it's likely to
    David> be the easiest for him to get started.

This is reasonable, but it should be accompanied by a FIXME comment.
:-)  And once they've gotten a little past that point, they should be
negotiating language, charset, and the like.

Admittedly, I don't do any of this on my own home pages.  I do use
META elements and ISO-2022-JP rather than Shift JIS.  And I don't
pretend to be a professional....

    David> Then there's also the fact that if you're working with a
    David> web designer, trying to get them to do their HTML in
    David> anything but Shift_JIS is almost always waaay more trouble
    David> than making your scripts deal with SJIS output.

Heh.  iconv is your friend.  "Promise her anything, but give her UTF-8."
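
On the command line that is something like "iconv -f SHIFT_JIS -t UTF-8".
The same conversion inside a script might look like the sketch below,
which uses Python's built-in codecs rather than iconv itself; the
function name and the sample markup are made up for illustration.

```python
def to_utf8(data: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Transcode bytes from a legacy charset to UTF-8."""
    # Decode strictly, so unmappable bytes raise instead of being mangled.
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Pretend this came from the designer's Shift JIS HTML file.
    sjis_html = "<p>こんにちは</p>".encode("shift_jis")
    print(to_utf8(sjis_html).decode("utf-8"))
```

If the designer's files come from Windows tools and use vendor
extensions, passing "cp932" as source_encoding may be the better
guess, but that is an assumption to check against the actual data.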

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

