
Mailing List Archive
Re: [tlug] OT-Japanese in PHP
- Date: Wed, 25 May 2005 15:59:34 +0900
- From: Yoshihiro Sato <y_satou@example.com>
- Subject: Re: [tlug] OT-Japanese in PHP
- References: <87u0ktqinc.fsf@example.com>
- Organization: Amazon.co.jp
- User-agent: Wanderlust/2.12.0 (Your Wildest Dreams) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.3 (i386-redhat-linux-gnu) MULE/5.0 (SAKAKI)
On Tue, 24 May 2005 14:48:07 +0900, "Stephen J. Turnbull" <stephen@example.com> said:
>
Yoshihiro> We need to clarify the end user's environment when designing
Yoshihiro> Japanese handling. I assumed it is a web browser, with no
Yoshihiro> particular OS or version specified (because this is a PHP
Yoshihiro> thread). That's why I recommended rejecting characters
Yoshihiro> which are not in JISX0208.
>
> If the end user has a browser that can enter the character, she
> probably has a browser that can display it.
I think we need to add a condition: "with a specific user interface".
> Anyway, few servers hesitate to enforce browser upgrades in order to
> create funkier displays. "Best viewed with next year's Internet
> Exploder; CANNOT be viewed with last year's anything!" pages are all
> over the place, yet they can't handle users' names?
If we can limit the end users, yes, we can ask them to upgrade or change their
software.
I'm assuming that the service is provided to various users in various
environments (again, because this was/is a PHP thread).
For example: someone enters the data in CP932; another user looks up the
information over HTTP with a web browser on Mac OS using the Macintosh
Shift-JIS encoding, or with a cellphone from one of the various carriers. Or the
information is distributed as plain-text email in ISO-2022-JP.
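A minimal Python sketch of this scenario, assuming Python 3's bundled CJK codecs. The sample text and the CHANNELS table are hypothetical, and the Mac and cellphone encodings are left out because the standard library ships no codec for them. It shows that a character CP932 accepts, CIRCLED DIGIT ONE, cannot be re-encoded for the JIS X 0208-only channels:

```python
from typing import Optional

# Hypothetical sample: a name entered on Windows that starts with a
# CP932-only character (CIRCLED DIGIT ONE, U+2460).
text = "\u2460山田"

# Python codec names standing in for some of the channels above.
# Mac and cellphone encodings are omitted: no stdlib codec exists for them.
CHANNELS = {
    "Windows browser": "cp932",
    "JIS X 0208 Shift_JIS": "shift_jis",
    "plain-text email": "iso2022_jp",
}

def try_encode(s: str, codec: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None

for channel, codec in CHANNELS.items():
    ok = try_encode(text, codec) is not None
    print(f"{channel:22} {codec:10} {'OK' if ok else 'cannot encode'}")
```

Only the cp932 channel succeeds; the same data has no representation in strict Shift_JIS or ISO-2022-JP.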
But it seems that you're assuming the service can be restricted to run in a
specific environment (i.e., a specified OS, a specified UI, etc.).
I think this is where our discussion diverges.
Yoshihiro> And accepting JISX0213 characters will be a problem on the
Yoshihiro> backend if the backend is not designed specifically to
Yoshihiro> handle JISX0213.
>
> Sure. So fix the backend. It may take time, but it's (usually) a
> much easier problem conceptually than dealing with the end user UI
> because the server owner usually owns the backend, too.
Yeah, it can be done if we can pin down the end user's UI.
Yoshihiro> I agree that if the target of the system is M18N and
Yoshihiro> not L10N, Unicode is the best solution.
>
> But Unicode is no worse for L10N. Why support both Unicode and a
> national standard? I don't know about PHP, but the other P-languages
> commonly used to implement web applications (Perl, Python, and Puby---
> the last P is Greek) all have reasonable suites of codecs and
> well-defined ways to create new ones. So storing internally in
> Unicode and (trivially) converting on the fly as necessary just is no
> big deal.
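A minimal Python sketch of "store internally in Unicode, convert at the edges". The function names decode_request and encode_response are hypothetical, and the sketch assumes the server already knows each client's charset (for example from the charset it served the form in):

```python
def decode_request(raw: bytes, charset: str) -> str:
    """Inbound edge: the client's bytes become Unicode, or fail loudly."""
    return raw.decode(charset)  # strict: raises UnicodeDecodeError

def encode_response(text: str, charset: str) -> bytes:
    """Outbound edge: Unicode becomes whatever encoding the client expects."""
    return text.encode(charset)  # strict: raises UnicodeEncodeError

# A name typed on a Windows PC arrives as CP932 bytes...
stored = decode_request("高橋".encode("cp932"), "cp932")
# ...is kept as Unicode, then served to an EUC-JP page and mailed as ISO-2022-JP.
for_page = encode_response(stored, "euc_jp")
for_mail = encode_response(stored, "iso2022_jp")
```

Strict error handling is deliberate: a conversion problem surfaces at an edge instead of silently corrupting the stored text.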
We still have a problem in the process of transcoding to Unicode. For example:
* If we receive the bytes 0x8740, is it CIRCLED DIGIT ONE (U+2460) (= Windows-31J)
  or PARENTHESIZED IDEOGRAPH SUN (U+3230) (= Mac)? Which character did the user
  actually enter?
Unfortunately, there's no way to detect this reliably if we provide the service
on the Internet: it depends on the user's OS, browser, and font.
Maybe we can check the User-Agent for the OS and browser, but there is no way to
detect which font the user is using.
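A minimal Python sketch of the ambiguity, assuming Python 3's bundled codecs. Python has no Mac OS Japanese codec, so MAC_READING_PER_MESSAGE is a hypothetical hand-written table that only records the Mac reading stated in this message; candidate_readings is likewise a hypothetical helper. It can list the possible readings of 0x8740, but nothing in the bytes says which one the user meant:

```python
from typing import Dict

# Hypothetical, hand-written: only the Mac reading claimed in the message.
MAC_READING_PER_MESSAGE = {b"\x87\x40": "\u3230"}  # PARENTHESIZED IDEOGRAPH SUN

def candidate_readings(raw: bytes) -> Dict[str, str]:
    """List every plausible decoding of raw; nothing here can pick one."""
    readings: Dict[str, str] = {}
    for label, codec in (("Windows-31J", "cp932"),
                         ("JIS X 0208 Shift_JIS", "shift_jis")):
        try:
            readings[label] = raw.decode(codec)
        except UnicodeDecodeError:
            pass  # this variant has nothing at that code point
    if raw in MAC_READING_PER_MESSAGE:
        readings["Mac (per message)"] = MAC_READING_PER_MESSAGE[raw]
    return readings

print(candidate_readings(b"\x87\x40"))
```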
> But "itaiji" and "gaiji" are really a different issue, don't you
> think? It's akin to the Western notion of a "signature", which you
> could think of as creating a personal font for one's name. I agree
> that it's very important to deal with them in Japan, and probably
> throughout the Han-using cultures. But it should be solved in a way
> that represents the human individuality of names, not by saying that
> "my ichi is a different character from your ichi".
Actually, it's important enough for Japanese government work; that's why they
maintain quite a large "gaiji" table and use it in their daily operations.
> Thank you for the references; I will look at them closely. The
> question is, why doesn't JIS put its effort into standardizing this
> kind of thing, which is essentially an attempt to create a standard
> solution to the "itaiji/gaiji problem", instead of deliberately
> perpetuating divergent character set standards that are at best a tiny
> improvement over Unicode?
>
> In practice, the gaiji problem is never going to go away. The
> non-unicode.gif table is full of recently invented scientific
> notation. There will be more. We need a way to represent those
> characters _as they are invented_, far more than we need "maru-50", or
> even "Takashimaya-no-taka".
I'm sorry, but I have no answer to that at this point. As far as I know, citizen
offices are supposed to accept a character if it appears in a dictionary.
(That was a Ministry of Justice announcement, IIRC.)
FYI, I suppose you already know this article, but it's very interesting: a column
by Katsuhiro Ogata in the Impress Watch webzine,
http://internet.watch.impress.co.jp/www/column/ogata/index.htm
reported on the discussions around finalizing 2000JIS and JISX0213. It seems
there was not only technical discussion but also political maneuvering...
Yoshihiro> It depends on the target of the system. If the service is
Yoshihiro> provided to end users via web/HTTP, with basically no
Yoshihiro> restriction on OS and/or environment, the safest approach
Yoshihiro> at this point (I don't mean in the future) is to prevent
Yoshihiro> Japanese characters which are not in JISX0208 from being entered.
>
> I don't understand this. The worst that can happen is a couple of
> geta marks on the display. The data on the server won't be corrupted.
> And users will quickly learn that the geta marks mean that their
> client is broken, and complain, and get them fixed.
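A minimal Python sketch of the "geta marks on display, data intact" case, assuming Python 3's codecs module. The error-handler name "geta" and the function geta_handler are hypothetical choices for this sketch, not built-ins; the server keeps the Unicode text unchanged and only the encoded output degrades:

```python
import codecs

GETA = "\u3013"  # GETA MARK; it is itself in JIS X 0208

def geta_handler(err):
    """Codec error handler: show each unencodable character as one geta mark."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    return GETA * (err.end - err.start), err.end

# "geta" is a name chosen for this sketch, not a built-in handler.
codecs.register_error("geta", geta_handler)

stored = "山田\u2460"  # server-side data stays untouched
page = stored.encode("shift_jis", errors="geta")  # only the output degrades
```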
Typically, this kind of approach is taken:
Respond to the user by displaying a geta mark, with an annotation: "Character(s)
displayed as a geta mark are characters you entered that our system cannot
handle. Please use a simplified character, or hiragana or katakana if it's a
kanji. If it's half-width (hankaku) katakana, please use full-width (zenkaku)
katakana. Some machine-dependent characters, such as circled numbers and Roman
numerals, are also rejected; please use normal Arabic digits instead."
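A minimal Python sketch of that input check, assuming Python 3. It treats "can be encoded by Python's strict shift_jis codec (ASCII/JIS X 0201 plus JIS X 0208) and is not half-width katakana" as the acceptance rule; check_input and suggest are hypothetical helpers. NFKC normalization is one possible way to propose the replacement the notice asks for, not something the original system is said to do:

```python
import unicodedata
from typing import List, Tuple

GETA = "\u3013"  # GETA MARK

def is_halfwidth_katakana(ch: str) -> bool:
    # U+FF61..U+FF9F: half-width (hankaku) katakana and punctuation
    return "\uff61" <= ch <= "\uff9f"

def check_input(text: str) -> Tuple[str, List[str]]:
    """Return (text with rejected characters shown as geta marks, rejected chars).

    A character is accepted only if Python's strict "shift_jis" codec can
    encode it and it is not half-width katakana; circled digits and Roman
    numerals (CP932 extensions) fail the codec."""
    shown, rejected = [], []
    for ch in text:
        try:
            ch.encode("shift_jis")
            ok = not is_halfwidth_katakana(ch)
        except UnicodeEncodeError:
            ok = False
        shown.append(ch if ok else GETA)
        if not ok:
            rejected.append(ch)
    return "".join(shown), rejected

def suggest(ch: str) -> str:
    """NFKC often yields the 'normal' form the notice asks for."""
    return unicodedata.normalize("NFKC", ch)

# Half-width KA-row "a", CIRCLED DIGIT ONE, ROMAN NUMERAL ONE, then a plain name.
shown, rejected = check_input("\uff71\u2460\u2160山田")
```

Here all three leading characters come back as geta marks, and suggest maps them to full-width ア, the digit 1, and the letter I.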
--
Yoshihiro Satou
y_satou@example.com