Mailing List Archive
Re: [tlug] OT-Japanese in PHP
- Date: Wed, 25 May 2005 15:59:34 +0900
- From: Yoshihiro Sato <y_satou@example.com>
- Subject: Re: [tlug] OT-Japanese in PHP
- References: <87u0ktqinc.fsf@example.com>
- Organization: Amazon.co.jp
- User-agent: Wanderlust/2.12.0 (Your Wildest Dreams) SEMI/1.14.6 (Maruoka)FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.3(i386-redhat-linux-gnu) MULE/5.0 (SAKAKI)
On Tue, 24 May 2005 14:48:07 +0900, "Stephen J. Turnbull" <stephen@example.com> said:

Yoshihiro> We need to clarify end user's environment for designing
Yoshihiro> of Japanese handling. I considered that that is web
Yoshihiro> browser, and not specified its OS / versions (because
Yoshihiro> this is PHP's thread.) That's the reason why I
Yoshihiro> recommended to reject characters which are not in
Yoshihiro> JISX0208.
>
> If the end user has a browser that can enter the character, she
> probably has a browser that can display it.

I think we need to add a condition: "with a specific user interface".

> Anyway, few servers hesitate to enforce browser upgrades in order to
> create funkier displays. "Best viewed with next year's Internet
> Exploder; CANNOT be viewed with last year's anything!" pages are all
> over the place, yet they can't handle users' names?

If we can limit the end users, yes, we can ask them to upgrade or
change their software. I'm assuming that the service has to be provided
to various users in various environments (again, because this was/is a
PHP thread.) For example:

- someone enters the data with CP932
- another user looks up the information via http with a web browser on
  MacOS with the Macintosh variant of Shift-JIS, or with various
  carriers' cellphone devices
- or the information is distributed as plain text email in iso-2022-jp

But it seems that you're assuming that the service can be restricted to
run in a specific environment (i.e. a specified OS, a specified UI,
etc.) I think this is where our discussion diverges.

Yoshihiro> And, accepting JISX0213 characters will be a problem on
Yoshihiro> backend, if backend is not designed specifically to
Yoshihiro> handle JISX0213.
>
> Sure. So fix the backend. It may take time, but it's (usually) a
> much easier problem conceptually than dealing with the end user UI
> because the server owner usually owns the backend, too.

Yeah, it can be done if we pin down the end user's UI.
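
To make the mixed-environment scenario above a bit more concrete, here is a minimal Python sketch (Python rather than PHP, only because its codecs are easy to show). It takes the byte pair 0x8740, decodes it the Windows way with the standard cp932 codec, and then checks whether the result could be sent out as iso-2022-jp mail. Python's standard library has no MacJapanese codec, so the Mac reading is a hard-coded, one-entry table based on the mapping stated later in this message; `MAC_READING` and `can_send_as_iso2022jp` are names made up for this illustration.

```python
import unicodedata

# Bytes as they might arrive from a Windows client (CP932 / Windows-31J).
raw = b"\x87\x40"

# The cp932 codec follows the Windows reading of this byte pair.
as_windows = raw.decode("cp932")

# Assumption: no MacJapanese codec ships with Python, so the Mac reading
# is hard-coded from the mapping quoted in this thread.
MAC_READING = {b"\x87\x40": "\u3230"}  # hypothetical one-entry table
as_mac = MAC_READING[raw]


def can_send_as_iso2022jp(text: str) -> bool:
    """Return True if every character survives encoding to iso-2022-jp."""
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


print("Windows reading:", unicodedata.name(as_windows))
print("Mac reading:    ", unicodedata.name(as_mac))
print("Mailable as iso-2022-jp:", can_send_as_iso2022jp(as_windows))
```

The same two bytes give two different characters depending on who reads them, and the Windows reading cannot be put into an iso-2022-jp mail at all, which is the kind of trouble I have in mind when the client environment is not pinned down.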
Yoshihiro> I agree that if the target of the system is M18N and
Yoshihiro> not L10N, unicode is the best solution.
>
> But Unicode is no worse for L10N. Why support both Unicode and a
> national standard? I don't know about PHP, but the other P-languages
> commonly used to implement web applications (Perl, Python, and Puby---
> the last P is Greek) all have reasonable suites of codecs and
> well-defined ways to create new ones. So storing internally in
> Unicode and (trivially) converting on the fly as necessary just is no
> big deal.

We still have a problem in the process of transcoding to Unicode. For
example:

* If the received data is 0x8740, is it CIRCLED DIGIT ONE (U+2460)
  (=Windows-31J) or PARENTHESIZED IDEOGRAPH SUN (U+3230) (=Mac)?
  Which character was input on the user's side?

Unfortunately, there's no way to detect this correctly if we provide
the service on the Internet: it depends on the user's OS, browser, and
font. Maybe we can check the User-Agent for the OS and browser, but
there is no way to detect which font the user is using.

> But "itaiji" and "gaiji" are really a different issue, don't you
> think? It's akin to the Western notion of a "signature", which you
> could think of as creating a personal font for one's name. I agree
> that it's very important to deal with them in Japan, and probably
> throughout the Han-using cultures. But it should be solved in a way
> that represents the human individuality of names, not by saying that
> "my ichi is a different character from your ichi".

Actually, it is important enough for Japanese government work. That's
the reason why they maintain quite a large volume of "gaiji" tables and
use them in their daily operations.

> Thank you for the references; I will look at them closely. The
> question is, why doesn't JIS put its effort into standardizing this
> kind of thing, which is essentially an attempt to create a standard
> solution to the "itaiji/gaiji problem", instead of deliberately
> perpetuating divergent character set standards that are at best a tiny
> improvement over Unicode?
>
> In practice, the gaiji problem is never going to go away. The
> non-unicode.gif table is full of recently invented scientific
> notation. There will be more. We need a way to represent those
> characters _as they are invented_, far more than we need "maru-50", or
> even "Takashimaya-no-taka".

I'm sorry, but I have no idea at this point. As far as I've heard, the
citizens' office should accept a character if it is in a dictionary.
(It's the Ministry of Justice's announcement, IIRC.)

FYI, I suppose you already know of this article, but it is very
interesting: a column written by Katsuhiro Ogata on the Impress Watch
webzine,

http://internet.watch.impress.co.jp/www/column/ogata/index.htm

reported on the discussion around finalizing 2000JIS and JISX0213. It
seems that there was not only technical discussion but also political
maneuvering...

Yoshihiro> It's depend on target of the system. If the service is
Yoshihiro> provided to end user via web/http, and basically not
Yoshihiro> restricted OS and/or environments, the safest way at
Yoshihiro> this point (I don't mean in future) is, to avoid to be
Yoshihiro> inserted Japanese characters which are not in JISX0208.
>
> I don't understand this. The worst that can happen is a couple of
> geta marks on the display. The data on the server won't be corrupted.
> And users will quickly learn that the geta marks mean that their
> client is broken, and complain, and get them fixed.

Typically this kind of approach is taken: respond to the user by
displaying geta marks, with an annotation like this:

  "Characters displayed as a geta mark are characters you entered that
  cannot be handled by our system. If it is a Kanji, please use the
  simplified character, or Hiragana or Katakana. If it is Hankaku
  katakana, please use Zenkaku katakana. Some machine-dependent
  characters, such as circled numbers and Roman numerals, are also
  rejected. Please use ordinary Arabic digits instead."

-- 
Yoshihiro Satou y_satou@example.com
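
The geta-mark response described above could be sketched roughly as follows, again in Python. This is only an illustration under one loud assumption: "the character is in JISX0208" is approximated by "the character can be encoded with Python's iso2022_jp codec", which covers ASCII, JIS X 0201 Roman, and JIS X 0208 but not Hankaku katakana or vendor extensions such as circled digits. The names `is_accepted` and `mark_unsupported` are invented for the example.

```python
GETA = "\u3013"  # GETA MARK, itself a JIS X 0208 character


def is_accepted(ch: str) -> bool:
    """Assumption: encodable in iso2022_jp stands in for 'in JIS X 0208'."""
    try:
        ch.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


def mark_unsupported(text: str) -> tuple[str, list[str]]:
    """Replace rejected characters with a geta mark and list them."""
    rejected = [ch for ch in text if not is_accepted(ch)]
    marked = "".join(ch if is_accepted(ch) else GETA for ch in text)
    return marked, rejected


# Example input: "No." + CIRCLED DIGIT ONE + space + katakana "tesuto".
marked, rejected = mark_unsupported("No.\u2460 \u30c6\u30b9\u30c8")
print(marked)
print(rejected)
```

The form would then redisplay `marked` together with the annotation, so the user can see exactly which characters were rejected before anything reaches the backend.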
- Follow-Ups:
- Re: [tlug] OT-Japanese in PHP
- From: Stephen J. Turnbull
- References:
- Re: [tlug] OT-Japanese in PHP
- From: Stephen J. Turnbull