Mailing List Archive
Re: [tlug] OT-Japanese in PHP
- Date: Tue, 24 May 2005 14:48:07 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] OT-Japanese in PHP
- References: <200505220201.j4M21ZnW002503@example.com><EX-MAIL-SHI-01VDBcs00000108@example.com><87wtpqsd0b.fsf@example.com><EX-MAIL-SHI-01OXNHU00000115@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (cilantro, linux)
>>>>> "Yoshihiro" == Yoshihiro Sato <y_satou@example.com> writes:

    Yoshihiro> We need to clarify end user's environment for designing
    Yoshihiro> of Japanese handling. I considered that that is web
    Yoshihiro> browser, and not specified its OS / versions (because
    Yoshihiro> this is PHP's thread.) That's the reason why I
    Yoshihiro> recommended to reject characters which are not in
    Yoshihiro> JISX0208.

If the end user has a browser that can enter the character, she
probably has a browser that can display it. Anyway, few servers
hesitate to enforce browser upgrades in order to create funkier
displays. "Best viewed with next year's Internet Exploder; CANNOT be
viewed with last year's anything!" pages are all over the place, yet
they can't handle users' names?

    Yoshihiro> And, accepting JISX0213 characters will be a problem on
    Yoshihiro> backend, if backend is not designed specifically to
    Yoshihiro> handle JISX0213.

Sure. So fix the backend. It may take time, but it's (usually) a much
easier problem conceptually than dealing with the end user UI because
the server owner usually owns the backend, too.

    Yoshihiro> Here is simple example: JISX0213 is including circled
    Yoshihiro> number #1 to #50 (and unicode does not define circled
    Yoshihiro> number #21 to #50 characters, as far as I know.)

This is a perfect example. Nobody needs those characters. Sure, if
they're available, they'll be used, just like the Zapf dingbats. But
the most important effect of standardizing those characters is to
ensure that Japanese standards will not be unified into Unicode for
years.

    Yoshihiro> You can find summary of characters, which are defined
    Yoshihiro> in JISX0213 but not in Unicode:
    Yoshihiro> http://www.m17n.org/m17n2000_all_but_registration/proceedings/kawabata/non-unicode.gif

Wow.
The shogi koma are useful in daily life for many Japanese and in line
with the principles of Unicode (although arguably there should be more
than a dozen of them, to represent all the pieces, like the chess
series U+2654--U+265F). All the rest ... what's the rush? It would be
better to standardize a block of characters that could be loaded into
private space in Unicode, and accessed relative to that space.

    Yoshihiro> I agree that if the target of the system is M17N and
    Yoshihiro> not L10N, unicode is the best solution.

But Unicode is no worse for L10N. Why support both Unicode and a
national standard? I don't know about PHP, but the other P-languages
commonly used to implement web applications (Perl, Python, and Puby---
the last P is Greek) all have reasonable suites of codecs and
well-defined ways to create new ones. So storing internally in Unicode
and (trivially) converting on the fly as necessary just is no big
deal.

    Yoshihiro> I considered that this thread was/is PHP. and
    Yoshihiro> considered user clients are various OS/versions web
    Yoshihiro> browser - I don't think PHP connect to XEmacs in batch
    Yoshihiro> mode to handle entered strings with loading correct
    Yoshihiro> mapping table per each request.

Of course the server doesn't connect to Emacs; Emacs LISP is a
terrible language to write servers in (although people do it, and one
of the more popular window managers, Sawfish, is written in a LISP
that inherits a lot from Emacs LISP).

The point is that this kind of programming is ultimately table-driven,
anyway. If we all use Unicode, we can (a) share the table drivers, and
(b) share the tables. This will take time for retrofitting old
applications, of course. My point is that the implementation in XEmacs
only took about a man-day (a very very smart man-day, I'll admit).

    Yoshihiro> Still Unicode does not cover bunch of characters, which
    Yoshihiro> are used in people's name, location name, etc. Here is
    Yoshihiro> the example:
    Yoshihiro> http://homepage2.nifty.com/Gat_Tin/kanji/itaiji.htm
    Yoshihiro> That's the reason why there's project / activities to
    Yoshihiro> support more characters. For example, Mojikyo
    Yoshihiro> http://www.mojikyo.org/

Mojikyo is a fun hobby, but it has little to do with fixing these
problems. Nobody outside of the Mojikyo club is ever going to use 99%
of those characters.

    Yoshihiro> Agree :) But we still do not have standard way to
    Yoshihiro> handle Japanese characters (or say, characters which
    Yoshihiro> are used in Japanese) - especially if characters are
    Yoshihiro> not in JISX0208.

Yep, and the resistance to Unicode and the success-avoidance
activities at JIS that result in nonsense like the non-unicode.gif
table are why. It's moji-hara, sorta like seku-hara ;-), if you ask
me.

But "itaiji" and "gaiji" are really a different issue, don't you
think? It's akin to the Western notion of a "signature", which you
could think of as creating a personal font for one's name. I agree
that it's very important to deal with them in Japan, and probably
throughout the Han-using cultures. But it should be solved in a way
that represents the human individuality of names, not by saying that
"my ichi is a different character from your ichi".

    Yoshihiro> Here is interesting examples, how Kashiwa city
    Yoshihiro> governments are/were handling people's name in census
    Yoshihiro> registration:
    Yoshihiro> http://www.horagai.com/www/moji/int/kasiwa.htm and how
    Yoshihiro> "Japan Basic Resident Register Network" is handling
    Yoshihiro> characters: http://www.horagai.com/www/moji/juki.htm

Thank you for the references; I will look at them closely. The
question is, why doesn't JIS put its effort into standardizing this
kind of thing, which is essentially an attempt to create a standard
solution to the "itaiji/gaiji problem", instead of deliberately
perpetuating divergent character set standards that are at best a tiny
improvement over Unicode?
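
To make the earlier remark about "storing internally in Unicode and
converting on the fly" concrete, here is a minimal sketch in Python,
one of the P-languages named above. The codec names are the ones
Python ships with; the function names and the sample name are
assumptions made up for illustration, not anything from the thread.

```python
# Sketch only: keep text as Unicode inside the application and convert
# at the edges. decode_request/encode_response are hypothetical names.

def decode_request(raw: bytes, client_charset: str) -> str:
    """Turn bytes received from a client into a Unicode string."""
    return raw.decode(client_charset)


def encode_response(text: str, client_charset: str) -> bytes:
    """Turn the internal Unicode string into bytes for one client."""
    return text.encode(client_charset)


# Illustrative sample; both characters are in JIS X 0208.
name = "高橋"

# The same stored string can be served in whatever the client asks for.
for charset in ("utf-8", "euc_jp", "shift_jis", "iso2022_jp"):
    wire = encode_response(name, charset)
    # Round trip: what goes out in a charset comes back unchanged.
    assert decode_request(wire, charset) == name
    print(charset, wire)
```

The tables live inside the codecs, so the application code is the
same for every encoding; only the charset name changes.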

In practice, the gaiji problem is never going to go away. The
non-unicode.gif table is full of recently invented scientific
notation. There will be more. We need a way to represent those
characters _as they are invented_, far more than we need "maru-50", or
even "Takashimaya-no-taka".

    Yoshihiro> It's depend on target of the system. If the service is
    Yoshihiro> provided to end user via web/http, and basically not
    Yoshihiro> restricted OS and/or environments, the safest way at
    Yoshihiro> this point (I don't mean in future) is, to avoid to be
    Yoshihiro> inserted Japanese characters which are not in JISX0208.

I don't understand this. The worst that can happen is a couple of geta
marks on the display. The data on the server won't be corrupted. And
users will quickly learn that the geta marks mean that their client is
broken, and complain, and get them fixed.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
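
As a footnote to the geta-mark point above, a minimal Python sketch,
assuming a legacy client that only accepts Shift_JIS (Python's
`shift_jis` codec, which covers JIS X 0201 and JIS X 0208). The
handler name "geta", the function names, and the sample string are
assumptions for illustration. The stored Unicode string is never
modified; substitution happens only in the bytes sent to that one
client.

```python
import codecs

GETA = "\u3013"  # GETA MARK, itself a JIS X 0208 character


def geta_replace(exc):
    """Codec error handler: one geta mark per unencodable character."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return (GETA * (exc.end - exc.start), exc.end)


# "geta" is a hypothetical handler name chosen for this sketch.
codecs.register_error("geta", geta_replace)


def unencodable(text, charset="shift_jis"):
    """List the characters in text that the target charset cannot carry."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def render_for_legacy_client(text, charset="shift_jis"):
    """Encode for the client, showing geta marks where it would fail."""
    return text.encode(charset, errors="geta")


# Illustrative data: circled digit one and a variant "taka" (U+9AD9),
# neither of which Python's shift_jis codec can encode.
stored = "\u2460\u9ad9\u6a4b"

print(unencodable(stored))
print(render_for_legacy_client(stored).decode("shift_jis"))
print(stored)  # the server-side copy is untouched
```

Rejecting input, by contrast, would amount to raising an error
whenever `unencodable()` returns a non-empty list.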