Re: tlug: Re: Japanese input
- To: tlug@example.com
- Subject: Re: tlug: Re: Japanese input
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Thu, 11 Jun 1998 12:48:00 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <199806102220.HAA32470@example.com>
- References: <199806101220.MAA00828@example.com><199806102220.HAA32470@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
Three asides:

One: Gaspar says "this thread is about input methods". Not for me; it's about "text processing." Yes, if it were about input methods, what you're talking about is doable, and I'd join you. But I don't think it can be about input methods only. That's why I'm working at the breadboard prototype level in XEmacs. But given Gaspar's premise, I agree with nearly all of what he's written in this thread.

Two: I just finished skimming the X/Open Technical Study "Universal Multiple-Octet Coded Character Set Coexistence and Migration". One sentence stands out: "Microsoft can make some of these design decisions because application portability is not a high priority in NT." My minimal knowledge suggests that Yudit is better than NT in this respect (e.g., UTF-8 is used externally), but not sufficiently so. Only Gaspar is potentially competent to judge, though, at this point. And this "Technical Study" is really just a < 50-page pamphlet of semi-random thoughts.

    Hope you have got debuggers loaded,
    hope you are quite prepared to crash.
    Ambiguous protocol codings
    Not e'en RSA could invert the hash.

    Don't go code tonight,
    you'll never get it right.
    There's a bad bug in the spec.

(Copyright 1998 Yaseppochi-gumi. Gomen, ne, John Fogerty. What'm I saying, gomen, ne, Steve, your grey hairs are showing!)

Three:

>>>>> "Cliff" == Cliff Miller <cliff@example.com> writes:

Cliff> For all the complexity, there are real advantages in
Cliff> kanji. Japan has the lowest illiteracy rate in the
Cliff> world. Of course, the educational system has a lot to do
Cliff> with it, but not everything.

The statistical system and culture have a lot to do with it, too. Japanese official unemployment rates are estimated by most people who study the issue to be about 1/3 of what they would be if counted by the US Bureau of Labor Statistics (BLS) standard. Culturally, most U.S. economists consider the BLS standard to be acceptable only because the amount of underemployment in the U.S. is small.
That is due to the extreme flexibility of the U.S. employment system. Underemployment in Japan is rampant (one Japanese iconoclast I know says that you can measure underemployment in Japan by counting the number of males in pachinko parlors at 1pm on a weekday, but "it would be an understatement 'cause some people play the horses or mahjongg instead").

The World Health Organization has never been allowed to test a random sample of Japanese for literacy. In fact, nobody but Monbusho has ever been allowed to do so. This is like allowing the fox to keep chicken mortality statistics.

This is not to say that Japan's illiteracy rate isn't the lowest in the world. But I don't know, and neither can anyone else; least of all Monbusho. Sigh.

>>>>> "Matt" == Matthew J Francis <asbel@example.com> writes:

>> Brother, you are in for some unpleasant surprises :-) Check out
>> locale (5), o-negai-shimasu. No silver bullet here.

Matt> Hmm, but that seems to relate mostly to the 'traditional'

Yes. Portability and interoperability must be carefully considered.

Matt> charset support. Yudit is Unicode all the way through, so
Matt> character mapping and input can be uniquely specified
Matt> without "knowing" anything about locales. If you meant
Matt> something else I'm not seeing, please to enlighten...

You can use Unicode/UCS-[24] internally if you want. This simplifies a lot of things. However, a monolingual Chinese speaker will find a Japanese input method useless. So input cannot be `uniquely specified without "knowing" anything about locales.'

It is arguable (I don't agree, but many pros do) that _every_ multilingual text should specify locale internally. I.e., a text document stored in UTF-8 does not contain Japanese, Chinese, German, and Russian; it contains a UTF-8 string. Such experts will find use of Yudit widgets unacceptable in principle.

Also, some such information _absolutely must_ be included, for bidirectional languages (Semitic, mostly, but also vertical Japanese, most probably).
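
To make the "it contains a UTF-8 string" point concrete, here is a minimal Python sketch. It is not taken from Yudit or any other program discussed in this thread; `TaggedRun`, its `lang` and `rtl` fields, and `encode` are hypothetical names invented for illustration. It assumes only that U+76F4 is a unified Han ideograph used in both Japanese and Chinese, and shows that the language and direction tags do not survive encoding, so they have to travel beside the text if anyone needs them.

```python
from dataclasses import dataclass

# U+76F4 is a unified Han ideograph used in both Japanese and Chinese.
HAN = "\u76f4"


@dataclass(frozen=True)
class TaggedRun:
    """Hypothetical container: a run of text plus out-of-band tags."""
    text: str
    lang: str          # e.g. "ja" or "zh" (illustrative, not a Yudit field)
    rtl: bool = False  # base direction, also carried outside the string


def encode(run: TaggedRun) -> bytes:
    # Encoding keeps only the code points; lang and rtl are dropped.
    return run.text.encode("utf-8")


ja = TaggedRun(HAN, "ja")
zh = TaggedRun(HAN, "zh")

print(encode(ja) == encode(zh))  # True: the byte strings are identical
print(ja == zh)                  # False: the tagged runs differ
```

Whether such tags belong inside the document or in the environment is exactly the disagreement described above; the sketch only shows that the bytes alone cannot settle it.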
This is emphatically not a "locale", but the handling will have some similar elements. Gaspar, doesn't Yudit (like everything else) punt on this?

Matt> [Input servers]

>> Nope. Symptomatic of the fundamental fact that "tastes
>> differ." In any case, that was an example to demonstrate
>> feasibility. I think it would be insane to try to overload a
>> Japanese server with algorithms for Devanagari or Arabic.
>> Multiple servers.

Matt> Silly and unnecessary to want to do it all at once; So,
Matt> throw in dynamic loading of conversion sets, and I don't see
Matt> how it could be slower or more hungry than the existing
Matt> one-locale servers.

And vice versa. We have a tool for doing what you're talking about already, although it's much more general: inetd. (Woof! Betcha never thought of that!) It might be feasible and even efficient to make a single server with multiple conversion algorithms, but no need to do so.

Also, on a single-user workstation running the conversion server locally, the vast majority of users will run only one. In a multiple workstation environment, it is sensible to have a server host running all the conversion servers over the network. Only a small number of multilingual experts will need to run more than one conversion server, and for them, getting the best proprietary servers is probably far more important than avoiding purchase of 4MB of RAM per server. jon@example.com, what do you say?

Matt> Colour me unconvinced, but for fairness I will go and have a
Matt> *really* good look at all the code (probably this weekend)
Matt> before putting my head more firmly on the chopping block.

_All_ the code is impossible. Wnn6 is proprietary, ATOK is proprietary, .... Even limiting yourself to the open source, my hat's off to you as a speed reader.

Matt> Code does of course have an immense memetic (informational)
Matt> reuse value as well as genetic (implementational). "Code
Matt> reuse" can be effected without necessarily actually using
Matt> code.

Well, OK, I can see that. Study the code as an example of what not to do. :-)

Matt> I know the limits of my knowledge - although sometimes to
Matt> start coding is a very good way to find that out. I am
Matt> actively researching, although I can't afford to buy much
Matt> treeware at the moment; pointers to any relevant online
Matt> documentation would of course be greatly appreciated...

ISO and Unicode Consortium standards are expensive and not published online, unfortunately. I've never thought about looking for the JIS versions; I bet they're expensive too (besides being in Japanese). You're welcome to come to Tsukuba and study my copies any time, but I have gotten very stiff-necked about copyright since understanding the GPL. I've thought about trying to find a way to serve the document to one user at a time, but the terminal would need to be under my control....

Matt> Yudit already *has* this. Even Gaspar's code there as it is
Matt> now has both raw XLib, Qt, and Motif versions of the (entry
Matt> and edit) widgets; because it's quite cleanly written, it
Matt> should stand porting to other toolkits with little fuss.

You're missing the point. Entry and edit widgets? Great. How about buttons, labels, displays, panners, menus, titlebars, dialogs, ...? And many important applications (vi, emacs to name two) don't use widgets (at that level, anyway) at all.

Matt> And who's to say if I find this fun, others can't? =^^=

Porting the dialog widgets to use Gaspar's entry and edit widgets is straightforward but probably tedious. Winkling out _all_ the places where text is manipulated is why Cobol programmers are making $500,000/year to do Year 2000 maintenance. (s/text/dates/ of course.) Why aren't you learning Cobol? :-) When porting means implementation it's fun, when it means maintenance it's drudgery.

Matt> Not a silver bullet, but at least a loaded gun to hand to
Matt> the developers.
Matt> 90% or more of text display in typical
Matt> programs is done with standard widgets; Entry, Edit, Menu,

Absolutely true.

Matt> Label. Replace them with ones that understand
Matt> internationalised input and display properly, and you're 90%
Matt> of the way there.

Oh, brother. Your arithmetic is right, but your model is wrong. Fred Brooks. _The Mythical Man-Month_. Get the recent updated edition, it has the famous "no silver bullet" essay in it. Read it. Then we can talk. :-) Ed Yourdon's (_Decline and Fall of the American Programmer_ et seq) stuff bears peripherally on this. Quality control, quality control, quality control.

--------------------------------------------------------------
Next TLUG Meeting: 13 June Sat, Tokyo Station Yaesu gate 12:30
Featuring Stone and Turnbull on .rpm and .deb packages
Next Nomikai: 17 July, 19:30 Tengu TokyoEkiMae 03-3275-3691
After June 13, the next meeting is 8 August at Tokyo Station
--------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp