Re: tlug: Re: Japanese input




Hi,

On 11-Jun-98 Stephen J. Turnbull wrote:

[...]

>  You can use Unicode/UCS-[24] internally if you want.  This simplifies
>  a lot of things.  However, a monolingual Chinese will find a Japanese
>  input method useless.  So input cannot be `uniquely specified without
>  "knowing" anything about locales.'

Natch, that's not quite what I meant. What I meant was that, unlike with
the multitude of old-style national-language charsets, with Unicode you
can say "Give me Chinese input" and have input turned into
Chinese-in-Unicode, then turn round a few seconds later and say "Give me
Hebrew input!" and have it turned into Hebrew-in-Unicode.
  To do that, it is not necessary to know about the other ways "locale"
can be specified, or worry about the 101 quirks of the different national
charsets in question, only that a specified transform is applied with
keystrokes in, and Unicode out. A mechanism that can do that should be
extensible to any language.
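
To make the "keystrokes in, Unicode out" idea concrete, here is a minimal
Python sketch. The names make_input_method, HIRAGANA and HEBREW, and the
tiny tables themselves, are made up for illustration; a real input method
(kana-kanji conversion, say) needs far more than a lookup table.

```python
def make_input_method(table):
    """Build a converter from a {keystroke sequence: Unicode string} table."""
    longest = max(len(keys) for keys in table)

    def convert(keystrokes):
        out = []
        i = 0
        while i < len(keystrokes):
            # Greedy longest match, so "ka" wins over "k" followed by "a".
            for n in range(min(longest, len(keystrokes) - i), 0, -1):
                chunk = keystrokes[i:i + n]
                if chunk in table:
                    out.append(table[chunk])
                    i += n
                    break
            else:
                # No mapping for this key: pass it through unchanged.
                out.append(keystrokes[i])
                i += 1
        return "".join(out)

    return convert


# Illustrative fragments only, not complete keymaps.
HIRAGANA = {
    "ka": "\u304b",  # HIRAGANA LETTER KA
    "na": "\u306a",  # HIRAGANA LETTER NA
    "a": "\u3042",   # HIRAGANA LETTER A
    "n": "\u3093",   # HIRAGANA LETTER N
}
HEBREW = {
    "a": "\u05d0",   # HEBREW LETTER ALEF
    "b": "\u05d1",   # HEBREW LETTER BET
    "g": "\u05d2",   # HEBREW LETTER GIMEL
}

japanese_input = make_input_method(HIRAGANA)
hebrew_input = make_input_method(HEBREW)
```

Switching language is then just a matter of picking a different converter;
the caller only ever sees Unicode strings and never a national charset.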


>  It is arguable (I don't agree, but many pros do) that _every_
>  multilingual text should specify locale internally.  Ie, a text
>  document stored in UTF-8 does not contain Japanese, Chinese, German,
>  and Russian, it contains a UTF-8 string.  Such experts will find use
>  of Yudit widgets unacceptable in principle.

I am also arguing for UTF-8(/UCS4) internally. I'm certain Yudit does
that already.

I'm not arguing that the Yudit widgets could be used verbatim, though,
for that's patently untrue. They do, however, represent a good specimen
implementation of how things might be better than they are now.


>  Also, some such information _absolutely must_ be included, for
>  bidirectional languages (Semitic, mostly, but also vertical Japanese,
>  most probably).  This is emphatically not a "locale", but the handling
>  will have some similar elements.

Agreed, that makes for a very tricky problem. The closest solution
apparent to me is twofold:

  1. A somewhat "smart" text renderer, with a set of defaults for how
     different codepages are displayed (e.g. "English l->r, Hebrew r->l,
     Japanese r->l and vertical with breaking every 20 lines"). If that
     is really insufficient control, then...

  2. A "richer" text widget, with facility for control information about
     how to render different languages and texts in arbitrary ways.

(1) would correspond to the existing sort of "text edit control"; if you
need more than the simplistic markup it would provide, then you need
something more like a Word Processor.
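
As a rough illustration of option (1), the following Python sketch splits
a string into runs that share a default direction. It assumes the
defaults come from the per-character bidirectional class that Python's
unicodedata module exposes; the names direction and split_runs are
hypothetical. It handles horizontal direction only (nothing vertical) and
is nowhere near a complete bidirectional layout algorithm.

```python
import unicodedata


def direction(ch):
    """Return "rtl", "ltr", or None for characters with no strong direction."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):   # Hebrew, Arabic and similar letters
        return "rtl"
    if cls == "L":           # Latin, kana, CJK ideographs, ...
        return "ltr"
    return None              # spaces, digits, punctuation


def split_runs(text, default="ltr"):
    """Group text into (direction, substring) runs.

    Characters without a strong direction simply join the current run.
    """
    runs = []                # list of [direction, characters]
    current = default
    for ch in text:
        current = direction(ch) or current
        if runs and runs[-1][0] == current:
            runs[-1][1] += ch
        else:
            runs.append([current, ch])
    return [(d, s) for d, s in runs]
```

A renderer built on this would draw each run in its own direction; the
"richer" widget of option (2) would let the document override the
per-run defaults.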


Are there standards related to this sort of thing already? (and if not why
not?) I definitely don't want to end up with just another set of
incompatible extensions for embedded text markup...


[...]

>  _All_ the code is impossible.  Wnn6 is proprietary, ATOK is
>  proprietary, ....  Even limiting yourself to the open source, my hat's
>  off to you as a speed reader.

Oh, true, true of course. But I think there's enough open source out
there to keep me reading for a while, as you say. If there isn't, then
that in itself would suggest the need for new code...



[Expensive standards]

>  You're welcome to come to Tsukuba and study my copies any time, but I
>  have gotten very stiff-necked about copyright since understanding the
>  GPL.  I've thought about trying to find a way to serve the document to
>  one user at a time, but the terminal would need to be under my
>  control....

The day I can afford to commute to Japan, I'll certainly be able to
afford a few poxy standards books... in the meantime, it's the end of
the academic year, and I'm flat, stony broke -_-;
After summer, when I've earnt a little money...


>      Matt> Yudit already *has* this. Even Gaspar's code there as it is
>      Matt> now has both raw XLib, Qt, and Motif versions of the (entry
>      Matt> and edit) widgets; because it's quite cleanly written, it
>      Matt> should stand porting to other toolkits with little fuss.
>  
>  You're missing the point.  Entry and edit widgets?  Great.  How about
>  buttons, labels, displays, panners, menus, titlebars, dialogs, ...?

Labels are easy, comparatively speaking (no input). Buttons are
containers with a label; menus and dialogs are containers of labels.
Almost every other widget worth consideration is basically a
container of either Labels or Entries.
What exactly is in the labels is a slightly different problem, but
gettext, for example, shows you can at least make a framework to help.
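
A toy Python sketch of that "containers of labels" view follows. The
Label, Button and Menu classes and the DictTranslations stand-in are all
hypothetical; a real program would load compiled catalogues rather than
use an in-memory dict. Only gettext.NullTranslations and its gettext()
method come from Python's standard gettext module.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Stand-in for a compiled message catalogue (illustration only)."""

    def __init__(self, catalogue):
        super().__init__()
        self._catalogue = catalogue

    def gettext(self, message):
        # Fall back to the untranslated message id when nothing is found.
        return self._catalogue.get(message, message)


class Label:
    """Display-only widget: holds one translated string."""

    def __init__(self, msgid, translations):
        self.text = translations.gettext(msgid)


class Button:
    """A container with exactly one label."""

    def __init__(self, msgid, translations):
        self.label = Label(msgid, translations)


class Menu:
    """A container of labels."""

    def __init__(self, msgids, translations):
        self.items = [Label(m, translations) for m in msgids]


japanese = DictTranslations({"Open": "開く", "Quit": "終了"})
menu = Menu(["Open", "Quit", "Help"], japanese)
button = Button("Quit", japanese)
```

Once Label gets display right, every container built from it inherits
that for free, which is the whole argument for fixing the widgets first.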

Titlebars are admittedly a WM problem.


>  And many important applications (vi, emacs to name two) don't use
>  widgets (at that level, anyway) at all.

Indeed true. As for the examples, though: Vi would be a tough nut to
crack in any case unless you had a Unicode or other global terminal to
run it in; and Emacs has always been a law unto itself.
  This has of course the reverse effect: even as "general" methods
are generally unportable to Emacs, Emacs' own solutions are unportable
from it. For better or worse, almost no one else uses elisp...



>  Why aren't you learning Cobol?  :-)  When porting means implementation
>  it's fun, when it means maintenance it's drudgery.

"Mmm... COBOL". If I wanted to earn a stupendous amount of money for just
the next two years, that's probably the business I'd be in. But I'm stuck
here in University, allegedly trying to make myself more employable...



[Widgets]

>      Matt> Replace them with ones that understand
>      Matt> internationalised input and display properly, and you're 90%
>      Matt> of the way there.
>  
>  Oh, brother.  Your arithmetic is right, but your model is wrong.
>  Fred Brooks.  _The Mythical Man-Month_.  Get the recent updated
>  edition, it has the famous "no silver bullet" essay in it.  Read it.
>  Then we can talk.  :-)

Have it, read it, respect it. Sleep with the Twentieth Anniversary
Edition metaphorically under my pillow.
  However, although a remarkable amount of it is still highly relevant
twenty years later, not all of its content ports cleanly to an open source
environment.

To reiterate once more, I'm not arguing that the bullet is silver, but
that at the moment we are *firing blanks*. Whether or not there is one
perfect solution to all the evils of i18n, m17n, l10n is irrelevant, and
not to be used as an excuse for not trying to improve matters.


>  Ed Yourdon's (_Decline and Fall of the American Programmer_ et seq)
>  stuff bears peripherally on this.  Quality control, quality control,
>  quality control.

The name "Yourdon" is familiar, but the title is not. Will check the
library later, on the off-chance that they have anything quite so useful.
Of course, anyone who accused me of being an *American* programmer would
be shot on sight ]:)


Cheers,
-Matt.

"The results of this intrusion into your life will be used 'responsibly'
in ways you cannot even begin to imagine. Of course, the innocent have
nothing to fear from the rapidly expanding data industry."
 - Radiohead, Airbag/How Am I Driving?

--------------------------------------------------------------
Next TLUG Meeting: 13 June Sat, Tokyo Station Yaesu gate 12:30
Featuring Stone and Turnbull on .rpm and .deb packages
Next Nomikai: 17 July, 19:30 Tengu TokyoEkiMae 03-3275-3691
After June 13, the next meeting is 8 August at Tokyo Station
--------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp