Re: [tlug] i18n Primer

Date: Wed, 11 Aug 2004 10:26:37 +1000 (EST)
From: Jim Breen <Jim.Breen@example.com>
Subject: Re: [tlug] i18n Primer

"Stephen J. Turnbull" <stephen@example.com> wrote:
>> >>>>> "Josh" == Josh Glover <tlug@example.com> writes:
>>     Josh> Will one of the i18n gurus on this list (Steve, Jim, et al.) 
>>     Josh> please recommend *the* i18n primer for software developers? 
>>     Josh> I need to remedy my glaring lack of knowledge in the area.
>> 
>> There isn't one that I know of.  Jim and I were going to write one,
>> but it hasn't happened yet.

Naruhodo (I see). I am bearing a huge load of guilt because I have done
very little towards it.

>>     Josh> Using Google, I found "Introduction to i18n", by Kubota
>>     Josh> Tomohiro,[1] which I have printed off for reading. Is this a
>>     Josh> good intro?
>> 
>> No.  It's written by a Japanese, which normally implies a quite warped
>> point of view toward i18n.  Kubota is no exception.  Nor am I (I may
>> not have been born Japanese, but my I18N baptism was in the Church of
>> Shit-JIS Mojibake).

I've read some of Kubota's stuff. It's not bad, but a bit idiosyncratic
(but then, aren't we all?)

>> However, it may be the best there is in English.  Suzuki et al. have a
>> book in Japanese, but it's oriented to the specialist.  O'Reilly's X
>> Window System series had an X11R5 update volume which introduced a
>> bunch of things reasonably well, but obviously it's heavily
>> X-oriented.  And I don't know how that's been treated in the X11R6
>> editions.  My article in the LJ in 1999 was too heavy on theory, not
>> very strong on practice, as was my chapter in Wrox's Professional
>> Linux Programming.

'Twas early days though. I think things are a bit clearer now in a
practical sense.

>>     Josh> I am trying to avoid writing software that is difficult to
>>     Josh> internationalise, so I am looking to become familiar with
>>     Josh> the basics of i18n.
>> 
>> There are no basics of I18N, really.  It's all advanced details.

And that's the hurdle we are trying to climb over. There is a heap of
stuff you have to do and do right.

>> However, if you plan to leave the details to others, it's not too
>> hard.  The first principle is to convert to Unicode (if you're working
>> in a low-level language like C/C++, preferably widechars, not UTF-8,
>> so as to ensure that English doesn't work serendipitously if you use
>> the wrong API) as soon as possible, and do all internal string
>> processing in Unicode.

Um. I'm not so sure about this. I can think of many situations where
you can comfortably leave the internals in UTF8. The hit from converting
UTF8 -> widechar Unicode -> UTF8 while working on a large file can be
horrible. For example, the main part of the internal format of my JMdict
is in EUC, and I can open it in an EUC-capable editor in 3 seconds.
Opening the UTF8 version in something like Yudit takes about as long as
making a cup of coffee.

>> The second issue is to decide whether you are supporting localization
>> (ie, users are normally monolingual), or multilingualization (the user
>> community is multilingual, even if the users are not).  In the former
>> case, you just need to make sure you always do the conversion, and the
>> default external encoding can be a global setting.  In the latter
>> case, you need to strictly control which modules are allowed to do
>> I/O, because otherwise it's possible for different modules to get
>> conflicting ideas about what encodings are being used.  Furthermore,
>> if somebody later decides to do more sophisticated conversions etc,
>> they'll be chasing bugs forever as different parts of the program get
>> updated at different times because there's no complete list.
>> 
>> The third issue is message localization using gettext.  This has a
>> moderate number of tricky parts if you want to do it right (for
>> example, dealing with printf when the variable parts come in different
>> orders in different languages), but it's also something that you can
>> typically leave to a specialist, since these issues are normally
>> localized to each message.  That is, your program's architecture can't
>> make it harder or easier for the translation team.  However, if you
>> want to encourage L10N from the get-go, learn that stuff and provide
>> message catalogs.  There are lots of tools, and the suite in the
>> gettext package is quite complete.

All well put.

>> As long as you use a language (a p-language, for example) or toolkit
>> (GTK) that supports Unicode internally, you generally do not have to
>> worry about issues like font handling or input methods.  Those are
>> somebody else's problem.  ;-)

Many people think these are the problem, but I agree that if you have
done the earlier bits properly, fonts and inputs are a done deal.

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Computer Science & Software Engineering,                Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
