
Mailing List Archive
Re: [tlug] i18n Primer
- Date: Wed, 11 Aug 2004 10:26:37 +1000 (EST)
- From: Jim Breen <Jim.Breen@example.com>
- Subject: Re: [tlug] i18n Primer
"Stephen J. Turnbull" <stephen@example.com> wrote:
>> >>>>> "Josh" == Josh Glover <tlug@example.com> writes:
>> Josh> Will one of the i18n gurus on this list (Steve, Jim, et al.)
>> Josh> please recommend *the* i18n primer for software developers?
>> Josh> I need to remedy my glaring lack of knowledge in the area.
>>
>> There isn't one that I know of. Jim and I were going to write one,
>> but it hasn't happened yet.
I see. I am bearing a huge load of guilt because I have done very
little towards it.
>> Josh> Using Google, I found "Introduction to i18n", by Kubota
>> Josh> Tomohiro,[1] which I have printed off for reading. Is this a
>> Josh> good intro?
>>
>> No. It's written by a Japanese author, which normally implies a quite warped
>> point of view toward i18n. Kubota is no exception. Nor am I (I may
>> not have been born Japanese, but my I18N baptism was in the Church of
>> Shit-JIS Mojibake).
I've read some of Kubota's stuff. It's not bad, but a bit idiosyncratic
(but then, aren't we all?)
>> However, it may be the best there is in English. Suzuki et al. have a
>> book in Japanese, but it's oriented to the specialist. O'Reilly's X
>> Window System series had an X11R5 update volume which introduced a
>> bunch of things reasonably well, but obviously it's heavily
>> X-oriented. And I don't know how that's been treated in the X11R6
>> editions. My article in the LJ in 1999 was too heavy on theory, not
>> very strong on practice, as was my chapter in Wrox's Professional
>> Linux Programming.
'Twas early days, though. I think things are a bit clearer now in a
practical sense.
>> Josh> I am trying to avoid writing software that is difficult to
>> Josh> internationalise, so I am looking to become familiar with
>> Josh> the basics of i18n.
>>
>> There are no basics of I18N, really. It's all advanced details.
And that's the hurdle we are trying to climb over. There is a heap of
stuff you have to do and do right.
>> However, if you plan to leave the details to others, it's not too
>> hard. The first principle is to convert to Unicode (if you're working
>> in a low-level language like C/C++, preferably widechars, not UTF-8,
>> so as to ensure that English doesn't work serendipitously if you use
>> the wrong API) as soon as possible, and do all internal string
>> processing in Unicode.
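A minimal sketch of "decode as soon as possible, process in Unicode, encode on the way out", written in Python 3, where `str` plays roughly the role of the widechar buffer suggested above for C: it is a sequence of code points, so slicing and counting work on characters, not bytes. The helper names `load`, `save` and `first_n_chars` and the sample text are illustrative assumptions, not anything from the thread. The last lines show the trap the widechar advice guards against: byte-level slicing happens to work for English but splits multibyte characters.

```python
# Sketch: convert at the boundaries, keep everything inside as Unicode (str).
# Helper names and sample data are hypothetical.

def load(data: bytes, encoding: str) -> str:
    """Input boundary: turn external bytes into a Unicode string."""
    return data.decode(encoding)


def save(text: str, encoding: str) -> bytes:
    """Output boundary: turn a Unicode string back into external bytes."""
    return text.encode(encoding)


def first_n_chars(text: str, n: int) -> str:
    """Internal processing counts characters (code points), never bytes."""
    return text[:n]


raw = "日本語テキスト".encode("euc_jp")  # bytes as a legacy EUC-JP file might supply them
text = load(raw, "euc_jp")

print(len(raw), len(text))     # 14 bytes, 7 characters
print(first_n_chars(text, 3))  # 日本語
out = save(first_n_chars(text, 3), "utf-8")  # re-encode only at the output

# The serendipity trap: slicing bytes "works" for English...
print(b"English"[:3])          # b'Eng'
# ...but raw[:3] would cut the second EUC-JP character in half.
```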
Um. I'm not so sure about this. I can think of many situations where
you can comfortably leave the internals in UTF-8. The hit of converting
UTF-8 -> wide-character Unicode -> UTF-8 while working on a large file
can be horrible. For example, the main part of the internal format of my
JMdict is in EUC, and I can open it in an EUC-capable editor in 3 seconds.
Opening the UTF-8 version in something like Yudit takes about as long as
making a cup of coffee.
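One reason leaving data in UTF-8 is often safe: UTF-8 is self-synchronizing. Lead bytes and continuation bytes occupy disjoint ranges, so a byte-level match of a complete UTF-8 sequence can only start on a character boundary. A grep-style search can therefore run over raw bytes and decode only the lines it actually returns. This is a sketch under that assumption, not how JMdict's tools actually work; `grep_utf8` is a hypothetical name. The same trick is not safe for EUC-JP or Shift_JIS, where a byte match can straddle two characters.

```python
# Sketch: search UTF-8 data without decoding the whole buffer.
# Relies on UTF-8's self-synchronization; grep_utf8 is a hypothetical helper.

def grep_utf8(data: bytes, needle: str) -> list[bytes]:
    """Return the raw lines of UTF-8 `data` that contain `needle`."""
    key = needle.encode("utf-8")  # encode the (short) needle once
    return [line for line in data.split(b"\n") if key in line]


data = "日本\n英語\n日記\n".encode("utf-8")
for line in grep_utf8(data, "日"):
    print(line.decode("utf-8"))  # decode only the matches: 日本, 日記
```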
>> The second issue is to decide whether you are supporting localization
>> (i.e., users are normally monolingual), or multilingualization (the user
>> community is multilingual, even if the users are not). In the former
>> case, you just need to make sure you always do the conversion, and the
>> default external encoding can be a global setting. In the latter
>> case, you need to strictly control which modules are allowed to do
>> I/O, because otherwise it's possible for different modules to get
>> conflicting ideas about what encodings are being used. Furthermore,
>> if somebody later decides to do more sophisticated conversions, etc.,
>> they'll be chasing bugs forever as different parts of the program get
>> updated at different times because there's no complete list.
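A sketch of the "strictly control which modules do I/O" idea: a single object owns every bytes<->text conversion, and its channel table doubles as the complete list of external encodings the program uses. For the localization case you would only ever use the global default; for multilingualization you register a per-channel encoding. `IOGateway`, `register`, `encoding_for` and the channel names are hypothetical, chosen for illustration; `codecs.lookup` is the standard-library call that raises `LookupError` for an unknown encoding name.

```python
import codecs


class IOGateway:
    """Hypothetical single choke point for every bytes<->text conversion."""

    def __init__(self, default_encoding: str = "utf-8") -> None:
        codecs.lookup(default_encoding)      # fail early on a bad encoding name
        self.default_encoding = default_encoding
        self.channels: dict[str, str] = {}   # the complete list of external encodings

    def register(self, channel: str, encoding: str) -> None:
        codecs.lookup(encoding)              # raises LookupError if unknown
        self.channels[channel] = encoding

    def encoding_for(self, channel: str) -> str:
        return self.channels.get(channel, self.default_encoding)

    def decode(self, channel: str, data: bytes) -> str:
        return data.decode(self.encoding_for(channel))

    def encode(self, channel: str, text: str) -> bytes:
        return text.encode(self.encoding_for(channel))


gw = IOGateway()                          # L10N: one global default is enough
gw.register("legacy-mail", "iso2022_jp")  # M17N: per-channel encodings
gw.register("old-dict", "euc_jp")

msg = gw.decode("old-dict", "辞書".encode("euc_jp"))
wire = gw.encode("legacy-mail", msg)
print(sorted(gw.channels))                # ['legacy-mail', 'old-dict']
```

Because no other module calls `.decode()` or `.encode()` on external data, a later change to conversion policy touches one class instead of an unknown number of call sites.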
>>
>> The third issue is message localization using gettext. This has a
>> moderate number of tricky parts if you want to do it right (for
>> example, dealing with printf when the variable parts come in different
>> orders in different languages), but it's also something that you can
>> typically leave to a specialist, since these issues are normally
>> localized to each message. That is, your program's architecture can't
>> make it harder or easier for the translation team. However, if you
>> want to encourage L10N from the get-go, learn that stuff and provide
>> message catalogs. There are lots of tools, and the suite in the
>> gettext package is quite complete.
All well put.
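On the printf ordering problem: in C, POSIX printf supports positional conversions such as `%1$s` and `%2$s`, which lets a translated format string use the arguments in a different order. The Python sketch below gets the same effect with named `str.format` fields. `DictTranslations` is a hypothetical in-memory stand-in for a compiled `.mo` catalog, used so the example runs without files; a real program would load catalogs with `gettext.translation()` from `.mo` files built by `msgfmt`. The message text is written literally inside `_()` so extraction tools can find it.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog: dict[str, str]) -> None:
        super().__init__()
        self.catalog = catalog

    def gettext(self, message: str) -> str:
        return self.catalog.get(message, message)  # untranslated -> original


def report(t: gettext.NullTranslations, count: int, dest: str) -> str:
    _ = t.gettext
    # Named fields let the translator reorder the variable parts freely.
    return _("Copied {count} files to {dest}").format(count=count, dest=dest)


ja = DictTranslations({
    "Copied {count} files to {dest}": "{dest} に {count} 個のファイルをコピーしました",
})

print(report(gettext.NullTranslations(), 3, "/tmp"))  # Copied 3 files to /tmp
print(report(ja, 3, "/tmp"))  # /tmp に 3 個のファイルをコピーしました
```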
>> As long as you use a language (a p-language, for example) or toolkit
>> (GTK) that supports Unicode internally, you generally do not have to
>> worry about issues like font handling or input methods. Those are
>> somebody else's problem. ;-)
Many people think these are the problem, but I agree that if you have
done the earlier bits properly, fonts and inputs are a done deal.
Jim
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Computer Science & Software Engineering, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学