
Re: [tlug] i18n Primer



>>>>> "Josh" == Josh Glover <tlug@example.com> writes:

    Josh> Will one of the i18n gurus on this list (Steve, Jim, et al.) 
    Josh> please recommend *the* i18n primer for software developers? 
    Josh> I need to remedy my glaring lack of knowledge in the area.

There isn't one that I know of.  Jim and I were going to write one,
but it hasn't happened yet.

    Josh> Using Google, I found "Introduction to i18n", by Kubota
    Josh> Tomohiro,[1] which I have printed off for reading. Is this a
    Josh> good intro?

No.  It's written by a Japanese author, which normally implies a quite
warped point of view toward i18n.  Kubota is no exception.  Nor am I
(I may not have been born Japanese, but my I18N baptism was in the
Church of Shit-JIS Mojibake).

However, it may be the best there is in English.  Suzuki et al. have a
book in Japanese, but it's oriented to the specialist.  O'Reilly's X
Window System series had an X11R5 update volume which introduced a
bunch of things reasonably well, but obviously it's heavily
X-oriented, and I don't know how that's been treated in the X11R6
editions.  My 1999 article in Linux Journal was too heavy on theory
and not very strong on practice, as was my chapter in Wrox's
Professional Linux Programming.

    Josh> I am trying to avoid writing software that is difficult to
    Josh> internationalise, so I am looking to become familiar with
    Josh> the basics of i18n.

There are no basics of I18N, really.  It's all advanced details.

However, if you plan to leave the details to others, it's not too
hard.  The first principle is to convert to Unicode as soon as
possible, and do all internal string processing in Unicode.  If you're
working in a low-level language like C/C++, prefer widechars to UTF-8:
with UTF-8, English (being plain ASCII) works serendipitously even if
you use the wrong API, which hides the bug until non-ASCII text shows
up.
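
To make the first principle concrete, here is a minimal sketch in
Python.  The function names (read_text, write_text, shout) are
invented for illustration, and the choice of Shift_JIS input and UTF-8
output is only an assumption about what the outside world looks like.
The point is that bytes are decoded once at the input boundary,
everything in between sees only Unicode strings, and encoding happens
once on the way out.

```python
def read_text(raw: bytes, encoding: str) -> str:
    """Decode external bytes to Unicode at the input boundary."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str) -> bytes:
    """Encode Unicode back to bytes at the output boundary."""
    return text.encode(encoding)


def shout(text: str) -> str:
    """Internal processing: operates on Unicode, never on raw bytes."""
    return text.upper()


# Pretend these bytes arrived from a legacy Shift_JIS source.
raw_in = "日本語 text".encode("shift_jis")

text = read_text(raw_in, "shift_jis")   # convert as soon as possible
result = shout(text)                    # all processing in Unicode
raw_out = write_text(result, "utf-8")   # convert as late as possible

# Byte length and character length differ, which is exactly what
# byte-oriented string handling gets wrong.
byte_count = len(raw_in)
char_count = len(text)
```

Note that an ASCII-only test string would have the same byte and
character counts, so a test suite written only in English would not
notice a missing decode step.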

The second issue is to decide whether you are supporting localization
(i.e., users are normally monolingual), or multilingualization (the
user community is multilingual, even if the individual users are not).
In the former case, you just need to make sure you always do the
conversion, and the default external encoding can be a global setting.
In the latter case, you need to strictly control which modules are
allowed to do I/O, because otherwise it's possible for different
modules to get conflicting ideas about what encodings are being used.
Furthermore, if somebody later decides to do more sophisticated
conversions, etc., they'll be chasing bugs forever as different parts
of the program get updated at different times, because there's no
complete list of the places that do I/O.
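
One way to get that control in the multilingual case is to route all
text I/O through a single object that owns every encoding decision.
The sketch below is only an illustration: the TextGateway class, its
methods, and the channel names are hypothetical, not part of any
library.  Because every channel has to be registered, the gateway also
doubles as the complete list of places where conversion happens.

```python
import io
from typing import BinaryIO, Dict, List


class TextGateway:
    """Hypothetical single owner of all external encoding decisions."""

    def __init__(self, default_encoding: str = "utf-8") -> None:
        self._default = default_encoding
        self._encodings: Dict[str, str] = {}

    def register(self, channel: str, encoding: str) -> None:
        """Record which encoding a named I/O channel uses."""
        self._encodings[channel] = encoding

    def encoding_for(self, channel: str) -> str:
        """Return the channel's encoding, or the global default."""
        return self._encodings.get(channel, self._default)

    def channels(self) -> List[str]:
        """The complete list of registered conversion points."""
        return sorted(self._encodings)

    def read(self, channel: str, stream: BinaryIO) -> str:
        """Read bytes from the stream and hand back Unicode."""
        return stream.read().decode(self.encoding_for(channel))

    def write(self, channel: str, stream: BinaryIO, text: str) -> None:
        """Accept Unicode and write it in the channel's encoding."""
        stream.write(text.encode(self.encoding_for(channel)))


gateway = TextGateway()
gateway.register("legacy_mail", "iso-2022-jp")
gateway.register("web_form", "euc_jp")

# Other modules never call decode() or encode() themselves.
incoming = io.BytesIO("こんにちは".encode("iso-2022-jp"))
message = gateway.read("legacy_mail", incoming)

outgoing = io.BytesIO()
gateway.write("web_form", outgoing, message)
```

In the monolingual case the same object degenerates to its default
encoding, which is the "global setting" mentioned above.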

The third issue is message localization using gettext.  This has a
moderate number of tricky parts if you want to do it right (for
example, dealing with printf when the variable parts come in different
orders in different languages), but it's also something that you can
typically leave to a specialist, since these issues are normally
localized to each message.  That is, your program's architecture can't
make it harder or easier for the translation team.  However, if you
want to encourage L10N from the get-go, learn that stuff and provide
message catalogs.  There are lots of tools, and the suite in the
gettext package is quite complete.
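
The argument-order problem can be sketched as follows.  This uses
Python's standard gettext module, but the DictTranslations class and
the Japanese string are made up for the example; a real program would
load a compiled message catalog instead of an in-memory dict.  Named
placeholders let the translator reorder the variable parts freely (in
C, POSIX printf offers numbered arguments such as %1$s and %2$d for
the same purpose).  Plural forms are ignored here.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a real .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Fall back to the untranslated message, as gettext does.
        return self._messages.get(message, message)


# Illustrative translation: the count comes first, the user second.
japanese = DictTranslations({
    "{user} deleted {count} files":
        "{count} 個のファイルを {user} が削除しました",
})


def report(translations, user, count):
    """Look up the message first, then substitute the variable parts."""
    _ = translations.gettext
    return _("{user} deleted {count} files").format(user=user, count=count)


english_line = report(gettext.NullTranslations(), "josh", 3)
japanese_line = report(japanese, "josh", 3)
```

The calling code is identical for both languages; only the catalog
entry knows that the order changed, which is why this kind of problem
stays local to each message.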

As long as you use a language (a p-language, for example) or toolkit
(GTK) that supports Unicode internally, you generally do not have to
worry about issues like font handling or input methods.  Those are
somebody else's problem.  ;-)

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk

