Mailing List Archive



[tlug] What is a locale? [was: KDDI cellphones ...]



Scott Robbins writes:

 > Locales have always been a bit of a black box for me.  I understand
 > how to make them do what I want in general, but I don't really grok
 > their internals.

Ah, locales are not very complicated, although they involve a huge
amount of detail.

Basically, there are a number of things that vary according to
culture, language, and region.  Since all of those are more or less
associated with geography, the general concept is called locale.

The "things" are called categories, and the categories have names
beginning with "LC_" that double as environment variables.  Character
sets vary, so there is an LC_CTYPE category (character classification
and encoding).  Even if you use Unicode, the collating order for Han
characters depends on locale, so there is also LC_COLLATE.  The
standard 10-character dates (e.g., 03/04/2007) also vary between the US
and Europe (in the US, that's March 4, 2007, whereas in Europe it's 3
April 2007).  Of course the spelling of weekdays and months varies.
Thus LC_TIME.
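To see the categories and their environment variables side by side, here is a minimal sketch in Python, whose locale module wraps the libc setlocale call. It assumes a POSIX system. The names CATEGORIES, current_categories, and env_for are made up for illustration; env_for only models the standard POSIX precedence (LC_ALL, then the category's own variable, then LANG) and does not ask libc anything.

```python
import locale
import os

# A few standard categories; the module constants mirror libc's LC_* macros.
CATEGORIES = {
    "LC_CTYPE": locale.LC_CTYPE,
    "LC_COLLATE": locale.LC_COLLATE,
    "LC_TIME": locale.LC_TIME,
    "LC_NUMERIC": locale.LC_NUMERIC,
    "LC_MONETARY": locale.LC_MONETARY,
}


def current_categories():
    """Return {category name: locale currently in effect for it}."""
    # Calling setlocale with no locale argument only queries; nothing changes.
    return {name: locale.setlocale(cat) for name, cat in CATEGORIES.items()}


def env_for(name, env=os.environ):
    """Hypothetical helper: which variable would decide category `name`?

    POSIX precedence: a non-empty LC_ALL wins, then the category's own
    variable, then LANG.  With none set the result is implementation-
    defined; glibc falls back to the "C" locale, which is modelled here.
    """
    for var in ("LC_ALL", name, "LANG"):
        value = env.get(var)
        if value:  # unset and empty are treated the same
            return var, value
    return None, "C"


if __name__ == "__main__":
    try:
        # Like a C program, adopt the user's environment explicitly.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        print("environment names a locale that is not installed")
    for name, value in current_categories().items():
        source = env_for(name)[0] or "default"
        print(f"{name:12} {value:20} (from {source})")
```

Setting, say, LANG=ja_JP.UTF-8 together with LC_TIME=en_GB.UTF-8 should leave every category Japanese except LC_TIME, which is the point of splitting things into categories.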

Inside a number of system functions, care is taken to look up the
current value for relevant categories.  So we have a function like
strftime.  For conversions such as %c (date and time) or %x (date), it
will check LC_TIME to see what format the time string should be output
in, and then LC_CTYPE for what character set.  (The older ctime, by
contrast, always produces the same fixed English format.)  You can
think of the value of LC_TIME as determining a printf-like
specification (more precisely, a strftime spec) plus the names of
weekdays and months, and LC_CTYPE as giving a character set's name,
along with tables of character classes and the like.
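The "LC_TIME supplies a strftime spec" idea can be checked directly. In this hedged Python sketch, nl_langinfo(D_FMT) returns the spec that the current LC_TIME provides for %x, and time.strftime applies it. The helpers lc_time, date_spec, and format_in are hypothetical. Only the "C" locale is guaranteed to exist; the other names work only if they are installed on your system.

```python
import locale
import time
from contextlib import contextmanager


@contextmanager
def lc_time(name):
    """Temporarily switch only the LC_TIME category to locale `name`."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current value
    try:
        locale.setlocale(locale.LC_TIME, name)  # locale.Error if not installed
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)


def date_spec(name):
    """Return the strftime spec that LC_TIME of `name` supplies for %x."""
    with lc_time(name):
        return locale.nl_langinfo(locale.D_FMT)


def format_in(name, fmt, when):
    """Run strftime with LC_TIME set to `name`."""
    with lc_time(name):
        return time.strftime(fmt, when)


# 4 March 2007; strptime fills in the weekday and day of year.
when = time.strptime("2007-03-04", "%Y-%m-%d")

for name in ("C", "en_US.UTF-8", "en_GB.UTF-8"):
    try:
        print(name, date_spec(name), format_in(name, "%x %A", when))
    except locale.Error:
        print(name, "(not installed here)")
```

In the "C" locale %x is defined as %m/%d/%y, so the first line prints `C %m/%d/%y 03/04/07 Sunday`; on a typical glibc system the en_GB line should show the day first.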

However, rather than giving those directly, a database of locales is
kept, with common values for these.  Locale names conventionally come
in the format language_REGION.charset@modifier, where the _REGION,
.charset, and @modifier parts are optional (a modifier selects a
variant, as in de_DE@euro).  Obviously the charset portion determines
the charset, but it's ignored for most categories.  LC_TIME varies
according to both language and REGION (obviously ja will have kanji in
the time string, while en won't; however, en_US and en_GB may have
different strings as well).

This gets a little strained with settings like LC_PAPER.  What if you
want to set the standard size to B5 or legal?  You're out of luck,
since essentially all REGIONs map to either US letter or A4.  (And in
fact the POSIX committee decided not to standardize LC_PAPER,
presumably for this reason.)

The databases are more than a little yayakoshii (convoluted) because of
the irregularity of natural language, but that's not your problem; you
can just leave it to libc. :-)

Finally, if you have the tables in source form, you use localedef to
compile them into tables that the system functions can use
efficiently.  (There's no guarantee that localedef actually does
anything; it would be possible for libc to read and compile the tables
lazily.)
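From a program's point of view, a locale is usable only if libc can find its tables. A minimal sketch, assuming a POSIX system: locale_available (a hypothetical helper) just tries setlocale and reports whether it failed, restoring the previous setting either way. On glibc systems `locale -a` lists the compiled locales, and a command such as `localedef -i ja_JP -f UTF-8 ja_JP.UTF-8` builds one from source, though distributions often wrap this in their own tools.

```python
import locale


def locale_available(name, category=locale.LC_CTYPE):
    """Return True if setlocale accepts `name` for `category`."""
    saved = locale.setlocale(category)  # query only
    try:
        locale.setlocale(category, name)
    except locale.Error:  # "unsupported locale setting": no tables found
        return False
    else:
        return True
    finally:
        locale.setlocale(category, saved)  # leave the process as we found it


if __name__ == "__main__":
    for name in ("C", "ja_JP.UTF-8", "en_US.UTF-8"):
        print(name, "available" if locale_available(name) else "missing")
```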



