Mailing List Archive



[tlug] Why change a linux server's locale?



マスターズ・イアン writes:

 > During the time I've been working in Japan (over 15 years), almost
 > exclusively in Linux, I've come across many attempts to "push"
 > Linux into ... Japanese language encodings such as euc and
 > sjis.

There are many inconveniences involved in working with modern software
such as Python 3 and even glibc in Japan.  For example, the Info-ZIP
utilities take the standard seriously, and assume that file names
actually encoded in Shift JIS are ISO-8859-1-encoded.  But many
utilities in common use produce such zipfiles.  And glibc is
specifically a POSIX implementation, and takes POSIX locales
seriously.  That leaks through to applications sometimes.
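
A minimal sketch of that zipfile mismatch in Python, assuming the
archive's names were written as Shift JIS bytes without the UTF-8 flag
set.  Python's zipfile module decodes such names as CP437 (not
ISO-8859-1, but the effect is the same kind of mojibake), so the
original bytes can be recovered by reversing that decode.  The helper
name recover_sjis_name is made up for illustration.

```python
def recover_sjis_name(garbled: str) -> str:
    """Undo a CP437 decode of a file name that was really Shift JIS.

    Assumption: the name came from an archive member whose UTF-8 flag
    was not set, so it was decoded byte-for-byte as CP437.
    """
    raw = garbled.encode("cp437")   # back to the bytes stored in the archive
    return raw.decode("cp932")      # cp932 is Microsoft's Shift JIS variant


# Simulate what a Shift-JIS-only tool stores and what a reader then sees.
original = "日本語.txt"
stored = original.encode("cp932")
garbled = stored.decode("cp437")

print(garbled)                      # mojibake
print(recover_sjis_name(garbled))   # the intended name
```

This only works if you already know (or guess correctly) what the
producer's encoding was, which is exactly the information the zipfile
does not carry.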

 > It's never been a huge problem, as far as I can remember, but I
 > really wonder why it's necessary, given that:

 > 1. ja_JP.UTF-8 supports Japanese completely, and

But it does not!  Japanese is a language, not a coded character set.
Japanese users use EUC, Shift JIS, ISO-2022-JP-some-corporate-variant,
and UTF-8 quite catholically, and expect that sewer sludge to be
transparent.  Japan also has a culture where local custom is far
more important than Internet or even national standards, especially
where "local" is spelled c-o-r-p-o-r-a-t-e (thus the proliferation of
"corporate variants" of the JIS character repertoire).  Most corporate
repertoires are subsets of the modern JIS (and therefore Unicode)
repertoires, but they sometimes disagree on what codes are assigned to
those characters.  It's a huge mess, even today, though much less
likely to cause substantial delays or misunderstandings than 30 years
ago.
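
One small illustration of repertoires agreeing on the bytes but not on
the character, using two codecs that ship with Python.  This is only a
sketch of the general problem: it compares the JIS X 0208-based
"shift_jis" codec with Microsoft's "cp932", not any particular
corporate variant.

```python
raw = b"\x81\x60"  # one double-byte code in the Shift JIS family

as_jis = raw.decode("shift_jis")  # JIS X 0208 mapping: WAVE DASH
as_ms = raw.decode("cp932")       # Microsoft mapping: FULLWIDTH TILDE

print(f"shift_jis: U+{ord(as_jis):04X}")
print(f"cp932:     U+{ord(as_ms):04X}")
print("same character?", as_jis == as_ms)
```

The two results look nearly identical on screen, but they are
different Unicode code points, so a naive round trip through the
"wrong" codec either fails or silently changes the text.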

For example, my university considers itself to be the MIT of Japan,
yet many of its internal pages are "enterprise software" displaying
partial mojibake due to use of iframes and other malware.  (They
assume use of a Japan-localized version of IE -- I don't think they
really support Edge yet -- which prefers automatic recognition of
coded character sets to MUST and REQUIRED features of the HTTP and
HTML standards.  So it just works, unless you have a standards-
conforming browser, even though having one is kinda a good thing in
today's insecurity environment. :-þ)

 > 2. Most things that go on inside a database are unaffected by the OS's locale

 > So, my questions are ...

 > 1. Do you know of any reason I should be worried about the fact
 >    that our development server's locale is ja_JP.UTF-8 but the
 >    customer's is ja_JP.sjis (which isn't even supported on Red Hat
 >    Enterprise Linux)?

"Worried," no.  Expect occasional annoyances and bill accordingly.

Specifically, any customers-of-customer-facing server probably is OK.
As others point out, these usually are pretty good about handling text
internally as Unicode and spitting it out in the client browser's
preferred coded character set.  (I assume you would know if that's not
true!)  On the other hand, internal-use software often assumes that it
knows what the character set is, and that it may be inferred from the
locale.  Usually this is not a problem (most people know only one
language and that one not so well ;-þ, so multilingual environments
are uncommon, and roundtripping is a central design principle of
Unicode).  Somebody just needs to transcode when shipping stuff
between systems.  Likely both the dev server and the production server
will work fine in their own environments.  You don't say, but I guess
that you aren't the admin of the customer's server, while the admins
are Japanese and the corporate culture is Shift-JIS.  The problem is
going to be communication between devs and admins, eg, passing around
zip files and perhaps scripts where their server uses an autodetecting
"jgrep" that handles Shift JIS and ISO-2022-JP while you have a
vanilla GNU grep that expects ASCII-compatible ISO-8859 or UTF-8.  And
there are potential problems with the file system if file names need
to be input from the console (again, web apps are generally more
robust).
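
The "somebody just needs to transcode" step might look like the
following in Python.  This is a sketch under assumptions: the function
name transcode is made up, and the customer's "sjis" is taken to mean
cp932, which you would need to confirm with their admins.  Errors are
left strict so that a wrong guess fails loudly instead of producing
mojibake.

```python
def transcode(data: bytes, src: str = "cp932", dst: str = "utf-8") -> bytes:
    """Re-encode bytes from one coded character set to another.

    Decoding and encoding both use the default strict error handling,
    so undecodable or unencodable input raises instead of being mangled.
    """
    return data.decode(src).encode(dst)


text = "日本語のテキスト"
from_customer = text.encode("cp932")      # what their server hands you

for_dev = transcode(from_customer)                     # cp932 -> UTF-8
back_to_customer = transcode(for_dev, "utf-8", "cp932")  # UTF-8 -> cp932

print(for_dev.decode("utf-8"))
print(back_to_customer == from_customer)
```

Text that stays within the customer's repertoire survives the trip in
both directions; anything you add on the UTF-8 side that cp932 lacks
will raise on the way back, which is usually what you want to find out
before shipping.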

 > 2. When have you found it absolutely imperative to have a Linux
 >    server with sjis locale?

Never.  In fact, it's likely to get in the way of your own work.  The
communication problems mentioned above are annoying, but less so than
the infelicities of dealing with Shift JIS during daily work.






