


[tlug] Updating iconv tables



I have struck a problem with missing mappings in
iconv in several Linux distros. The problem arose
initially with ㈱ (i.e. (株)), but is sure to crop
up with other characters.

An entry for ㈱ went into JMdict/EDICT recently. I
maintain the source for those files in EUC-JP and
distribute the text formats (EDICT/EDICT2) in EUC-JP
and the XML format (JMdict) in UTF8. I convert to
UTF8 using the iconv command-line utility (on a Solaris
system). For ㈱ the EUC (ADEA) converts to the UTF8
version of U+3231 fine. That EUC-JP codepoint is in
JIS X 0213, which you-all probably know is a sort-of
superset of the main JIS X 0208 standard.
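
As a rough illustration of the mapping described above, here is a minimal
Python sketch. It is not the iconv setup used for JMdict/EDICT; it assumes
Python's built-in "euc_jisx0213" codec, whose tables are separate from any
iconv implementation, and the helper name euc_to_kuten is made up for this
example. It shows where the bytes AD EA sit in the JIS row/cell grid and
what the UTF8 form of U+3231 looks like.

```python
# EUC-JP bytes for ㈱ as quoted in the message (hex ADEA).
EUC_BYTES = bytes.fromhex("ADEA")


def euc_to_kuten(pair):
    """Return the (row, cell) position of a two-byte EUC-JP sequence.

    Hypothetical helper: each EUC byte is the row/cell number plus 0xA0.
    """
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected two bytes in the range A1..FE")
    return pair[0] - 0xA0, pair[1] - 0xA0


# Row 13 lies outside the rows that JIS X 0208 itself assigns.
kuten = euc_to_kuten(EUC_BYTES)

# UTF8 encoding of the target code point U+3231.
utf8 = "\u3231".encode("utf-8")

# Assumption: Python's JIS X 0213 based codec, not an iconv table.
decoded = EUC_BYTES.decode("euc_jisx0213")

print(kuten, utf8.hex().upper(), "U+%04X" % ord(decoded))
```

A converter whose EUC-JP table covers only the rows assigned by JIS X 0208
has nothing to look up for row 13, which would be consistent with the
failures described later in this message.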

The WWWJDIC servers also use EUC-JP internally, but
if a user has selected UTF8 operation they convert
input and display between UTF8 and EUC on the fly using
iconv(). This works fine on the Monash WWW server (also
Solaris).

The trouble occurs on Linux systems. When people have
gone to convert the EDICT file to UTF8 for other
systems, such as GJiten and Kim's jisho.org, the くそ
has hit the fan. The iconv utility simply dies on that
character, and the WWWJDIC servers that run on Linux,
which is most of them apart from Monash, fail to convert
㈱ properly.
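
To find out which characters trip up a given conversion table, something
like the following sketch might help. It is only an illustration: it uses
Python's own codecs rather than the iconv tables of any distro, and the
function name find_unconvertible is hypothetical. It walks through a byte
string and records every sequence the chosen codec cannot decode, instead
of stopping at the first one.

```python
def find_unconvertible(data, codec):
    """Return a list of (offset, bytes) for sequences `codec` cannot decode.

    Hypothetical helper. It restarts decoding after each failure, so it is
    slow on big inputs; fine for a quick check, not for production use.
    """
    failures = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(codec)
            break  # the remainder decoded cleanly
        except UnicodeDecodeError as err:
            start = pos + err.start
            # Always advance by at least one byte to avoid looping forever.
            end = pos + max(err.end, err.start + 1)
            failures.append((start, data[start:end]))
            pos = end
    return failures


# "abc" followed by the EUC-JP bytes for ㈱ (hex ADEA).
sample = b"abc\xad\xea"

# Which codec names fail here depends on the tables behind them.
for name in ("euc_jp", "euc_jisx0213"):
    print(name, find_unconvertible(sample, name))
```

Run over a whole EDICT file (read in binary mode), this would list every
offending byte sequence in one pass, which makes for a more complete bug
report than the first character that happens to fail.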

The problem, I conclude, is with the compiled-in tables
in iconv in the Linux distros. It seems Sun has gone to
the trouble of keeping theirs up-to-date, but the standard
distros haven't.

The question is: how can this be fixed? Who do I whinge to?

I'll send a copy of this to bug-gnu-libiconv@example.com
but is that enough?

Cheers

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

