Mailing List Archive


tlug: Re: BTW, what is a "BMPstring"?



>>>>> "Sanjay" == Sanjay Agnani <s.agnani@example.com> writes:

    Sanjay> I think BMP (Basic Multilingual Plane) string is basically
    Sanjay> a Unicode (Universal Coded Character Set-2 -> UCS-2)
    Sanjay> string in 16-bit encoding in native processor endianness.

Well, in that case you need to translate to JIS first (either EUC-JP or
7-bit ISO-2022-JP, depending on your console's capabilities).  This will
have to be table-driven, although on Sloaris you may get lucky and have
a system library call for that.  No such luck on Linux, not until glibc
2.2 and possibly later, IIRC.  The mapping tables are available for
download at ftp.unicode.org.
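
To make "table-driven" concrete, here is a minimal sketch in C.  The
names (ucs2_jis, demo_table, ucs2_to_jis, ucs2_to_eucjp) are made up for
illustration, and the four table rows are only a sample; a real table
would be generated from the downloaded mapping files.  It assumes the
input has already been decoded into 16-bit values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One row of a UCS-2 -> JIS X 0208 table (hypothetical layout). */
struct ucs2_jis {
    uint16_t ucs;   /* Unicode code point in the BMP */
    uint16_t jis;   /* JIS X 0208 code: (row byte << 8) | cell byte */
};

/* Illustrative sample only. Must stay sorted by .ucs for bsearch(). */
static const struct ucs2_jis demo_table[] = {
    { 0x3013, 0x222E },  /* GETA MARK */
    { 0x3042, 0x2422 },  /* HIRAGANA LETTER A */
    { 0x65E5, 0x467C },  /* kanji "day, sun" */
    { 0x672C, 0x4B5C },  /* kanji "book, origin" */
};

static int cmp_row(const void *key, const void *row)
{
    uint16_t k = *(const uint16_t *)key;
    uint16_t r = ((const struct ucs2_jis *)row)->ucs;
    return (k > r) - (k < r);
}

/* Returns the JIS X 0208 code for ucs, or 0 if the table has no entry. */
uint16_t ucs2_to_jis(uint16_t ucs)
{
    const struct ucs2_jis *hit = bsearch(&ucs, demo_table,
        sizeof demo_table / sizeof demo_table[0],
        sizeof demo_table[0], cmp_row);
    return hit ? hit->jis : 0;
}

/* Encodes one 16-bit value as EUC-JP into out.
   Returns the number of bytes written (1 or 2), or 0 if unmappable.
   ASCII passes through; JIS X 0208 gets the high bit set on both bytes. */
size_t ucs2_to_eucjp(uint16_t ucs, unsigned char out[2])
{
    uint16_t jis;

    if (ucs < 0x80) {
        out[0] = (unsigned char)ucs;
        return 1;
    }
    jis = ucs2_to_jis(ucs);
    if (jis == 0)
        return 0;
    out[0] = (unsigned char)((jis >> 8) | 0x80);
    out[1] = (unsigned char)((jis & 0xFF) | 0x80);
    return 2;
}
```

A return value of 0 from ucs2_to_eucjp() leaves the policy decision
(strip or substitute) to the caller.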

You'll probably want to be defensive about people using UTF-16
surrogates (Japanese kanji outside Unihan will be up there, and people
will want to use the proper characters for their names and addresses).
You may want to strip out private-use characters.  One alternative in
both cases is to substitute the geta mark (looks like a fat equals
sign; the JIS X 0208 equivalent of U+FFFD).  You may also want to strip
out or substitute everything that doesn't map directly to JIS.  I'm not
sure what happens with JIS Greek, Cyrillic, etc., so be careful there.
(I think that since these don't violate the source separation rule,
they get unified.  But you will want to reverse-translate them to JIS,
and I don't know if the Unicode tables do that by default.)
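
A rough sketch of the substitution approach, in C.  The function names
are hypothetical, and the policy shown (collapse a well-formed surrogate
pair into a single geta mark, replace lone surrogates and BMP
private-use characters one for one) is just one possible choice, not
the only sensible one.

```c
#include <stddef.h>
#include <stdint.h>

#define GETA_MARK 0x3013u   /* U+3013 GETA MARK */

int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }
int is_private_use(uint16_t u)    { return u >= 0xE000 && u <= 0xF8FF; }

/* Rewrites text[0..n) in place, substituting the geta mark for
   surrogates and BMP private-use characters.
   Returns the new length in 16-bit units (never more than n). */
size_t substitute_unmappable(uint16_t *text, size_t n)
{
    size_t r, w = 0;

    for (r = 0; r < n; r++) {
        uint16_t u = text[r];

        if (is_high_surrogate(u) && r + 1 < n &&
            is_low_surrogate(text[r + 1])) {
            r++;                     /* consume the low half of the pair */
            text[w++] = GETA_MARK;   /* one mark per supplementary char */
        } else if (is_high_surrogate(u) || is_low_surrogate(u) ||
                   is_private_use(u)) {
            text[w++] = GETA_MARK;   /* lone surrogate or private use */
        } else {
            text[w++] = u;           /* ordinary BMP character */
        }
    }
    return w;
}
```

Running this before the table lookup means the lookup only ever sees
ordinary BMP characters.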

Oh, and forget about "printf".  \0 is a valid (and extremely common)
byte in 16-bit Unicode (every ISO-8859-1 character has that in the upper
byte, right?), so byte-oriented string functions will stop short.
You'll need wprintf() and friends.  I don't know if they work in glibc
2.1, and they are implemented idiosyncratically in glibc 2.2.  (I.e.,
you'll probably need several levels of #ifdefs, one for each libc:
several flavors of glibc, as glibc developers don't care if they break
your programs, plus at least one for Sloaris.)  Be very careful to keep
the use of widechar output functions extremely localized; use inline
functions or macros if efficiency is important.
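
To illustrate both points, a small sketch in C.  All names here are
hypothetical.  The first half shows what a byte-oriented function sees
when handed a 16-bit string; the second half is one way to confine
wide-character output to a single inline wrapper (here around the
standard fputws()), so that any per-libc #ifdefs would live in one
place.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* "Hi" as 16-bit units in big-endian byte order, then a 16-bit zero. */
static const unsigned char hi_ucs2be[] = { 0x00, 'H', 0x00, 'i', 0x00, 0x00 };

/* Length in 16-bit units of a string terminated by a 0x0000 unit. */
size_t ucs2_len(const unsigned char *bytes)
{
    size_t n = 0;

    while (bytes[2 * n] != 0 || bytes[2 * n + 1] != 0)
        n++;
    return n;
}

/* What strlen(), and therefore printf("%s"), would make of the bytes. */
size_t bytes_seen_by_printf(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}

/* The one place that touches the wide-character output API.
   Returns 0 on success, -1 on failure. */
static inline int emit_wide(FILE *fp, const wchar_t *s)
{
    return fputws(s, fp) < 0 ? -1 : 0;
}
```

With the sample string, ucs2_len() counts two units while strlen()
stops at the very first byte.  Note that a stdio stream becomes
wide-oriented after its first wide-character call, which is one more
reason to keep such calls behind a single wrapper.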

- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."
-------------------------------------------------------------------
Next Nomikai: September 17 (Fri), 19:30 Tengu TokyoEkiMae 03-3275-3691
*** Linux 8th Birthday Anniversary! ***
Next Technical Meeting: October 9 (Sat), 13:00     place: Temple Univ.
*** Topics: 1) Linux i18n   2) Japanese TrueType fonts
-------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan

