Mailing List Archive


tlug: Multilingual input (was Re: Japanese input)



>>>>> "Klaus" == Klaus Kudielka <kudielka@example.com> writes:

    Klaus> I renamed the subject again ;) The system I am dreaming of
    Klaus> should handle as many languages as possible, without having
    Klaus> to re-start applications (ideally, they shouldn't even know
    Klaus> about the language/charset/encoding etc. if they don't
    Klaus> like).

Keep dreaming.  If your application can do that, then it will have to
abandon all hope of interchange with the majority of current
applications (although by focusing on M$ compatibility, you will
probably be able to interchange the majority of documents).

    Klaus> 1. Shall the {input API, display API, C library multi-byte
    Klaus> functions} support arbitrary encodings (specified by the
    Klaus> locale) or just one universal encoding (e.g. UTF-8)? 3
    Klaus> questions, but, to keep application complexity low, there
    Klaus> should be only 1 answer.

To keep application complexity low, it must be UCS-2 (or its trivial
extension to the BMP of UCS-4), perhaps using UTF-8.  But see above.
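As a rough illustration of the "perhaps using UTF-8" part, here is a minimal sketch of turning one UCS code point into UTF-8 bytes.  The function name ucs_to_utf8 is made up for this example, and the sketch assumes the restricted range up to U+10FFFF with surrogates rejected; it is not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the byte count,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t ucs_to_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* rest of the BMP: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* beyond the BMP: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of range for this sketch */
}
```

With this sketch, U+0041 stays the single byte 41, while U+3042 (HIRAGANA LETTER A) becomes the three bytes E3 81 82.  Everything in the BMP fits in at most three bytes.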

    Klaus> 2. For the multi-byte conversion functions, matters are
    Klaus> even more complex. Does the wchar_t specify the Universal

"Multi-byte" does not mean "wide character."  It means
"variable-width character."  Sorry about that, but this usage is very
well established now.
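To make the distinction concrete, here is a small sketch that counts characters, as opposed to bytes, in a multibyte string using the standard mbrlen() function.  The helper name count_mb_chars is invented for this example, and the result depends entirely on whatever LC_CTYPE locale the caller has selected with setlocale().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in a multibyte string as
 * interpreted by the current LC_CTYPE locale.  Returns (size_t)-1 if
 * the string contains an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining;
    size_t count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    remaining = strlen(s);                /* length in bytes, not characters */

    while (remaining > 0) {
        /* mbrlen reports how many bytes the next character occupies */
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete sequence */
        if (n == 0)
            break;                        /* reached a NUL character */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In a single-byte locale the result equals strlen(); in a multibyte locale such as EUC-JP or UTF-8 it is generally smaller, because one character may occupy several bytes.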

    Klaus> Character Set (UCS) or the local character set (i.e. the
    Klaus> one corresponding to the encoding)? I assume it's UCS,

Wrong.  wchar_t doesn't give you _anything_; there's no guarantee it's
even more than one byte wide.  (In practice it's normally a CARD16 ==
unsigned short with most C compilers.  However, some systems use CARD32
or CARD64, so you will have to make sure you handle those cases too.)
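Since the width is not guaranteed, a program can at least check it rather than assume it.  The following is a minimal sketch using only <limits.h>, <stdint.h> and <wchar.h>; the function names (wchar_bits, wchar_can_hold, require_wchar_range) are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Storage width of wchar_t on this platform, in bits. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Nonzero if every value from 0 through max_code fits in a wchar_t. */
int wchar_can_hold(uintmax_t max_code)
{
    return (uintmax_t)WCHAR_MAX >= max_code;
}

/* Abort early (in builds with assertions enabled) if wchar_t is too
 * narrow for the range of character codes the program intends to use. */
void require_wchar_range(uintmax_t max_code)
{
    assert(wchar_can_hold(max_code));
}
```

A program that wants to keep the whole BMP in a wchar_t might call require_wchar_range(0xFFFF) at startup, and one that wants everything up to U+10FFFF would pass 0x10FFFF instead.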

    Klaus> since some encodings (like ISO-2022-JP-2) can have multiple
    Klaus> (overlapping?) character sets. The Single UNIX
    Klaus> Specification, Version 2, only states that the behaviour is
    Klaus> affected by the LC_CTYPE category, but not in what way.
    Klaus> Does anyone have more information?

It's locale-specific.  Look in /usr/share/i18n/.

    Klaus> 3. Concerning (1) and (2), what's the status/direction for
    Klaus> the current APIs?

    Klaus>    - XIM

XIM and its output complement, XOM, handle both wide-character and
multibyte encodings based on the POSIX locale model.  They will
presumably stay that way, now that X has divided into free and
proprietary development streams.

    Klaus>    - GNOME
    Klaus>    - glibc (I know the status of 2.0.7: UCS-4 <-> UTF-8 only)

    Klaus> When we have decided upon this issue, we can actually start
    Klaus> hacking on the various subsystems.

Why not join the groups that are already working on this?  Mule at ETL
and FSF, XEmacs (very desultory at the moment, though), GNOME, the
XFree86 group, whatever?

--------------------------------------------------------------
Next TLUG Meeting: 13 June Sat, Tokyo Station Yaesu gate 12:30
Featuring Stone and Turnbull on .rpm and .deb packages
Next Nomikai: 17 July, 19:30 Tengu TokyoEkiMae 03-3275-3691
After June 13, the next meeting is 8 August at Tokyo Station
--------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp

