Mailing List Archive


Re: tlug: recommendable email software



Thanks to 新部さん for the correction.

Stephen J. Turnbull writes:

    >> First, if I understood a recent post (in Japanese) to the mule
    >> mailing list correctly, 

I didn't.  I took a close look at it later for XEmacs-beta; see

     http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/utf-2000.html

    >> Mule people are aiming at improving the
    >> internal Mule mbchar representation by using UTF-8.  This will
    >> have an important benefit to developers that Unicode can be
    >> handled directly without "kicking out" some existing charsets,
    >> and vastly extend the number of separate character sets that
    >> can be handled.  But it's clear that they intend to use UCS
    >> private space and map existing character sets into it, not use
    >> the BMP.

>>>>> "NIIBE" == NIIBE Yutaka <gniibe@example.com> writes:

    NIIBE> No, no.

    NIIBE> You might think that we are defining another kind of
    NIIBE> character set (like UCS) which also supports some
    NIIBE> characters from foreign character set(s) in private area...
    NIIBE> No, we don't take such an approach.  What we're doing is,
    NIIBE> say, 'Escape from Coded Character Set'.

    NIIBE> We'll use UTF-8 like representation of multi-byte
    NIIBE> string/buffer, however, it doesn't represent UCS or any
    NIIBE> existing character set(s) directly.  It will just encode
    NIIBE> array of integer.  The interpretation of integer is up to
    NIIBE> upper layer.  In the upper layer, the integer refers to a
    NIIBE> character.  The mapping of integer to character can be
    NIIBE> defined by _users_ as they like it.  In other words, we

That will take development of a lot of auxiliary tools, though; with
all the resources of Microsoft etc. behind it, Unicode is just now
getting decent fonts.  So users will be dependent on existing fonts,
and you'll have to build smart tools to handle multi-byte fonts.  As
far as I know, there really isn't a good input method for Unicode (as
a multi-lingual system---of course for any given national language you 
can use a localized version with table-driven conversion).

This is very ambitious.

    NIIBE> offer the mechanism on which users can implement the
    NIIBE> policy.  If users want UCS system, they can (perhaps).

If I get this right, the idea is basically to provide a
space-efficient encoding of large (eg, 4-byte) integers as a basis for
a text-handling library.  The space-efficiency comes from the fact
that localized software that need only handle ASCII (as the common
foundation for internet protocols like HTTP and mail) and (eg)
Japanese can quite easily get the average encoded size of characters
down to under two bytes/character.  This comes without giving up the
flexibility to have (say) UCS-4 (eg, if your name is Jon Babcock ;-).
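
A minimal sketch of such a variable-length integer encoding, for
illustration only.  It assumes the 1-to-6-byte layout of the original
UTF-8 definition; whether the Mule design uses exactly this layout is
not stated in the thread.  All function names here are hypothetical.

```python
# Sketch only: the byte layout below is the 1-to-6-byte scheme of the
# original UTF-8 definition.  That the Mule representation matches it
# is an assumption; the names are made up for this example.

# (exclusive upper limit, total length in bytes, lead-byte marker bits)
_LAYOUT = [
    (0x80, 1, 0x00),
    (0x800, 2, 0xC0),
    (0x10000, 3, 0xE0),
    (0x200000, 4, 0xF0),
    (0x4000000, 5, 0xF8),
    (0x80000000, 6, 0xFC),
]


def encode_int(n):
    """Encode one non-negative integer below 2**31 as 1 to 6 bytes."""
    for limit, length, marker in _LAYOUT:
        if 0 <= n < limit:
            break
    else:
        raise ValueError("integer out of range: %r" % (n,))
    out = bytearray(length)
    # Fill the continuation bytes from the right, six payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (n & 0x3F)
        n >>= 6
    out[0] = marker | n
    return bytes(out)


def encode_ints(values):
    """Encode a sequence of integers into one byte string."""
    return b"".join(encode_int(v) for v in values)


def decode_ints(data):
    """Decode a byte string produced by encode_ints back into a list."""
    values = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, n = 1, lead
        else:
            # The number of leading 1 bits in the lead byte is the length.
            length = 8 - (lead ^ 0xFF).bit_length()
            if length < 2 or length > 6:
                raise ValueError("bad lead byte at offset %d" % i)
            n = lead & (0x7F >> length)
        tail = data[i + 1:i + length]
        if len(tail) != length - 1:
            raise ValueError("truncated sequence at offset %d" % i)
        for b in tail:
            if (b & 0xC0) != 0x80:
                raise ValueError("bad continuation byte near offset %d" % i)
            n = (n << 6) | (b & 0x3F)
        values.append(n)
        i += length
    return values


def average_bytes(values):
    """Average encoded size, in bytes per integer."""
    return len(encode_ints(values)) / len(values)
```

In this layout an integer below 0x80 costs one byte, one below 0x800
costs two, and one below 0x10000 costs three.  The average for mixed
ASCII and Japanese text therefore depends on which integers the upper
layer assigns to the frequent characters, and that assignment is
exactly what the proposal leaves open.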

Then a lot of work can be done on the library, since all character
sets have a common representation in the text object (buffer or
string), and the MBC-handling functions (search, sort, cursor motion,
cut, paste, case manipulations,... ) can be heavily optimized.
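
As one example of what a common representation buys, cursor motion
over a UTF-8-like buffer needs no knowledge of the character set at
all.  The sketch below assumes the standard UTF-8 property that
continuation bytes always have the form 10xxxxxx; the function names
are hypothetical and not taken from Mule.

```python
# Sketch only: charset-independent cursor motion over a UTF-8-like
# byte buffer.  Assumes continuation bytes match the bit pattern
# 10xxxxxx, as in UTF-8; names are invented for this example.

def is_continuation(byte):
    """True if the byte is a continuation byte (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def next_boundary(buf, pos):
    """Byte index where the next encoded integer after pos starts."""
    pos = min(pos + 1, len(buf))
    while pos < len(buf) and is_continuation(buf[pos]):
        pos += 1
    return pos


def prev_boundary(buf, pos):
    """Byte index where the encoded integer before pos starts."""
    pos = max(pos - 1, 0)
    while pos > 0 and is_continuation(buf[pos]):
        pos -= 1
    return pos


def count_chars(buf):
    """Number of encoded integers: every non-continuation byte starts one."""
    return sum(1 for b in buf if not is_continuation(b))
```

The same routines work whether the integers stand for UCS code points
or for some user-defined assignment, since only the byte layout is
inspected.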

Display and input functions will take place at a higher level (as they
already do in Mule, anyway), and will in general be handled
efficiently by the identity function (in the localized case) or by
table-driven conversion (in the multi-lingual case).
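
The two cases can be pictured roughly as follows.  This is a sketch
under stated assumptions, not Mule's actual interface: the function
names, the table contents, and the font labels are all hypothetical.

```python
# Sketch only: an upper layer that turns buffer integers into
# something displayable.  All names and table entries are invented
# for illustration.

def identity(n):
    """Localized case: the integer already is the font's glyph code."""
    return n


def make_table_lookup(table, fallback=None):
    """Multi-lingual case: map each integer through a user-supplied table."""
    def lookup(n):
        return table.get(n, fallback)
    return lookup


def render(values, to_glyph):
    """Apply the chosen interpretation to every integer in the buffer."""
    return [to_glyph(n) for n in values]


# Hypothetical user-defined assignment: integer -> (font label, glyph code).
USER_TABLE = {
    0x41: ("latin-font", 0x41),
    0x100: ("japanese-font", 0x2422),
}
```

For instance, `render([0x41, 0x100], make_table_lookup(USER_TABLE))`
returns the two (font label, glyph code) pairs, while
`render([0x41, 0x42], identity)` returns the integers unchanged.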

---------------------------------------------------------------
Next TLUG Meeting: 11 April Sat, Tokyo Station Yaesu gate 12:30
Featuring Tague Griffith of Netscape i18n talking on source code
---------------------------------------------------------------
