Mailing List Archive

Re: [tlug] Re: Piping stderr?

>>>>> "Jiro" == Jiro SEKIBA writes:

    Jiro> UTF-8 supports lots of character used in world wide, but not
    Jiro> perfect at all.

Be concrete.  I don't know of any major missing character sets or
characters (that aren't scheduled or proposed for addition).
Admittedly there are political problems (such as the influential
Nikkei minorities in Canada, Mexico, and Finland whose national
character sets look remarkably like IBM kanji; and the "Ukrainian
problem" where the Russians on the USSR standards committee didn't see
fit to submit Cyrillic characters used only in Ukrainian).

But these are nothing compared to "Muneo House" or "but I didn't
inhale".  ;-)

In any case, either way effort has to be made to support those
characters internally.  Why not devote that effort to getting them
into Unicode, then subclassing Unicode to handle any special
properties they have?

    Jiro> Less burden I think.

For the programmer, when it works.  Consoles, shells, and scripts
should not depend on such complexity, because when (_not if_) it
breaks, it can take the whole system down.

Also, in case you haven't noticed, the Internet and information
systems generally have become a decidedly more hostile environment.
Did you know that UTF-8 was respecified in Unicode 3.1 _for security
reasons_?  How does CSI I18N handle the security issues involved in
delegating text handling to user-provided routines, etc.?  My bet is
"not at all".
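
To make the security point concrete, here is a minimal sketch, assuming the
issue meant is the "shortest form" rule: a byte sequence such as C0 AF is a
non-shortest (overlong) spelling of "/", and a lenient decoder that accepts
it lets a path separator slip past any filter that only looks for the byte
2F.  The helper name decode_strict is hypothetical; the only library call
used is Python's built-in bytes.decode, whose UTF-8 codec rejects such
sequences.

```python
from typing import Optional


def decode_strict(raw: bytes) -> Optional[str]:
    """Decode raw as UTF-8, returning None if the bytes are not valid.

    Hypothetical helper for illustration only; it relies on Python's
    built-in UTF-8 codec, which rejects non-shortest (overlong) forms.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The shortest form of "/" is the single byte 0x2F.
SHORTEST_SLASH = b"\x2f"
# Overlong two-byte and three-byte spellings of the same code point.
OVERLONG_SLASH_2 = b"\xc0\xaf"
OVERLONG_SLASH_3 = b"\xe0\x80\xaf"

if __name__ == "__main__":
    for candidate in (SHORTEST_SLASH, OVERLONG_SLASH_2, OVERLONG_SLASH_3):
        print(candidate, "->", decode_strict(candidate))
```

Only the one-byte form decodes; the overlong forms are refused rather than
silently turned into "/", which is the behavior a filter upstream of a shell
or console would want.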

    Jiro> But filter is not always perfect.  SJIS can't round trip
    Jiro> UTF-8 (e.g 0x5C) as you know.  It's like, you get home and
    Jiro> take the shoes off, later you try to get out with the same
    Jiro> shoes, but left shoe is stolen ;-).

Since when?  Since Unicode includes all characters in JIS, that means
Shift JIS can't round trip JIS, either.  Wouldn't surprise me, but as
far as I know that's not true.  You just have to use the right mapping.
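
The "right mapping" claim can be checked mechanically.  The following is a
rough sketch, assuming Python's cp932 codec (Microsoft's variant of Shift
JIS) as the mapping; that choice is an assumption, not something fixed by
this thread, and other Shift JIS tables may treat byte 0x5C as a yen sign
instead of a backslash.  The function name round_trips is hypothetical.

```python
def round_trips(raw: bytes, codec: str) -> bool:
    """Return True if raw survives codec -> UTF-8 -> codec unchanged.

    Hypothetical helper: decodes with the given legacy codec, passes the
    text through UTF-8, and re-encodes with the same legacy codec.
    """
    text = raw.decode(codec)
    utf8 = text.encode("utf-8")
    return utf8.decode("utf-8").encode(codec) == raw


# U+8868 is a kanji whose second Shift JIS byte is 0x5C, followed here
# by a literal backslash, so byte 0x5C appears in both roles.
sample = "\u8868\\".encode("cp932")

if __name__ == "__main__":
    print(sample.hex(), round_trips(sample, "cp932"))
```

With one mapping used consistently in both directions, the 0x5C bytes come
back intact; trouble starts when the decoding and encoding sides disagree
about which table applies.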

    Jiro> And more, in future it is very possble that codeset which
    Jiro> can't map into UTF-8.

Mojikyo?  That's not a character set, that's a glyph set.  Not to
mention that it's nonstandard and nastily proprietary (the UTF-2000
people were forced to remove mojikyo support from their version of
XEmacs).  And there is plenty of room for a thousand Mojikyos in
UCS-4.  It won't be Unicode-conformant, but upward compatible.

Other than that, there are no efforts I know of.  Again, be concrete.

Institute of Policy and Planning Sciences
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 My nostalgia for Icon makes me forget about any of the bad things.  I don't
have much nostalgia for Perl, so its faults I remember.  Scott Gilbert
