Mailing List Archive

Re: [tlug] Re: Piping stderr?

At 25 Jun 2002 22:07:36 +0900,
Stephen J. Turnbull wrote:
>> And uxterm's text processing depends on the UTF-8 codeset, I
>> can say.
> Sure, but all that matters is (1) chunking characters in the buffer
> and (2) converting to integers for font indexing, both operations are
> trivial and efficient.  This is very different if you want to handle
> all of EUC and Shift JIS and ISO-2022-JP.  In Shift JIS you can't even
> handle all Japanese characters; EUC is the best of the three and
> reasonably consistent, but what if you want to write Hangul or Hrvatska
> (Croatian)?  And ISO-2022-JP means you don't even know where characters
> begin and end without going all the way back to the beginning.
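
 To make the two operations in the quoted paragraph concrete, here is a
minimal Python sketch.  It is not uxterm's actual code: the names
seq_len, utf8_chunks and resync are made up for illustration, and the
decoder assumes well-formed input and does no validation.

```python
def seq_len(lead):
    """Length of a UTF-8 sequence, read from its lead byte alone."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a lead byte: 0x%02X" % lead)


def utf8_chunks(buf):
    """Yield (offset, code point) pairs; assumes well-formed UTF-8."""
    i = 0
    while i < len(buf):
        n = seq_len(buf[i])
        if n == 1:
            cp = buf[i]
        else:
            # Keep only the payload bits of the lead byte, then append
            # six bits from each continuation byte.
            cp = buf[i] & (0xFF >> (n + 1))
            for b in buf[i + 1:i + n]:
                cp = (cp << 6) | (b & 0x3F)
        yield i, cp
        i += n


def resync(buf, pos):
    """Back up from an arbitrary offset to the start of its character."""
    # Continuation bytes always look like 10xxxxxx.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos
```

 resync needs to look at no more than a few neighbouring bytes, which is
the property the quoted text says ISO-2022-JP lacks: there the meaning of
a byte depends on an escape sequence that may be far back in the buffer.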

 EUC-JP can't handle Hangul or Hrvatska or any other codeset, of course.
That's not the point.  If a program assumes one codeset, the program
can handle only that codeset.  Even if that codeset is UTF-8, the
situation doesn't change.  UTF-8 supports many of the characters used
around the world, but it is not perfect.

> Why burden a stupid little *term with that complexity?

 For me, interpreting Unicode is a burden as well.
We have I18N-related APIs, and programs can obtain
character information through those APIs without caring
which codeset they are interpreting.  Less of a burden, I think.
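
 As a rough illustration of that approach, here is a Python sketch.  It
is only an analogy to the C locale APIs (such as mbrtowc()), not the
same interface.  The helper name decode_stream is hypothetical; the
point is that the codeset comes from outside the program logic, either
as an argument or from the environment.

```python
import codecs
import locale


def decode_stream(chunks, codeset=None):
    """Decode byte chunks into text without hard-coding the codeset."""
    if codeset is None:
        # Ask the environment, much as a C program consults the locale.
        codeset = locale.getpreferredencoding(False)
    decoder = codecs.getincrementaldecoder(codeset)()
    out = []
    for chunk in chunks:
        # The decoder remembers partial characters between calls, so a
        # chunk may end in the middle of a multibyte character.
        out.append(decoder.decode(chunk))
    out.append(decoder.decode(b"", final=True))
    return "".join(out)
```

 The calling code looks the same whether the input is EUC-JP, Shift JIS
or UTF-8; only the codeset name changes.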

> It should be like a Japanese house.  You may wear sandals, shoes,
> sneakers, or boots outside, but when you enter the house you take off
> your outdoor shoes and put on slippers.  In the same way, at the time
> text enters a program, it should be converted into a universal
> representation, ie, Unicode.  It doesn't matter whether the getabako
> is inside the program (as the genkan in most homes) or outside (as at
> my daughter's school).

 But the filter is not always perfect.  SJIS can't round-trip through
UTF-8 (e.g. 0x5C), as you know.  It's like you get home and take your
shoes off, and later you try to go out in the same shoes, but the left
shoe has been stolen ;-).
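
 The 0x5C case can be sketched in a few lines of Python.  The two tables
below are hypothetical, hand-written, one-byte excerpts for illustration
only; real converters differ in which mapping they choose for this byte.

```python
# Byte 0x5C is YEN SIGN in JIS X 0201 Roman, but many converters treat
# it as ASCII REVERSE SOLIDUS so that paths and escapes keep working.
# Illustrative tables only, not taken from any real converter.
TO_SJIS = {"\u005c": 0x5C, "\u00a5": 0x5C}   # both characters collapse
FROM_SJIS = {0x5C: "\u005c"}                 # only one can come back


def round_trip(ch):
    """Unicode -> SJIS byte -> Unicode, using the tables above."""
    return FROM_SJIS[TO_SJIS[ch]]
```

 With these tables, round_trip("\u00a5") returns a backslash and not the
yen sign that went in: two different shoes went into the getabako, and
only one kind comes out.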

 Moreover, it is quite possible that a future codeset won't map into
UTF-8.  Then software with UTF-8 hard-coded can't handle text written in
that codeset.  It's like when a newcomer comes to your house and his/her
feet are too big to fit into your house's slippers ;-).

Jiro SEKIBA | Web tools & AP Linux Competency Center, YSL, IBM Japan
