Mailing List Archive



Re: [tlug] Re: Piping stderr?



>>>>> "Jiro" == Jiro SEKIBA <jir@example.com> writes:

    Jiro> Again I have to say that Unicode hard-coding programs only
    Jiro> support Unicode charset.

But we're not talking about programs.  We're talking about the whole
system.  CSI programs can do _nothing_ without the CSI library.  I
just choose to use iconv(1,3) instead of something else.  Why are you
allowed to use libcsi, but you won't let me even use standard features
of libc?

    Jiro> On the other hand, CSI have potential to support other
    Jiro> charset as long as system support.  So I'll conclude what
    Jiro> you are talking can't be CSI.

As long as I've got iconv on my system, my UTF-8 programs support as
many coded character sets as anybody's do, with the added bonus that
they support _all_ known characters _simultaneously_.  If I write my
program in the form

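int main_1 (int argc, char* argv[]);  /* declared before main uses it */

int main (int argc, char* argv[]) {
  /* codecs for argv and the stdio streams would wrap this call */
  return main_1(argc, argv);
}

int main_1 (int argc, char* argv[]) {
  /* UTF-8-only program goes here */
  return 0;
}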

I can even wrap main_1 with codecs on the stdio streams and arguments,
so that you'll never see that you aren't using a CSI program.  If the
needed APIs are available in CSI, I can use them.  If not, they're not
that hard to write in terms of iconv(3), depending on how
sophisticated the desired error handling is.
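
To make the iconv(3) part concrete, here is a rough sketch of the kind
of helper I have in mind.  The name to_utf8 and its interface are made
up for illustration, not taken from any existing library, and the
error handling is the least sophisticated kind: any invalid or
truncated input sequence makes the whole conversion fail.  Only
iconv_open, iconv, and iconv_close are real APIs here.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libc or any CSI library): convert
   `len` bytes of `in`, encoded in charset `from`, into a freshly
   malloc'ed, NUL-terminated UTF-8 string.  Returns NULL on failure;
   the caller frees the result. */
char *to_utf8(const char *from, const char *in, size_t len)
{
    assert(from != NULL && in != NULL);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                 /* this iconv doesn't know `from` */

    size_t cap = len * 4 + 1;        /* starting guess, grown on E2BIG */
    char *out = malloc(cap);
    char *src = (char *)in;          /* iconv(3) wants char **, not const */
    size_t inleft = len;
    size_t used = 0;

    while (out != NULL) {
        char *dst = out + used;
        size_t outleft = cap - 1 - used;   /* keep room for the NUL */
        size_t rc = iconv(cd, &src, &inleft, &dst, &outleft);
        used = (size_t)(dst - out);
        if (rc != (size_t)-1)
            break;                   /* all input consumed */
        if (errno != E2BIG) {        /* EILSEQ or EINVAL: give up */
            free(out);
            out = NULL;
            break;
        }
        cap *= 2;                    /* output buffer full: grow, retry */
        char *bigger = realloc(out, cap);
        if (bigger == NULL)
            free(out);
        out = bigger;
    }

    if (out != NULL)
        out[used] = '\0';
    iconv_close(cd);
    return out;
}
```

A wrapper around main_1 could run each argv element through something
like this, using the charset of the user's locale as `from`, and do the
reverse conversion on output.  A fancier version would substitute a
replacement character on EILSEQ instead of failing.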

So there is no loss whatsoever to writing the main program to handle
UTF-8, and only UTF-8.

    Jiro> Separating codeset dependent part from programs is the point,

Not to me.  To me the point is _supporting character sets_ for the
user, while avoiding any _branches on coded character set_ in the
program's logic to simplify the programmer's job.

I don't see how to accomplish that without placing restrictions on the
internal representation.  I see how to do it trivially if I restrict
the internal representation to be Unicode, with the added bonus that I
can use all the accumulated knowledge about how to do text processing
(such as regexps) in Unicode, while the CSI approach has to be a lot
more general since it has no idea about the structure of the codes in
the arrays of wchar_t.
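
As a small illustration of what knowing the structure of the codes
buys you, here is a sketch that counts characters (code points) in a
UTF-8 string with no branch on the coded character set.  The function
name utf8_count is invented for this example, and it assumes the input
is already valid UTF-8.  It relies only on the fact that continuation
bytes in UTF-8 always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: count the code points in a
   NUL-terminated string that is assumed to be valid UTF-8.  Every
   code point has exactly one byte that is not a continuation byte,
   so counting those bytes counts the code points. */
size_t utf8_count(const char *s)
{
    assert(s != NULL);

    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* 10xxxxxx marks a continuation byte; skip those */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

A CSI program holding an opaque array of wchar_t can't make this kind
of assumption, because it doesn't know which coded character set the
values came from.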

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 My nostalgia for Icon makes me forget about any of the bad things.  I don't
have much nostalgia for Perl, so its faults I remember.  Scott Gilbert c.l.py

