Mailing List Archive


Re: [tlug] Re: Piping stderr?



 Let me merge the threads.

At 28 Jun 2002 19:38:59 +0900,
Stephen J. Turnbull <stephen@example.com> wrote:

> But we're not talking about programs.  We're talking about the whole
> system.  CSI programs can do _nothing_ without the CSI library.  I
> just choose to use iconv(1,3) instead of something else.  Why are you
> allowed to use libcsi, but you won't let me even use standard features
> of libc?

 As you may have mentioned, all the CSI-related functions (mb*/wc*) are
standard features.  Almost all of them are defined in XPG4 (a few may be
in XPG5), while iconv is XPG5.  That means a system that has iconv also
has the CSI-related functions.  You don't need libcsi :-).
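
 As a rough sketch of what those standard mb*/wc* functions give you
without any extra library, here is a character counter that never names a
codeset.  count_chars is a name made up for this example, and it assumes
the string is encoded in the codeset of the current locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Count characters (not bytes) in a multibyte string that is encoded in
 * the current locale's codeset.  Hypothetical helper for illustration.
 * Returns (size_t)-1 if the string contains an invalid or truncated
 * multibyte sequence. */
size_t count_chars(const char *s)
{
	mbstate_t state;
	size_t remaining = strlen(s);
	size_t count = 0;

	memset(&state, 0, sizeof state);        /* initial shift state */
	while (remaining > 0) {
		/* The locale decides how many bytes form one character. */
		size_t len = mbrtowc(NULL, s, remaining, &state);

		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;      /* invalid or incomplete */
		if (len == 0)
			break;                  /* NUL character: stop */
		s += len;
		remaining -= len;
		count++;
	}
	return count;
}
```

 Call setlocale(LC_ALL,"") once before using it.  The same binary should
then count characters correctly whether the locale's codeset is EUC-JP,
UTF-8 or anything else the system supports.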

> So there is no loss whatsoever to writing the main program to handle
> UTF-8, and only UTF-8.

 As I said before, there may be code points that can't be mapped to UTF-8.
A program that hard-codes UTF-8 can't handle them.

>> Separating codeset dependent part from programs is the point,
> 
> Not to me.  To me the point is _supporting character sets_ for the
> user, while avoiding any _branches on coded character set_ in the
> program's logic to simplify the programmer's job.

 Yes, "supporting character sets" is the aim; CSI is a means.

 CSI is just like the unices, and each codeset is like a device.
Access to each device is abstracted by an API (open/read/write etc).
Hard-coded programs, on the other hand, are like programs using ioperm(2).
UTF-8 may be the USB device, but it's not the only one.

At 28 Jun 2002 20:47:01 +0900,
Stephen J. Turnbull <stephen@example.com> wrote:

>> For example, russian characters have 2 width in EUC-JP, but
>> in Unicode it's 1.  If programs knows, original encoding, it
>> can correct that information.
> 
> So much for CSI.  :-)

 Heh, yes, if UTF-8 were the only codeset, we wouldn't need CSI.
But the point is that we already have several codesets.

> Do you have an URL which standardizes this CSI API?  All Google turned
> up was a bunch of flamewars, with the Sun people saying "but we've
> already implemented CSI, and gotten it certified by the Chinese, so
> we'd have to have multiple binaries," and Markus Kuhn and Bruno Haible
> saying, "yeah, well, why shouldn't the rest of us take the opportunity
> to standardize on a single sane coded character set, and add the
> necessary properties to that standard"?

 It's defined in POSIX; check the Open Group's locale-related pages.

> It seems CSI is basically about sucking up to the Chinese and other
> nationalists, providing standard heuristics for Japanese TTYs
> (wcwidth), and fighting off the Microsoft dragon, while pretending to
> be "more general".

 No.  It is there to abstract codesets, that's all.
Even with the UTF-8 codeset, we still need to measure width.
In fact, uxterm uses wcwidth or a similar function.
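
 To make the width point concrete, here is a minimal sketch that measures
display columns through the locale, whatever the codeset is.  display_width
and the fixed 256-character buffer are assumptions made for this example;
mbstowcs and wcswidth are the standard functions.

```c
#define _XOPEN_SOURCE 700       /* for wcswidth() */
#include <stdlib.h>
#include <wchar.h>

#define MAX_WCHARS 256          /* arbitrary limit for this sketch */

/* Return the number of terminal columns needed to display a multibyte
 * string encoded in the current locale's codeset.  Hypothetical helper.
 * Returns -1 if the string is invalid, too long for the buffer, or
 * contains a non-printable character. */
int display_width(const char *s)
{
	wchar_t wbuf[MAX_WCHARS];
	size_t n = mbstowcs(wbuf, s, MAX_WCHARS);

	if (n == (size_t)-1)
		return -1;              /* invalid multibyte sequence */
	if (n == MAX_WCHARS)
		return -1;              /* did not fit, no terminator written */
	return wcswidth(wbuf, n);       /* sum of wcwidth() over the string */
}
```

 After setlocale(LC_ALL,""), the program only ever asks "how many columns?".
Which characters are double width is the locale's business, not the
program's.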

> How about the users, who want to use characters, not code points?  And
> the programmers, who would like to be able to stop worrying about the
> damage that characters they don't know about can do to their file
> system (for example)?

 Strip such locales, that's all.

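 Let me show you a simple CSI program
# error handling is kept to a bare minimum ;-)

-------------8<-------------8<-------------8<-------------
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <wchar.h>

int main(void)
{
	wchar_t wc;             /* slipper */
	char buffer[16];        /* getabako */
	ssize_t n;

	setlocale(LC_ALL,"");   /* magic spell: adopt the user's locale */

	n = read(0,buffer,sizeof buffer);       /* shoes in getabako */
	if (n <= 0)
		return 1;       /* nothing to read */

	/* Genkan: convert the first multibyte character to a wide one,
	   looking only at the bytes that were actually read */
	if (mbtowc(&wc,buffer,(size_t)n) < 0)
		return 1;       /* invalid in this locale's codeset */
	printf("%lc\n",(wint_t)wc);

	return 0;
}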
-------------8<-------------8<-------------8<-------------

 And wchar_t on Linux happens to be UCS-4.
I think that satisfies almost everything you want.
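
 If a program wants to rely on that, it can check rather than assume.
C99 defines the macro __STDC_ISO_10646__ on implementations where wchar_t
values are ISO 10646 code points.  The function name below is made up for
this sketch.

```c
#include <wchar.h>

/* Return 1 if this implementation promises that wchar_t values are
 * ISO 10646 (Unicode) code points, 0 if it makes no such promise.
 * Hypothetical helper for illustration. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
	return 1;
#else
	return 0;
#endif
}
```

 Where it returns 0, the wide characters are still usable through the
wc* functions; you just can't treat their numeric values as Unicode.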
-- 
Jiro SEKIBA | Web tools & AP Linux Competency Center, YSL, IBM Japan
            | email: jir@example.com, jir@example.com
