Mailing List Archive



Re: [tlug] unicode and Perl - how to pass command line unicode arguments



Hi David,

I realize there has already been some discussion on this point, but you
were originally headed in the right direction. The only problem was your
usage of -C.

From perlrun:

       -C [number/list]
            The "-C" flag controls some of the Perl Unicode features.

            As of 5.8.1, the "-C" can be followed either by a number or a list of option letters.  The letters, their
            numeric values, and effects are as follows; listing the letters is equal to summing the numbers.

                I     1    STDIN is assumed to be in UTF-8
                O     2    STDOUT will be in UTF-8
                E     4    STDERR will be in UTF-8
                S     7    I + O + E
                i     8    UTF-8 is the default PerlIO layer for input streams
                o    16    UTF-8 is the default PerlIO layer for output streams
                D    24    i + o
                A    32    the @ARGV elements are expected to be strings encoded in UTF-8
                L    64    normally the "IOEioA" are unconditional,
                           the L makes them conditional on the locale environment
                           variables (the LC_ALL, LC_CTYPE, and LANG, in the order
                           of decreasing precedence) -- if the variables indicate
                           UTF-8, then the selected "IOEioA" are in effect

You missed A, which tells Perl to treat the @ARGV elements as UTF-8. IMHO,
you should just use -C127 (the sum of all of the above, L included, so the
settings take effect only when the locale indicates UTF-8) in a
kanji/Unicode-heavy program, because it makes everything Unicode-aware
(except for Unicode in the script itself, for which you still need the
utf8 pragma), and that will cut down on accidental encoding problems.

Neil


On Sun, 2006-02-12 at 17:30 +0900, David Riggs wrote:
> Does anyone know how to pass real unicode kanji to perl on the command 
> line? (Not just bytes that appear as kanji but are passed on as bytes.)
> 
> I finally managed to get perl to do unicode work by 1. saying use utf8;
> (to have unicode in the script), and 2. invoking it with the -CSio switch
> (to do I/O in unicode). With these I can finally manipulate kanji (i.e. use
> tr/// to translate from one kanji to another and s/// to do real
> character classes and such).
> 
> But perl still reads the command line arguments as bytes, and they get 
> mangled in the script. I just want to pass it a kanji string. I am 
> limping along by reading from <STDIN>, but that messes up my ability to
> make a pipeline of perl scripts.
> 
> Thanks for any help,
> 
> David Riggs, Kyoto
> 


