Mailing List Archive



Re: [tlug] unicode and Perl - how to pass command line unicode arguments



On 14/02/06, David Riggs <dariggs@example.com> wrote:
Ian said:

>Perl just says 'a string is a string of numbers'. You still need to
>decide what's in your variable, but only when you're doing
>something unusual

I don't think that working with kanji etc. is "unusual"; that's the
reason I am using computers. Without kanji I would be back to the
typewriter (with great, great sadness!!).
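Ian's point that "a string is a string of numbers" can be made concrete. This is a small sketch in Python rather than Perl (the thread notes the two behave much the same way here): a text string is a sequence of code points, and bytes only appear once you choose an encoding. The choice of Shift_JIS as the second encoding is just an example.

```python
# A text string is a sequence of code points (plain integers);
# bytes only exist once an encoding is chosen.
text = "漢字"

code_points = [ord(ch) for ch in text]
utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")

print(code_points)       # [28450, 23383]
print(list(utf8_bytes))  # [230, 188, 162, 229, 173, 151]: three bytes per kanji
print(len(sjis_bytes))   # 4: two bytes per kanji in Shift_JIS

# Both byte strings decode back to the same sequence of numbers.
print(utf8_bytes.decode("utf-8") == sjis_bytes.decode("shift_jis") == text)  # True
```

Working with kanji is not the unusual part. The unusual part is having to care about which byte representation you are holding.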

 

>Admittedly, I've never seen anyone pass unicode parameters before, so
>that's a new problem...


Really! I guess I am a bit odder even than I thought.

Nah, I write code for people, and command lines give them The Fear nowadays.  Also, I speak English (properly ;-) and so coding systems have passed me by for a lot of my coding life.

BTW, I also see that in Perl a file with a kanji name (dragged in
via command line globbing) has to be explicitly translated.

Don't use Perl on Windows, that's all I'm saying.

I am finally realizing the truth of what you folks have been saying: you
have to tell perl what the variable/data encoding is in each case.

That's not really true.  The *variables* are fine.  Any interface between the variable and the outside world is where it goes wrong:

Arguments: come in as bytestrings, because you can't tell the encoding in advance
Filenames: ditto
Files: come in by default as ISO-Latin-1 (or, if you like to think of it that way, an unspecified 8 bit encoding), because that's what people generally expect
Source code: ditto, but 'use utf8' is thankfully nice and easy
Databases: varies by database despite having a standard interface to them, because databases can store text in different encodings
...and so on.  Python has exactly the same issues to contend with, and the two languages seem to behave the same way for the most part.
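The "decode at the boundary, work with text inside" idea in the list above can be sketched in Python. This is a minimal illustration, not a recipe: the helper names are hypothetical, and the Shift_JIS input simulates bytes arriving from a terminal or file. One real difference: Python 3 hands you sys.argv already decoded with the filesystem encoding, so on POSIX systems os.fsencode is used here to recover the original bytes before choosing an encoding explicitly.

```python
import os
import sys


def decode_at_boundary(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes arriving from outside (argument, filename, file) once."""
    return raw.decode(encoding)


def encode_at_boundary(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text once, on its way back out."""
    return text.encode(encoding)


def raw_command_line_args():
    """Return the command-line arguments as bytes.

    Python 3 has already decoded sys.argv with the filesystem encoding
    (using surrogateescape on POSIX); os.fsencode normally undoes that,
    so the caller can pick the real encoding explicitly.
    """
    return [os.fsencode(arg) for arg in sys.argv[1:]]


if __name__ == "__main__":
    # Hypothetical input: a word as it might arrive from a Shift_JIS source.
    incoming = "漢字".encode("shift_jis")
    word = decode_at_boundary(incoming, "shift_jis")
    print(len(word))  # 2 characters, from 4 input bytes
    print(encode_at_boundary(word, "utf-8") == "漢字".encode("utf-8"))  # True
```

Decoding the same Shift_JIS bytes as UTF-8 raises UnicodeDecodeError, which is exactly the "you can't tell the encoding in advance" problem with arguments and filenames.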
 
The reason I was being argumentative with Steve about Python is that, while I can see that Python has the same problem with source code encoding, from what he's saying it seems to set the encoding string by string rather than for the whole (or the remainder of the) file.  Is that what you meant, Steve?
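For what it's worth, current Python sets the source encoding per file as well: a PEP 263 coding declaration on the first or second line applies to the whole source file, and in Python 3 a plain string literal is already Unicode, so no u"..." prefix is needed. (In Python 2 the declaration set how the file was read, but a literal stayed a byte string unless written as u"...", which may be what Steve was describing.) A minimal sketch, with EUC-JP chosen only as an example encoding:

```python
# Source text for a tiny module, stored as EUC-JP bytes, with a PEP 263
# coding declaration on its first line.
source_bytes = (
    "# -*- coding: euc-jp -*-\n"
    "greeting = 'こんにちは'\n"
).encode("euc_jp")

# compile() honours the declaration when given bytes, so the whole
# "file" is decoded with that single setting.
namespace = {}
exec(compile(source_bytes, "<euc-jp module>", "exec"), namespace)

# The literal has no u"" prefix, yet it is an ordinary (Unicode) str.
print(type(namespace["greeting"]).__name__)  # str
print(namespace["greeting"] == "こんにちは")  # True
```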

In Perl, I never have to specify u"string".  This is a good thing, in my opinion, because 99% of the time I want strings stored as decoded text (once I've set the source file encoding) rather than as binary data, and I'm prepared to use \x.. escapes for the other 1%.  Does Python approach I/O the same way, or can you set the coding system for an entire file rather than specifying it for each read?
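On the I/O question, in current Python 3 the encoding is attached to the file object when it is opened, not to each read, so every read from that handle returns decoded text (Python 2 offered codecs.open for the same purpose). A minimal sketch using a throwaway temporary file; the helper name `roundtrip` and the EUC-JP encoding are only for illustration:

```python
import os
import tempfile


def roundtrip(lines, encoding):
    """Write lines to a temporary file and read them back.

    The encoding is given once, to open(); every write and every read on
    that file object is then encoded or decoded with it automatically.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as out:
            for line in lines:
                out.write(line + "\n")
        with open(path, encoding=encoding) as inp:
            # Iterating yields str objects that are already decoded.
            return [line.rstrip("\n") for line in inp]
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip(["一行目", "二行目"], "euc_jp"))
```

This is close to the Perl habit of setting an I/O layer on a filehandle once and then forgetting about it.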

Perl may also accept UTF-8 variable names when the source encoding is set.  I've not been sick enough to try...
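For comparison only (this is Python's rule, not Perl's), Python 3 does allow non-ASCII identifiers under PEP 3131, as long as the characters count as letters or digits by Unicode's definition. `str.isidentifier()` checks the rule without running anything:

```python
# Python 3 identifiers may contain non-ASCII letters (PEP 3131).
print("変数".isidentifier())        # True
print("漢字_count".isidentifier())  # True
print("1漢字".isidentifier())       # False: cannot start with a digit

# Such a name works like any other variable.
変数 = "kanji"
print(変数)  # kanji
```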

--
Ian.

