Mailing List Archive



Re: [tlug] unicode and Perl- how to pass command line unicode arguments



>>>>> "Ian" == Ian Wells <ijw@example.com> writes:

    Ian> The reason I was being argumentative with Steve and Python is
    Ian> that while I can see that Python has the same problem with
    Ian> source code encoding, from what he's saying it seems to take
    Ian> the approach that the encoding is set string by string,
    Ian> rather than setting for the whole (or remainder of) the file.
    Ian> Is that what you meant, Steve?

No.  In Python 2.x, there is a natural language text object, which for
historical reasons is called "Unicode" and whose literals are denoted
u"string".  Then there is raw memory, which for historical reasons is
called "string" and whose literals are denoted "string".  For
historical reasons, the raw memory object has continued to be heavily
abused as a container of natural language text.  What I don't like
about Perl, as I understand your description, is that Perl mandates
that abuse (whatever happened to "there's always more than one way to
do it"? :-)
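
A minimal sketch of the distinction between the text object and raw
memory, for illustration only. The variable names are hypothetical, and
the literals use escapes so the snippet does not depend on the source
file's encoding. It is written so that it should also run unchanged on
Python 3.3 or later, where u"..." is still accepted.

```python
# Text object: a sequence of characters (called "Unicode" in Python 2.x).
text = u"\u65e5\u672c\u8a9e"      # three characters of natural language text

# Raw memory: the same text serialized to bytes in one particular encoding.
raw = text.encode("utf-8")

n_chars = len(text)               # length counted in characters
n_bytes = len(raw)                # length counted in bytes
roundtrip = raw.decode("utf-8")   # decoding the bytes recovers the text

print(n_chars, n_bytes)
```

The two lengths differ because each of these characters takes three
bytes in UTF-8, which is the kind of mismatch that appears when raw
memory is used as a container for text.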

As Gabor pointed out, there is a flexible way of making Python as
DWIM-witted as Perl.  You can set the encoding for the file in the way
which has become common for many text editors (including Emacsen and
IIRC vim), by putting a specially-formatted comment (aka coding
cookie) at the top of the file.
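
For illustration, a sketch of such a coding cookie, assuming the file
is saved as UTF-8. The variable names are hypothetical. Python only
honors the cookie when it appears on the first or second line of the
file.

```python
# -*- coding: utf-8 -*-
# The comment above is the coding cookie: it tells Python (and Emacs)
# which encoding the bytes of this source file are written in.

label = u"東京"             # decoded to two characters using the declared encoding
label_length = len(label)

print(label_length)
```

In Python 2.x the cookie governs how u"..." literals are decoded; a
plain "..." literal still holds the raw bytes exactly as they appear in
the file.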

    Ian> in Perl, I don't have to ever specify u"string".  This is a
    Ian> good thing, in my opinion, because I want strings to be
    Ian> stored as decoded (once I've set the source file coding) and
    Ian> not as binary data 99% of the time, and I'm prepared to use
    Ian> \x.. for the other 1%.

But according to you, this is exactly what Perl doesn't do.  It
decodes the text, then stores it as binary data, and depends on you to
not do something stupid.  This can work, but (a) it depends on
programmer discipline and (b) is modal.  I.e., the "use utf8;"
declaration is at the top of the file which the programmer may or may
not ever look at carefully.

In fact I would guess that it might actually be in some other file
entirely, since it's part of the language.  The Python cookie can't be
in another file, since it only refers to the text of the file
currently being read.  Both approaches have serious problems.
Python's is more readable IMO, but the "convert at variable
initialization" approach is the most readable (though verbose).
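
One possible reading of "convert at variable initialization", sketched
under assumptions: the helper text_from_bytes is hypothetical and not
part of Python, and the byte literal is assumed to be UTF-8. Each
variable is decoded explicitly where it is initialized, so the encoding
is visible at the point of use rather than declared once elsewhere.

```python
def text_from_bytes(raw, encoding="utf-8"):
    """Hypothetical helper: explicitly decode raw bytes into a text object."""
    return raw.decode(encoding)

# The encoding is named right where the variable is created.
city = text_from_bytes(b"\xe6\x9d\xb1\xe4\xba\xac", "utf-8")

print(len(city))
```

This is verbose, as noted above, but nothing about the conversion
depends on a declaration the reader might not have seen.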

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

