Mailing List Archive



Re: [tlug] unicode and Perl- how to pass command line unicode arguments



>>>>> "gabor" == gabor  <gabor@example.com> writes:

    gabor> in python byte-strings are objects and unicode-strings are
    gabor> objects too.  you create a byte string for example like
    gabor> this:

    gabor> string1 = "byte string"

Unfortunately, "これは日本語です。" will produce a byte string of
Japanese text (in whatever encoding the file is saved in), but

    gabor> string2 = u"byte string"

u"これは日本語です。" does not produce Unicode-encoded Japanese.  It
may work with a PEP 263 coding cookie, but that is unreliable in the
Japanese environment (because several incompatible encodings are in
common use).  In fact, IIRC the u"" form only works as expected for
Latin-1 (and it may not work for Latin-1 either).  That's why I chose
the example phrase I did.
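
To make the encoding problem concrete, here is a minimal sketch.  It
is written in Python 3 syntax (an assumption on my part; there str is
always Unicode and bytes are explicit), not the Python of this thread,
and the helper names encode_all and decode_as are made up for
illustration.  It only shows that the same phrase becomes different
bytes under three common Japanese encodings, and that decoding with
the wrong declared encoding does not give the text back.

```python
# Sketch only: Python 3 syntax, hypothetical helper names.
TEXT = "これは日本語です。"
ENCODINGS = ["utf-8", "euc_jp", "shift_jis"]


def encode_all(text, encodings):
    """Return {encoding name: bytes} for the same text."""
    return {enc: text.encode(enc) for enc in encodings}


def decode_as(data, declared):
    """Decode bytes with the declared encoding; None if they are invalid."""
    try:
        return data.decode(declared)
    except UnicodeDecodeError:
        return None


raw = encode_all(TEXT, ENCODINGS)

# Declared encoding matches the bytes: the text round-trips.
right = decode_as(raw["euc_jp"], "euc_jp")

# Declared encoding does not match: the original text is not recovered.
wrong = decode_as(raw["shift_jis"], "utf-8")

# Latin-1 accepts any byte sequence, so a mismatch passes silently
# and yields one garbage character per byte.
mojibake = decode_as(raw["euc_jp"], "latin-1")

if __name__ == "__main__":
    for enc, data in raw.items():
        print(enc, len(data), "bytes")
    print("round trip ok:", right == TEXT)
    print("mismatch recovered text:", wrong == TEXT)
    print("latin-1 recovered text:", mojibake == TEXT)
```

The Latin-1 case is the nasty one: nothing raises, so the damage only
shows up later as garbage on the screen.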

I argued strenuously for an XML-like "default to UTF-8" policy with
optional codecs for loading Python code, but Guido refused on the
basis of backward compatibility (i.e., lots of Europeans were using
8-bit encodings in existing production code).

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
