Mailing List Archive


Re: [tlug] unicode and Perl - how to pass command line unicode arguments



Stephen J. Turnbull wrote:
> >>>>> "gabor" == gabor  <gabor@example.com> writes:
> 
> 
>     gabor> in python byte-strings are objects and unicode-strings are
>     gabor> objects too.  you create a byte string for example like
>     gabor> this:
> 
>     gabor> string1 = "byte string"
> 
> Unfortunately, "これは日本語です。" will produce a string which is
> encoded Japanese (with whatever encoding the file is saved in), but
> 
>     gabor> string2 = u"byte string"
> 
> u"これは日本語です。" does not produce Unicode-encoded Japanese.  It
> may work with PEP 263 coding cookies, but this is unreliable in the
> Japanese environment (because of the multiplicity of incompatible
> encodings). 

could you explain this part to me? why is your own source-code 
unreliable? :)

for example, this works fine:
=======
#!/usr/bin/python
# -*- coding: utf-8 -*-

text = u"これは日本語です"
print len(text)
=======

the output is 8.
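
to make the "multiplicity of encodings" point concrete, here is a small sketch (my own illustration, not something from the quoted mail). it assumes the file really is saved as utf-8, as the coding line says. the variable names are made up. the unicode string always has 8 characters, but the bytes on disk depend on the encoding:

```python
# -*- coding: utf-8 -*-
# sketch: one unicode string, several possible byte representations.
# assumes this file is saved as utf-8 (matching the coding line above).
text = u"これは日本語です"

utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")
eucjp_bytes = text.encode("euc_jp")

char_count = len(text)        # 8 characters
utf8_len = len(utf8_bytes)    # 24 bytes (3 per character)
sjis_len = len(sjis_bytes)    # 16 bytes (2 per character)
eucjp_len = len(eucjp_bytes)  # 16 bytes, but different ones than shift_jis

# decoding with the matching codec gives the same text back
assert utf8_bytes.decode("utf-8") == text
assert sjis_bytes.decode("shift_jis") == text
assert eucjp_bytes.decode("euc_jp") == text
```

so the u"..." literal only comes out right when the coding line matches the encoding the editor actually used to save the file.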



> I argued strenuously for an XML-like "default to UTF-8" policy with
> optional codecs for loading Python code, but Guido refused on the
> basis of backward compatibility (ie, lots of Europeans were using 8
> bit encodings in existing production code).
>

hmm.. i would also prefer to use utf8 as the default instead of ascii..

btw, the ascii default does not help even the people who use latin-1:
without that pep263 setting, auto-converting a latin-1 byte string to
unicode ends with an exception.
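
a rough sketch of what i mean, assuming the automatic conversion uses the default ascii codec. the explicit decode("ascii") call below only imitates that conversion, and try_decode is a helper name i made up for the example:

```python
# sketch: a latin-1 byte string fails under the ascii codec.
latin1_bytes = b"caf\xe9"  # "café" as it would be stored in a latin-1 file


def try_decode(data, codec):
    """Return (ok, result): the decoded text, or the error's class name."""
    try:
        return True, data.decode(codec)
    except UnicodeDecodeError as exc:
        return False, type(exc).__name__


# ascii cannot handle the byte 0xe9, so this attempt fails
ascii_ok, ascii_result = try_decode(latin1_bytes, "ascii")

# naming the real encoding explicitly works
latin1_ok, latin1_result = try_decode(latin1_bytes, "latin-1")
```

here ascii_ok is False with a UnicodeDecodeError, while the latin-1 decode gives back the 4-character text.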

gabor

