Mailing List Archive
Re: [tlug] unicode and Perl- how to pass command line unicode arguments
- Date: Mon, 13 Feb 2006 15:36:20 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] unicode and Perl- how to pass command line unicode arguments
- References: <43EFF8C4.4050704@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b23 (daikon, linux)

>>>>> "David" == David Riggs <dariggs@example.com> writes:

    David> Somehow your suggested utf8::decode($x) only returns a
    David> "1", presumably for success, and I do not see how to get
    David> it to return the value.

As David E points out, it's doing its work in place.  Not good.

    David> Very mystifying.

Not really, if you understand what's actually happening.  The main
thing is to disabuse yourself of the notion that anything that's
useful for real programming work can "just work" with Unicode (or with
anything; be thankful you only have to deal with Unicode and not IEEE
754 floating point!).

The basic problem is that languages that have inherited their way of
thinking about text from C always have the assumption built in that
text == a region of memory, and strings are really just a collection
of bytes.  Then people get used to programming as though strings and
byte arrays are the same thing, and you don't know what "this is text"
means; is it an array of 8-bit integers, or is it a UTF-8 stream of
characters of variable width?

So all of these languages allow you to treat memory regions as
strings, and it's the programmer's responsibility (this means YOU!
;-) to disambiguate.

    David> And I thought perl was supposed to just work with unicode!

You might be happier with Python (or some other language with similar
design).  Python has separate types for byte strings and Unicode
strings.  Unicode literals are a bit of an annoyance since you have to
do something like

    var = "Yes, this is valid UTF-8!".decode('utf-8')

but if you're generally reading from files you can set the default
codec to the appropriate UTF, and you "just read" from the files and
everything "just works."

The basic principle is that all your workhorse functions should assume
(and check for, if they can be called at higher levels) Unicode as
input.  Everything should be converted to Unicode _explicitly_ as
early as possible.
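
To make that principle concrete, here is a minimal sketch rather than a
definitive recipe.  It assumes Python 3, where the byte-string and text
types are spelled bytes and str (in the Python 2 current at the time of
this thread they were str and unicode).  The names decode_input and
count_characters are hypothetical, made up for illustration only.

```python
# Sketch, assuming Python 3: bytes (raw memory) and str (Unicode text)
# are distinct types, so the conversion has to be spelled out.

def decode_input(raw, encoding="utf-8"):
    """Hypothetical boundary helper: turn bytes into text exactly once."""
    if isinstance(raw, str):
        return raw  # already text, nothing to convert
    return raw.decode(encoding)  # raises UnicodeDecodeError on bad input


def count_characters(text):
    """Hypothetical workhorse: assumes text and checks that it got text."""
    if not isinstance(text, str):
        raise TypeError("expected str, got %s" % type(text).__name__)
    return len(text)


# Three Japanese characters, written as escapes to keep the source ASCII.
raw = "\u65e5\u672c\u8a9e".encode("utf-8")  # bytes, as read from a file
text = decode_input(raw)                    # convert explicitly and early

byte_length = len(raw)                # counts bytes: 9
char_length = count_characters(text)  # counts characters: 3
```

The two lengths differ because each of these characters occupies three
bytes in UTF-8, which is exactly the "array of 8-bit integers or stream
of characters?" ambiguity described above.  Passing raw straight to
count_characters raises TypeError instead of quietly returning 9.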

It's probably possible to program in this style in Perl, too, but Perl
believes that anything that can't be implicit should be made so
obscure that it might as well be implicit---it won't be pleasant.  ;-)

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.