Mailing List Archive
Re: [tlug] Japanese regex question
- Date: Mon, 29 Aug 2005 15:15:34 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Japanese regex question
- References: <200508241701.55144.jq@example.com><20050825183913.O88704@example.com><200508251253.47083.jq@example.com><20050826113217.J88704@example.com><87zmr2me23.fsf@example.com><30ce843605082808003eac8faa@example.com><87y86mkrrg.fsf@example.com><20050828173528.796c3073@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b21 (corn, linux)
>>>>> "Botond" == Botond Botyanszki <tlug@example.com> writes:

    Botond> I had the impression while coding in perl that it was
    Botond> handling text in unicode. And it seems to be the case
    Botond> according to the FAQ at
    Botond> http://rf.net/~james/perli18n.html#Q4

Could very well be. I haven't done anything in Perl since Hanshin-Awajishima Daishinsai (ie, about Feb 1 1995), so I don't know.

However, perusing that FAQ suggests to me that the default is unspecified unibyte ASCII superset, not UTF-8. If you want to treat the strings as Unicode you need to use special functions. It looked like you need to enable locale support rather than having it done automatically. Etc, etc.

In other words, it looks to me like by default Perl 5.6 supported I18N oblivious programming, with minimal I18N being easy, but not default.

    Botond> # This means that in 5.005_50 or later:

Note that this followed a description of how to achieve the goal of Unicode use; it isn't a statement that Perl was Unicode-aware out of the box.

While I don't like Perl, this deficiency isn't why; nobody but X?Emacs does this "right" by design and I admit that we implement it poorly. The facilities for handling Unicode internally are excellent, etc, etc. I'm just saying that it's regrettable that people are not encouraged to do it right in new code and forced to remind themselves that they are not using resources to support I18N by explicitly adding "use LaxUnicode" or something like that to legacy code.

It's a subtle difference that would prevent a lot of bugs of the kind Jonathan ran into, while turning some old correct code into buggy code. Is this a good tradeoff? I think so. As Guido van Rossum wrote on python-dev yesterday, "Don't you think that the majority of code that will be written in [language P] is in the future?" And as Lyle Saxon wrote "[We] always forget that we have the option to go backward."

Has Red Hat abandoned Python 1.5.2 yet?
;-)

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
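
For readers who want to see the byte-versus-character distinction that the message above turns on, here is a minimal sketch. It is written in Python 3 rather than Perl, it is not taken from the thread, and the sample strings are made up for illustration. It only shows the general failure mode of running a regex over undecoded UTF-8 bytes; it makes no claim about what any particular Perl version does by default.

```python
import re

text = "日本語"              # three characters
raw = text.encode("utf-8")   # the same text as nine UTF-8 bytes

# Encoding-oblivious matching: "." matches one byte, so each
# character is split into three separate matches.
byte_hits = re.findall(rb".", raw)

# Character-aware matching: "." matches one whole character.
char_hits = re.findall(r".", text)

# A character class built from raw bytes matches any ONE of those bytes.
# "本" is E6 9C AC and "日" is E6 97 A5, so the class wrongly finds
# a hit (the shared lead byte E6) inside "日".
byte_class = b"[" + "本".encode("utf-8") + b"]"
false_hits = re.findall(byte_class, "日".encode("utf-8"))

# The same class on decoded strings finds nothing, as intended.
true_hits = re.findall("[本]", "日")

print(len(byte_hits), len(char_hits), false_hits, true_hits)
```

The point of the sketch is only that the pattern and the data have to agree on what a "character" is; when both are plain bytes, a multibyte character silently becomes a set of unrelated byte values.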
- Follow-Ups:
  - Re: [tlug] Japanese regex question
    - From: Ben K. Bullock
- References:
  - [tlug] Japanese regex question
    - From: Jonathan Byrne
  - Re: [tlug] Japanese regex question
    - From: Tod McQuillin
  - Re: [tlug] Japanese regex question
    - From: Jonathan Byrne
  - Re: [tlug] Japanese regex question
    - From: Tod McQuillin
  - Re: [tlug] Japanese regex question
    - From: Stephen J. Turnbull
  - Re: [tlug] Japanese regex question
    - From: Ian Wells
  - Re: [tlug] Japanese regex question
    - From: Stephen J. Turnbull
  - Re: [tlug] Japanese regex question
    - From: Botond Botyanszki