TLUG Mailing List

Re: [tlug] Japanese regex question

Date: Mon, 29 Aug 2005 15:15:34 +0900

From: "Stephen J. Turnbull" <stephen@example.com>

Subject: Re: [tlug] Japanese regex question

References: <200508241701.55144.jq@example.com><20050825183913.O88704@example.com><200508251253.47083.jq@example.com><20050826113217.J88704@example.com><87zmr2me23.fsf@example.com><30ce843605082808003eac8faa@example.com><87y86mkrrg.fsf@example.com><20050828173528.796c3073@example.com>

Organization: The XEmacs Project

User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b21 (corn, linux)
>>>>> "Botond" == Botond Botyanszki <tlug@example.com> writes:

    Botond> I had the impression while coding in perl that it was
    Botond> handling text in unicode. And it seems to be the case
    Botond> according to the FAQ at
    Botond> http://rf.net/~james/perli18n.html#Q4

Could very well be.  I haven't done anything in Perl since the
Hanshin-Awajishima Daishinsai (i.e., about Feb 1 1995), so I don't
know.  However, perusing that FAQ suggests to me that the default is
an unspecified unibyte ASCII superset, not UTF-8.  If you want to treat
the strings as Unicode you need to use special functions.  It looked
like you need to enable locale support rather than having it done
automatically.  Etc, etc.

In other words, it looks to me like by default Perl 5.6 supported
I18N-oblivious programming, with minimal I18N being easy, but not the
default.
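
To make the unibyte-versus-Unicode distinction concrete, here is a minimal
sketch in Python (not Perl, and not taken from the thread).  The pattern and
subject strings are made up for illustration, and this is not necessarily the
exact bug discussed earlier.  It shows how a character class behaves
differently when a regex engine sees UTF-8 bytes instead of characters.

```python
import re

# Pattern and subject as text (code points) and as UTF-8 bytes.
# Both strings are hypothetical examples, not from the original thread.
pattern_text = "[あい]"
subject_text = "う"
pattern_bytes = pattern_text.encode("utf-8")
subject_bytes = subject_text.encode("utf-8")

# Text semantics: the class holds two characters, neither of which is う.
text_match = re.search(pattern_text, subject_text)

# Byte semantics: the class holds the individual bytes of あ and い.
# う shares its lead byte (0xE3) with them, so the search "succeeds".
byte_match = re.search(pattern_bytes, subject_bytes)

print(text_match)          # None
print(byte_match.group())  # b'\xe3'
```

With character semantics the search correctly finds nothing; with byte
semantics it reports a match on a fragment of a character.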

    Botond> # This means that in 5.005_50 or later:

Note that this followed a description of how to achieve the goal of
Unicode use; it isn't a statement that Perl was Unicode-aware out of
the box.

While I don't like Perl, this deficiency isn't why; nobody but X?Emacs
does this "right" by design, and I admit that we implement it poorly.
The facilities for handling Unicode internally are excellent, etc, etc.

I'm just saying that it's regrettable that people are not encouraged
to do it right in new code, and not forced to remind themselves that
they are declining to spend resources on I18N by explicitly adding "use
LaxUnicode" or something like that to legacy code.  Making Unicode the
default is a subtle difference that would prevent a lot of bugs of the
kind Jonathan ran into, while turning some old correct code into buggy
code.
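
The "old correct code into buggy code" half of that tradeoff can be sketched
as follows, again in Python and under stated assumptions: the 12-byte field
width, the function names, and the sample string are all hypothetical.  The
point is only that code which was correct when length meant bytes can go
wrong once length means characters.

```python
RECORD_WIDTH = 12  # hypothetical fixed field width, measured in bytes


def pad_legacy(field: bytes) -> bytes:
    # Byte semantics: len() counts bytes, so the result is exactly
    # RECORD_WIDTH bytes long.
    return field + b" " * (RECORD_WIDTH - len(field))


def pad_ported(field: str) -> bytes:
    # Same logic after a naive switch to Unicode strings: len() now counts
    # characters, so the encoded result can exceed RECORD_WIDTH bytes.
    padded = field + " " * (RECORD_WIDTH - len(field))
    return padded.encode("utf-8")


name = "東京"  # 2 characters, 6 bytes in UTF-8
legacy_len = len(pad_legacy(name.encode("utf-8")))
ported_len = len(pad_ported(name))

print(legacy_len)  # 12
print(ported_len)  # 16
```

The ported version is wrong only for non-ASCII input, which is the sort of
breakage an explicit opt-out for legacy code would be meant to flag.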

Is this a good tradeoff?  I think so.  As Guido van Rossum wrote on
python-dev yesterday, "Don't you think that the majority of code that
will be written in [language P] is in the future?"  And as Lyle Saxon
wrote "[We] always forget that we have the option to go backward."
Has Red Hat abandoned Python 1.5.2 yet?  ;-)

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.