TLUG Mailing List

Re: [tlug] Japanese regex question

Date: Sun, 01 Jan 2006 23:30:30 +0100

From: Gábor Farkas <gabor@example.com>

Subject: Re: [tlug] Japanese regex question

References: <200508241701.55144.jq@example.com> <20050825183913.O88704@example.com> <200508251253.47083.jq@example.com> <20050826113217.J88704@example.com> <87zmr2me23.fsf@example.com>

User-agent: Thunderbird 1.5 (Macintosh/20051201)

Stephen J. Turnbull wrote:
>>>>>> "Tod" == Tod McQuillin <devin@example.com> writes:
> 
>     Tod> Yeah but the regex engine doesn't know it's not ascii.
> 
> Urk.  "Unidentified unibyte ASCII-superset", if you please!
> 
>     Tod> Unless you use unicode, it will interpret the strings as
>     Tod> strings of 8-bit bytes, not as non-ascii multibyte
>     Tod> characters.
> 
> Nice call!  For those of you who haven't thought carefully about it
> yet, those matching 4/6 and 5/7 first-nibble pairs in the ambiguous
> match positions are a dead giveaway.
> 
> We had a post on this kind of issue (ambiguous matches in UTF-8) a
> couple months back, too.   It's worth trying to remember this one.
> 
>     Tod> Probably the only proper way to do this is to convert
>     Tod> everything to unicode first.
> 
> This is all so stupid.  XEmacs has been doing this (badly) for almost
> a decade, Mule for another 3 or 4 years longer than that.  Why Perl
> and Python failed to seize the opportunity to do it right when they
> added Unicode support I'll never know.
> 
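
[For readers following the quoted exchange: a minimal sketch of the
byte-versus-character problem Tod describes. It assumes EUC-JP as the
legacy encoding and modern Python 3; neither is stated in the thread, and
the helper names find_bytes and find_chars are hypothetical. A byte-level
search finds the two bytes of "い" straddling the boundary between "イ"
and "あ", while a search on decoded text correctly finds nothing.]

```python
import re


def find_bytes(pattern, text, encoding="euc_jp"):
    """Search the encoded bytes, as an encoding-unaware regex engine would.

    Returns the (start, end) byte offsets of the first match, or None.
    The EUC-JP default is an assumption made for this illustration.
    """
    m = re.search(re.escape(pattern.encode(encoding)), text.encode(encoding))
    return m.span() if m else None


def find_chars(pattern, text):
    """Search the decoded (Unicode) text, one code point at a time.

    Returns the (start, end) character offsets of the first match, or None.
    """
    m = re.search(re.escape(pattern), text)
    return m.span() if m else None


text = "イあ"    # EUC-JP bytes: a5 a4 | a4 a2
pattern = "い"   # EUC-JP bytes: a4 a4

# The byte search matches the trail byte of "イ" plus the lead byte of "あ".
byte_hit = find_bytes(pattern, text)

# The character search sees two code points, neither of which is "い".
char_hit = find_chars(pattern, text)

print("byte-level match:", byte_hit)
print("character-level match:", char_hit)
```

[Decoding to Unicode before matching, as Tod suggests, is what removes the
false positive here; the regex engine then works on whole characters
instead of on their constituent bytes.]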

sorry to jump in so late...

could you please describe what Python is doing wrong regarding
Unicode?

thanks,
gabor

-- 
Flexibility is overrated
Constraints are liberating
-- David Heinemeier Hansson, Secrets behind Ruby on Rails
Follow-Ups:

Re: [tlug] Japanese regex question
From: Stephen J. Turnbull
