TLUG Mailing List

Re: [tlug] Japanese regex question

Date: Sun, 28 Aug 2005 21:38:44 +0900

From: "Stephen J. Turnbull" <stephen@example.com>

Subject: Re: [tlug] Japanese regex question

References: <200508241701.55144.jq@example.com><20050825183913.O88704@example.com><200508251253.47083.jq@example.com><20050826113217.J88704@example.com>

Organization: The XEmacs Project

User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b21 (corn, linux)

>>>>> "Tod" == Tod McQuillin <devin@example.com> writes:

    Tod> Yeah but the regex engine doesn't know it's not ascii.

Urk.  "Unidentified unibyte ASCII-superset", if you please!

    Tod> Unless you use unicode, it will interpret the strings as
    Tod> strings of 8-bit bytes, not as non-ascii multibyte
    Tod> characters.

Nice call!  For those of you who haven't thought carefully about it
yet, those matching 4/6 and 5/7 first-nibble pairs in the ambiguous
match positions are a dead giveaway.
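
A minimal sketch of the effect in Python, assuming Shift_JIS data and a
case-insensitive match. Neither is stated in this message, so the actual
encoding and pattern in the thread may differ. Bytes 0x41-0x5A and
0x61-0x7A are ASCII upper and lower case, so an engine that folds case
byte by byte treats a trail byte 4x as equal to 6x, and 5x as equal to 7x.

```python
import re

# Assumption for illustration: the data is Shift_JIS, whose trail bytes
# can fall in the ASCII letter range 0x40-0x7E.
a = "ア".encode("shift_jis")    # lead byte 0x83, trail byte 0x41 ('A')
dji = "ヂ".encode("shift_jis")  # lead byte 0x83, trail byte 0x61 ('a')


def first_nibbles(data):
    """Return the high nibble of every byte as a hex digit."""
    return [format(b >> 4, "x") for b in data]


# A bytes pattern makes the engine work on 8-bit units, not characters,
# and IGNORECASE then folds ASCII case on each byte.
pattern = re.compile(re.escape(a), re.IGNORECASE)

# The pattern for one katakana matches a different katakana.
false_hit = pattern.fullmatch(dji) is not None

print(a.hex(), dji.hex())                    # 8341 8361
print(first_nibbles(a), first_nibbles(dji))  # ['8', '4'] ['8', '6']
print(false_hit)                             # True
```

The two characters differ only in the first nibble of the trail byte
(4 versus 6), which is exactly the difference between 'A' and 'a'.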

We had a post on this kind of issue (ambiguous matches in UTF-8) a
couple months back, too.  It's worth trying to remember this one.

    Tod> Probably the only proper way to do this is to convert
    Tod> everything to unicode first.

This is all so stupid.  XEmacs has been doing this (badly) for almost
a decade, and Mule for 3 or 4 years longer than that.  Why Perl and
Python failed to seize the opportunity to do it right when they added
Unicode support I'll never know.
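
For reference, Tod's "convert everything to unicode first" can be
sketched in Python roughly as follows. This assumes the input is known
to be Shift_JIS and that the match is case-insensitive; both are
illustrative choices, not details taken from the thread.

```python
import re

# Assumption for illustration: the raw input arrives as Shift_JIS bytes.
raw_a = "ア".encode("shift_jis")    # bytes 0x83 0x41
raw_dji = "ヂ".encode("shift_jis")  # bytes 0x83 0x61

# Decode once at the boundary; from here on the engine sees characters,
# not 8-bit bytes.
text_a = raw_a.decode("shift_jis")
text_dji = raw_dji.decode("shift_jis")

# A str pattern is matched per character, so case folding applies to
# whole characters and can no longer touch a trail byte.
pattern = re.compile(re.escape(text_a), re.IGNORECASE)

matches_itself = pattern.fullmatch(text_a) is not None
matches_other = pattern.fullmatch(text_dji) is not None

print(matches_itself)  # True
print(matches_other)   # False
```

The same two byte strings compare equal under a case-insensitive bytes
pattern, because their trail bytes are ASCII 'A' and 'a'.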

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.