Mailing List Archive



Re: [tlug] Japanese regex question



On Wed, 24 Aug 2005, Jonathan Byrne wrote:

> I'm baaaaaaack! :)
>
> Well, I'm still on admin but haven't had time to read the main list in
> too long.
>
> And of course, it was having an odd problem that dragged me back :)
>
> I have a basic regular expression targeted at raw iso-2022-jp text.
> The word I'm targeting is ナンパ and the regex is
> (?:\=25\=4A\=25\=73\=25\=51|\x25\x4A\x25\x73\x25\x51).
>
> The trouble is, it also seems to be matching リンパ and I can't figure out
> why, because where ナンパ is 25 4A 25 73 25 51, リンパ is 25 6A 25 73 25 51.
>
> The regexes used by this application are perl, but the app itself is
> written in C and uses the PCRE library, in case anyone knows of any
> relevance there.

Just a guess -- have you given the 'i' flag (case insensitivity) somehow?

Because 0x4A and 0x6A are uppercase 'J' and lowercase 'j' in ASCII, they
will match each other with /i in a regexp.
-- 
Tod
