Mailing List Archive



Re: [tlug] Japanese regex question




On Wed, 24 Aug 2005 22:27:28 -0400
Josh Glover <jmglov@example.com> wrote:

> On 8/24/05, Brett Robson <b-robson@example.com> wrote:
> 
> > I don't know how much experience you have with J-regex, but the biggest
> > issue is anchoring. Because it's double-byte, you can't be sure you're
> > matching from the first byte of a character.
> 
> I am fairly certain that with Unicode, Perl 5.8 regular expressions
> handle the multi-byte encoding properly and do not treat strings as
> arrays of bytes.
> 
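The anchoring problem in the quoted exchange can be reproduced with a minimal Python sketch. Python stands in for Perl here, and the sample strings and the `jis_payload` and `aligned_find` helpers are hypothetical. In ISO-2022-JP, the escape sequence ESC $ B switches to JIS X 0208. From that point every character is two 7-bit bytes, and nothing marks which byte starts a character. A byte-level search can therefore match the second byte of one character together with the first byte of the next. Decoding to characters first, as Josh describes Perl 5.8 doing with Unicode strings, makes the false match go away.

```python
import re


def jis_payload(text: str) -> bytes:
    """Encode kana-only text as ISO-2022-JP and strip the 3-byte escape
    sequences that switch into JIS X 0208 and back to ASCII."""
    raw = text.encode("iso2022_jp")
    assert raw[:2] == b"\x1b$" and raw[-3:-1] == b"\x1b("
    return raw[3:-3]


def aligned_find(haystack: bytes, needle: bytes) -> int:
    """Byte search that only accepts matches starting on a character
    boundary, i.e. an even offset within a run of two-byte characters."""
    for i in range(0, len(haystack) - len(needle) + 1, 2):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1


haystack = jis_payload("ぅア")  # b'$%%"': ぅ = 0x24 0x25, ア = 0x25 0x22
needle = jis_payload("ゥ")      # b'%%':   ゥ = 0x25 0x25

# Naive byte-level regex: finds a "ゥ" that straddles ぅ and ア.
naive = re.search(re.escape(needle), haystack)
print(naive.start())                   # 1: an odd offset, mid-character

# Boundary-aware byte search: no match.
print(aligned_find(haystack, needle))  # -1

# Character-level search on decoded text: no match either.
print(re.search("ゥ", "ぅア"))          # None
```

Keeping track of alignment by hand only works for a pure run of two-byte characters. Once ASCII and escape sequences are mixed in, decoding first is the simpler fix.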

But he said he was using raw ISO-2022-JP encoding, and those numbers look
correct for katakana in ISO-2022-JP. I was thinking, though, that he would
probably be better off converting to Unicode internally for exactly that
reason.
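Brett's suggestion is to decode once at the input boundary and then run regexes on Unicode strings. That approach might look like the following Python sketch. The sample message and the `katakana_runs` helper are hypothetical. The thread does not say which byte values the original poster used. This sketch assumes he used the JIS X 0208 katakana row, byte pairs 0x25 0x21 through 0x25 0x76. Those pairs correspond to ァ (U+30A1) through ヶ (U+30F6), and the loop below checks that correspondence against Python's codec.

```python
import re

# Full-width katakana ァ (U+30A1) .. ヶ (U+30F6): JIS X 0208 row 5.
KATAKANA_RUN = re.compile("[\u30a1-\u30f6]+")


def katakana_runs(message: bytes) -> list[str]:
    """Decode ISO-2022-JP once at the boundary, then match on characters."""
    text = message.decode("iso2022_jp")
    return KATAKANA_RUN.findall(text)


# Sanity check: each character in the class encodes to 0x25 0x21..0x25 0x76
# inside the ESC $ B ... ESC ( B escape sequences.
for cp in range(0x30A1, 0x30F7):
    pair = chr(cp).encode("iso2022_jp")[3:-3]
    assert pair == bytes([0x25, 0x21 + cp - 0x30A1])

message = "テストはカタカナとかな".encode("iso2022_jp")  # hypothetical input
print(katakana_runs(message))  # ['テスト', 'カタカナ']
```

The regex never sees escape sequences or byte pairs, so it cannot anchor mid-character. Encoding back to ISO-2022-JP, if needed, happens only when output is written.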

Brett




