Mailing List Archive
Re: [tlug] [OT] Regular Expressions to find Japanese Text
- Date: Sun, 6 Aug 2006 16:46:24 +0200
- From: Botond Botyanszki <tlug@example.com>
- Subject: Re: [tlug] [OT] Regular Expressions to find Japanese Text
- References: <44D5FB0A.6090605@example.com>
On Sun, 06 Aug 2006 23:22:02 +0900
Dave M G <martin@example.com> wrote:

> If I can figure out how to extract the first variable, $word, then I can
> figure out the rest and go on to build more complicated text parsing.

Reading all the characters up to the first space would be sufficient.

> But it seems like it would be a lot more sophisticated if I could
> determine if a word was Japanese by testing its Unicode value or some
> similar method. That way I would be less vulnerable to slight
> variabilities in positioning of words in the source material.

Such variation in word position is not very likely to happen. You should also keep in mind that there are edict dictionary files in other languages too, not just Japanese-English.

> Looking at all the multibyte related functions in the PHP manual, it
> seems there are options for testing the type of encoding, but not for
> the type of language or character set.

If you want to extract Japanese, you can convert the UTF-8 to UTF-32 (with the function on the page you posted) and then test whether each character falls into the code ranges of the Unicode characters used in Japanese. I have some C code if you want (it can be converted into PHP fairly easily).
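For illustration, here is a rough Python sketch of both suggestions above: take everything up to the first space, then convert UTF-8 to UTF-32 code points and check each one against a few Unicode ranges. It is not the C code mentioned in the message. The function names, the range table, and the sample line are all assumptions made for this example, and the range list is deliberately incomplete (for instance, it leaves out punctuation and the iteration mark U+3005, and the kanji block is shared with Chinese).

```python
import struct

# Assumed ranges for this sketch only; not taken from the original message.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji, shared with Chinese)
    (0xFF66, 0xFF9F),  # Halfwidth katakana
)


def code_points(raw):
    """Convert UTF-8 bytes to a list of UTF-32 code point integers."""
    data = raw.decode("utf-8").encode("utf-32-le")
    # Each UTF-32 unit is one little-endian unsigned 32-bit integer.
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def is_japanese_code_point(cp):
    """True if the code point falls into one of the assumed ranges."""
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def first_word(line):
    """Return all characters up to the first space."""
    return line.split(" ", 1)[0]


def is_japanese_word(word):
    """True if the word is non-empty and every character is in range."""
    cps = code_points(word.encode("utf-8"))
    return bool(cps) and all(is_japanese_code_point(cp) for cp in cps)


if __name__ == "__main__":
    # Hypothetical edict-style line, used only to exercise the functions.
    line = "日本語 [にほんご] /(n) Japanese (language)/"
    word = first_word(line)
    print(word, is_japanese_word(word))
```

The same idea should carry over to PHP once the string has been converted to UTF-32: split on the first space, then loop over the code points and compare them against whichever ranges matter for the data at hand.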
- References:
- [tlug] [OT] Regular Expressions to find Japanese Text
- From: Dave M G