Mailing List Archive



Re: [tlug] Do you whitelist or blacklist utf-8?



> ^[\p{L}\p{N}\p{Z}]$
> 
> This is assuming that PHP's regex engine can handle the POSIX attributes.

PHP uses PCRE under the covers, and does support them:
 http://jp.php.net/manual/en/regexp.reference.unicode.php

You have to add the /u modifier at the end of your regex, so that PCRE
treats both the pattern and the subject string as UTF-8.
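For reference, here is a minimal sketch of the whitelist check in PHP. It assumes the goal is to accept a whole string made up only of letters, numbers and separators. The quoted pattern has no quantifier, so it matches exactly one character. The sketch adds `+`, plus the `D` modifier so that `$` does not also match just before a trailing newline. The function name `is_whitelisted()` is made up for illustration.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns true only if every
// character of $input is a letter (\p{L}), number (\p{N}) or separator (\p{Z}).
function is_whitelisted(string $input): bool
{
    // + : the class must cover the whole string, not just one character.
    // u : treat pattern and subject as UTF-8.
    // D : $ matches only at the very end, so "abc\n" is rejected.
    $result = preg_match('/^[\p{L}\p{N}\p{Z}]+$/uD', $input);

    // preg_match() returns false (not 0) on error, e.g. when $input is not
    // valid UTF-8; treat that as a rejection too.
    return $result === 1;
}

var_dump(is_whitelisted('日本語 123'));    // bool(true)
var_dump(is_whitelisted('<b>日本語</b>')); // bool(false)
var_dump(is_whitelisted("\xFF"));          // bool(false): invalid UTF-8
```

Comparing against `=== 1` rather than testing truthiness keeps an error return (false) from being confused with a match.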

Also, note the last comment (from Mar 2010, so quite recent):
 ...only available if PCRE is compiled with "--enable-unicode-properties"

I've no idea how widespread an issue that is; it works fine at least on
Ubuntu 10, CentOS 5 and XAMPP (a Windows WAMP installation).
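If you need to know whether a particular server is affected, one rough runtime probe is to try a `\p` pattern and see whether `preg_match()` fails. When PCRE lacks Unicode property support, compiling such a pattern is an error, so `preg_match()` returns false instead of 0 or 1. This is a sketch under that assumption, and `pcre_has_unicode_properties()` is a hypothetical name, not a built-in function.

```php
<?php
// Hypothetical helper: true if this PHP's PCRE can compile and run a
// \p{...} pattern in UTF-8 mode. A build without Unicode property (or UTF-8)
// support fails to compile the pattern, and preg_match() then returns false.
function pcre_has_unicode_properties(): bool
{
    // @ suppresses the compile warning such a build would emit.
    return @preg_match('/\p{L}/u', 'a') === 1;
}

echo 'PCRE version: ', PCRE_VERSION, PHP_EOL;
echo pcre_has_unicode_properties()
    ? "Unicode properties supported\n"
    : "Unicode properties NOT supported\n";
```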

Here is my simple test script, which outputs "100" (i.e. true, false,
false):
  <?php
  echo preg_match('/^\p{L}+$/u', '日本語');         // 1: all letters
  echo preg_match('/^\p{L}+$/u', '<b>日本語</b>');  // 0: '<', '/', '>' are not letters
  echo preg_match('/^\p{Sm}+$/u', '日本語');        // 0: not math symbols

Darren

P.S. Josh, regarding your other comment:

>> Surely if any language had MySQL syntax constructors it would be PHP....

> Surely you were being sarcastic... ;-P

Not at all: MySQL and PHP go together like fish 'n' chips (cf. LAMP),
so Dave's point was that if PHP didn't support something useful in MySQL
one of the millions of users would have added it very quickly.
In comparison, practically no-one uses the C API ;-)
(He says, lighting the fuse on the Language Wars bomb, then standing
well back...)


-- 
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

