Mailing List Archive



Re: [tlug] Re: Learn a Variety of Languages . . . . . . .




On 1/17/07, Darren Cook <darren@example.com> wrote:
> Really? So how do you *easily* get a character count for a string of
> UTF-8 bytes?

In PHP it's mb_strlen() (with internal_encoding set to "UTF-8", of
course). You can still use strlen() to get the number of bytes. I use PHP
for most of my Japanese text analysis, only switching to C++ when memory
or speed becomes enough of an issue to justify it.

That's my point: you need to use a whole set of separate functions for string handling and regexes. PHP6 will (apparently) have built-in Unicode/UTF-8 support, so that strlen() will return the number of characters.


