Mailing List Archive



Re: [tlug] Character encoding stuff



Darren Cook writes:

 > > (1) In particular, when scraping jigsaw puzzle manufacturer websites, I 
 > > want to know what characters I'm looking at. ...
 > 
 > I'll mention this as useful for character encoding work, but I don't
 > know if it helps for what you are doing:
 > 
 >   http://php.net/manual/en/book.intl.php

ICU should have functions to look up characters by name and names by
character.  Unfortunately for us East Asians, the Unicode folks
decided not to give real names to kanji, but instead call them "CJK
UNIFIED IDEOGRAPH-4E00" or something like that.  Still, that gives the
OP what he asked for.
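
For illustration, here is a minimal sketch of that kind of lookup.  It
uses Python's standard unicodedata module rather than ICU itself (that
substitution is an assumption made for brevity), and the helper name
describe() is made up for this example:

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character.

    describe() is a hypothetical helper, not part of any library.
    """
    rows = []
    for ch in text:
        # name() raises ValueError for unnamed characters (e.g. control
        # codes) unless a default is supplied.
        name = unicodedata.name(ch, "<no name>")
        rows.append((ch, "U+%04X" % ord(ch), name))
    return rows


if __name__ == "__main__":
    # Name by character: one Latin letter, one kanji.
    for ch, codepoint, name in describe("A\u4e00"):
        print(codepoint, name)

    # Character by name: the reverse direction.
    print(unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE"))
```

Running describe() over a scraped string shows exactly which code
points are in it, which helps when two characters look alike on
screen.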

 > This is a heavy-duty set of functions, the ICU library, developed by IBM
 > originally (IIRC).

That's correct.  Don't say Big Blue never did anything for you!



