
Mailing List Archive
Re: [tlug] Character encoding stuff
> (1) In particular, when scraping jigsaw puzzle manufacturer websites, I
> want to know what characters I'm looking at. ...
I'll mention this as useful for character encoding work, but I don't
know if it helps for what you are doing:
http://php.net/manual/en/book.intl.php
This is a heavy-duty set of functions built on the ICU library, which
was originally developed by IBM (IIRC). It is built in to PHP 5.4.x and
can be added as a PECL module for earlier versions.
> But it would be nice to get more than just numbers: stuff like
> "Cyrillic", "Punctuation" etc.
Is this a tool to use interactively, to satisfy your curiosity?
Or do you want to normalize/simplify/transliterate, to make your
pattern matching simpler?
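
Either goal can be sketched in a few lines. The example below is only a
rough illustration, and it assumes Python with its standard unicodedata
module rather than the PHP intl functions linked above. The helper names
(describe, fold_compat, strip_marks) are hypothetical. Note that
unicodedata has no script property, so the "script hint" is a heuristic
taken from the first word of the character name.

```python
import unicodedata

# Major Unicode general-category classes, keyed by the first letter of the
# two-letter code returned by unicodedata.category().
MAJOR_CATEGORY = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}

def describe(ch):
    """Return a dict with more than just the number for one character."""
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)
    # Heuristic (an assumption, not a real script lookup): for letters, the
    # first word of the Unicode name is usually the script, e.g. "CYRILLIC".
    script_hint = name.split()[0] if category.startswith("L") else None
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": name,
        "category": MAJOR_CATEGORY[category[0]],
        "script_hint": script_hint,
    }

def fold_compat(text):
    """Fold compatibility forms (e.g. fullwidth letters and digits) via NFKC."""
    return unicodedata.normalize("NFKC", text)

def strip_marks(text):
    """Decompose, drop nonspacing marks (accents), then recompose.

    Caution: this also removes dakuten/handakuten from kana, so it is
    only suitable where that loss is acceptable.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)

if __name__ == "__main__":
    for ch in "ж!":
        print(describe(ch))
    print(fold_compat("ＡＢＣ１２３"))
    print(strip_marks("café"))
```

For interactive use, describe() could be mapped over a scraped string to
see what is in it. For pattern matching, running fold_compat() before
the regular expressions would probably be the more useful step.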
Darren
--
Darren Cook, Software Researcher/Developer
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)