
Re: [tlug] Character encoding stuff



> (1) In particular, when scraping jigsaw puzzle manufacturer websites, I 
> want to know what characters I'm looking at. ...

I'll mention this as useful for character encoding work, but I don't
know if it helps for what you are doing:

  http://php.net/manual/en/book.intl.php

This is a heavy-duty set of functions wrapping the ICU library, which was
developed by IBM originally (IIRC). It is bundled with PHP 5.3 and later,
and can be added as a PECL module for earlier versions.

> But it would be nice to get more than just numbers: stuff like 
> "Cyrillic", "Punctuation" etc. 

Is this a tool to use interactively, to satisfy your curiosity?
Or do you want to normalize/simplify/transliterate, to make your pattern
matching simpler?
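
If it is the curiosity case, here is a minimal sketch of what I mean by
"more than just numbers". It is in Python with the standard unicodedata
module, not the intl extension above, simply because it is short. The
describe() function, the MAJOR_CLASSES table and the sample string are
names I made up for illustration, and taking the first word of the
character name as the "script" is only a rough guess, not a real Unicode
Script property lookup.

```python
import unicodedata

# Readable labels for the first letter of the Unicode general category
# (e.g. "Lu" -> "Letter", "Po" -> "Punctuation").
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch):
    """Return a small dict describing one character (hypothetical helper)."""
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "<unnamed>")
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": name,
        "category": category,
        "class": MAJOR_CLASSES[category[0]],
        # Rough guess only: the first word of the name is often the script
        # ("CYRILLIC", "LATIN", "HIRAGANA"), but not for punctuation etc.
        "script_guess": name.split(" ")[0],
    }


if __name__ == "__main__":
    # Made-up sample text mixing Cyrillic, punctuation and digits.
    for ch in "Пазл, 1000!":
        print(describe(ch))
```

For anything beyond eyeballing scraped text (real script detection,
transliteration) you would still want ICU proper.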

Darren


-- 
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

