
[tlug] Character encoding stuff



I write bits of my website in PHP, and am always bumping up against character set issues. Here are (is??) a plurality of questions.

(1) In particular, when scraping jigsaw puzzle manufacturer websites, I want to know what characters I'm looking at. Things like "Is that cross a *multiplication sign, *lowercase-x, *capital-X, *zenkaku-x, *zenkaku-X (zenkaku meaning full-width), or who knows what?" (х for example, and I managed to type that one in). I started looking on the web, then realised I had actually written a primitive tool for this myself; for example:

http://imaginatorium.org/svc/unicode.php?ins=x%C3%97%D1%85

But it would be nice to get more than just numbers: stuff like "Cyrillic", "Punctuation" etc. Any suggestions for useful tools, either Web-based or a screen utility I can run in Linux?
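For what it's worth, here is a minimal sketch of the sort of output I mean, assuming Python 3 and its standard unicodedata module are available. The function name describe is just made up for illustration, and the sample string is the one from the URL above. It gives the code point, the general category (for example "Sm" for a maths symbol, "Ll" for a lowercase letter) and the Unicode name, which usually starts with the script, such as CYRILLIC.

```python
import unicodedata
from urllib.parse import unquote


def describe(text):
    """Return (char, code point, general category, Unicode name) per character.

    'describe' is a hypothetical helper name, not part of any library.
    """
    rows = []
    for ch in text:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),           # e.g. "Ll", "Lu", "Sm", "Po"
            unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed code points
        ))
    return rows


if __name__ == "__main__":
    # Same three characters as in the URL above, percent-decoded as UTF-8.
    sample = unquote("x%C3%97%D1%85")
    for row in describe(sample):
        print(*row, sep="\t")
```

This only reports what the Unicode database says about each character; it does not group things into friendlier labels like "Punctuation", which would need a lookup table from the two-letter category codes.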

(2) I use gedit, which is sort of fine, but it does Really Stupid (sorry, I mean "clever") display tricks, trying to guess how things should be shown depending on surrounding characters. So paste in the following two lines, and the two marus appear completely different (in size: both are circles):

これは、○です。(マル)
But this is exactly the same character: ○
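That the two circles really are one and the same code point can be checked with a few lines like the following. This is only a sketch, assuming Python 3; the helper name code_point is made up for illustration, and the two strings are the lines above pasted in as they stand.

```python
def code_point(ch):
    """Format a single character as U+XXXX (hypothetical helper name)."""
    return f"U+{ord(ch):04X}"


# The two sample lines, pasted as they appear above.
japanese_line = "これは、○です。(マル)"
english_line = "But this is exactly the same character: ○"

# The maru is the fifth character of the first line and the last of the second.
maru_in_japanese = japanese_line[4]
maru_in_english = english_line[-1]

if __name__ == "__main__":
    print(code_point(maru_in_japanese), code_point(maru_in_english))
    print("same character:", maru_in_japanese == maru_in_english)
```

If the two values match, any difference on screen comes from how the editor chooses to draw the character, not from the text itself.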

Are there any suggestions of editors more suited to multi-script work?

There are a few other things, but I'd better go and watch детараме хиро now. (That came out wrong...)

Brian Chandler





