
Mailing List Archive
Re: [tlug] HTML entity to Unicode conversion
- Date: Fri, 11 Jul 2003 18:21:17 +0900 (JST)
- From: "J. David Beutel" <jdb@example.com>
- Subject: Re: [tlug] HTML entity to Unicode conversion
On Fri, 11 Jul 2003, David Riggs wrote:
> How can I convert my diacritics in HTML entities (for example &#363; for u
> with macron) into utf8 form? I finally got my Mac diacritics into a
> standard form, and now I would like to change them, and the SJIS kanji into
> Unicode.
>
> I hope there is a Perl script or maybe something even simpler out there.
Your example, 363, is the Unicode code point (in decimal) for u-macron,
U+016B. So the conversion would just be from the HTML numeric character
reference, &#nnn;, into the UTF-8 bytes for that number (for u-macron,
the two bytes 0xC5 0xAB). Using regular expressions, that would be a
simple program in Java (or Perl?). (Sorry, I don't know of an existing
one offhand.)
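A rough sketch of that regex approach in Java, for the simple decimal case only. It assumes the file has already been read into a Java String (i.e. decoded from whatever its byte encoding is). The class name DecimalEntities is made up for illustration. References that are not valid Unicode code points are left untouched rather than guessed at.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts decimal references like &#363; to characters.
public class DecimalEntities {
    // Up to 7 digits, so Integer.parseInt can never overflow.
    private static final Pattern DECIMAL_REF = Pattern.compile("&#([0-9]{1,7});");

    /** Replaces each &#nnn; with the character whose Unicode code point is nnn. */
    public static String decode(String html) {
        Matcher m = DECIMAL_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    // toChars also handles code points above U+FFFF (surrogate pairs)
                    ? new String(Character.toChars(codePoint))
                    // not a real code point: keep the original text
                    : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#363;");
        // Prints the UTF-8 bytes of U+016B as hex: C5 AB
        for (byte b : decoded.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

To write a converted file, you would encode the resulting String with StandardCharsets.UTF_8 on output (for example through an OutputStreamWriter). The regex only finds the references. The actual code point to UTF-8 step is done by the standard charset encoder.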
It would not be so simple if the HTML entities were representing some
other encoding (e.g., ISO-8859-2), or if they were named (e.g., &copy;),
or if they were in various bases (e.g., &#x16B;, the same u-macron in hex).
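A sketch of how the same regex approach could stretch to two of those harder cases: hexadecimal references (&#x16B;) and named entities (&copy;). The class name HtmlEntities is hypothetical. The NAMED map is a tiny illustrative subset, not the full HTML entity table, which a real converter would need. Map.of requires Java 9 or later. The ISO-8859-2 case is not handled here, because it needs to know what encoding the numbers were meant in.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decimal, hexadecimal and (a few) named references.
public class HtmlEntities {
    // Illustrative subset only; the real HTML list is much longer.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", 0x26, "lt", 0x3C, "gt", 0x3E, "quot", 0x22,
            "copy", 0xA9, "nbsp", 0xA0);

    // Group 1: decimal digits; group 2: hex digits after &#x or &#X; group 3: a name.
    private static final Pattern REF = Pattern.compile(
            "&#([0-9]{1,7});|&#[xX]([0-9a-fA-F]{1,6});|&([A-Za-z][A-Za-z0-9]*);");

    public static String decode(String html) {
        Matcher m = REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer cp;
            if (m.group(1) != null) {
                cp = Integer.parseInt(m.group(1));       // base 10
            } else if (m.group(2) != null) {
                cp = Integer.parseInt(m.group(2), 16);   // base 16
            } else {
                cp = NAMED.get(m.group(3));              // null if the name is unknown
            }
            String replacement = (cp != null && Character.isValidCodePoint(cp))
                    ? new String(Character.toChars(cp))
                    : m.group(0); // unknown name or invalid code point: leave as-is
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, decode("&#363; &#x16B; &copy;") turns all three references into characters. An unrecognized name such as &foo; passes through unchanged, so nothing is silently lost.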