tlug.jp Mailing List Archive
Re: [tlug] HTML entity to Unicode conversion
- Date: Fri, 11 Jul 2003 18:21:17 +0900 (JST)
- From: "J. David Beutel" <jdb@example.com>
- Subject: Re: [tlug] HTML entity to Unicode conversion
On Fri, 11 Jul 2003, David Riggs wrote:

> How can I convert my diacritics in HTML entities (for example &#363; for u
> with macron) into utf8 form? I finally got my Mac diacritics into a
> standard form, and now I would like to change them, and the SJIS kanji into
> Unicode.
>
> I hope there is a Perl script or maybe something even simpler out there.

Your example, 363, is the Unicode code point (in decimal) for the u-macron. So the conversion would just be from the HTML entity, &#nnn;, into the UTF-8 bytes for that number. Using regular expressions, that would be a simple program in Java (or Perl?). (Sorry, I don't know of one offhand.)

It would not be so simple if the HTML entities were representing some other encoding (e.g., ISO-8859-2), or if they were named (e.g., &copy;), or if they were in various bases (e.g., &#x16B;).

11011011
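
For illustration, here is a minimal sketch in Java of the decimal-only case described above: a regular expression finds each &#nnn; reference, the number is treated as a Unicode code point, and the result is encoded as UTF-8. The class name EntityToUtf8 and the method decode are hypothetical names chosen for this example, not an existing tool. It assumes the entities really are Unicode code points; named and hexadecimal references are deliberately left untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces decimal references such as &#363; with the
// character they name. Named (&copy;) and hex (&#x16B;) forms are not handled.
class EntityToUtf8 {
    // One to seven decimal digits always fits in an int.
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    static String decode(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1));
            boolean usable = Character.isValidCodePoint(cp)
                    && (cp < 0xD800 || cp > 0xDFFF); // skip lone surrogates
            // Leave anything that is not a usable code point exactly as it was.
            String replacement = usable ? new String(Character.toChars(cp)) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#363;");
        // Encode explicitly as UTF-8 instead of relying on the platform default.
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim()); // c5 ab
    }
}
```

The sketch only covers the entity step. Converting the SJIS kanji to Unicode is a separate transcoding pass, and when writing the result to a file the UTF-8 charset should be given explicitly.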
- References:
- [tlug] HTML entity to Unicode conversion
- From: David Riggs