Re: SJIS & HTML - potential trouble?
- To: tlug@example.com
- Subject: Re: SJIS & HTML - potential trouble?
- From: jwb@example.com (Jim Breen)
- Date: Wed, 20 Nov 1996 09:37:48 -0500
- In-Reply-To: Ken Schwarz <kls@example.com> "SJIS & HTML - potential trouble?" (Nov 19, 3:41am)
- Reply-To: tlug@example.com
- Sender: owner-tlug
On Nov 19, 3:41am, Ken Schwarz wrote:
} Subject: SJIS & HTML - potential trouble?

>> Sorry, this question is also not really related to Linux, but a topic
>> nearly as dear to our hearts: SJIS & HTML. :0)

Why on earth would SJIS be dear to anyone's heart??

>> Has anyone seen anything written about problems of SJIS text confusing
>> HTML parsers? I haven't had time to think it through, but it seems
>> likely that second bytes of SJIS could confuse naive HTML parsers.

Well, I can quite imagine some Americo-centric programmer stumbling on
codes > 128. OTOH, do they really write parsers that could not handle the
ISO-8859-1 codes which are very widely used in Europe? These are the
single-byte (128-255) codes for the letters with diacritic marks, etc.
They usually appear singly, and *cannot* be mixed with either SJIS or EUC.

>> I'd expect that EUC is less troublesome, but worth checking as well.

If they have made the assumption that the only codes using the high-order
bit are those of EUC-J, in which both bytes have that bit set, they'll
be safe. (At least until they encounter some EUC-J coded JIS212, which
uses 3 bytes.)

Seriously, though, people writing parsers, etc., should be producing code
which is:

(a) configurable for a series of multi-byte codes with the MSB set and
    not set
(b) able to handle the UTF codings of Unicode/ISO10646

Jim Breen
jwb@example.com
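To make the second-byte concern concrete, here is a minimal sketch in Python (not from the original message). It assumes Python's built-in `shift_jis` and `euc_jp` codecs and uses the characters ソ and 表, whose second Shift_JIS byte is 0x5C (the ASCII backslash). The helper names `ascii_range_bytes`, `is_sjis_lead`, and `split_sjis` are hypothetical, chosen only for illustration. As far as I know, Shift_JIS second bytes fall in 0x40-0x7E and 0x80-0xFC, so they cannot collide with `<`, `>`, `&`, or `"`, but they can collide with letters, backslash, and similar ASCII characters that a byte-at-a-time scanner might treat specially.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes below 0x80 produced when `text` is encoded."""
    return [b for b in text.encode(encoding) if b < 0x80]


def is_sjis_lead(byte):
    """True if `byte` lies in the Shift_JIS double-byte lead ranges."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def split_sjis(data):
    """Split Shift_JIS bytes into one bytes object per character.

    A lead byte consumes the following byte as its trail byte, so a
    trail byte of 0x5C is never examined on its own.
    """
    chars = []
    i = 0
    while i < len(data):
        step = 2 if is_sjis_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + step])
        i += step
    return chars


sample = "ソ表"
print(ascii_range_bytes(sample, "shift_jis"))  # [92, 92]
print(ascii_range_bytes(sample, "euc_jp"))     # []
print(split_sjis(b"a" + "ソ".encode("shift_jis") + b"b"))
```

The first two lines of output show the difference discussed above: the SJIS encoding contains bytes in the ASCII range, while the EUC encoding of the same text does not. A scanner that steps through the input with something like `split_sjis` only ever sees whole characters, which is one simple way of being "configurable" for a particular multi-byte code.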