Mailing List Archive


Re: SJIS & HTML - potential trouble?



On Nov 19,  3:41am, Ken Schwarz wrote:
} Subject: SJIS & HTML - potential trouble?

>> Sorry, this question is also not really related to Linux, but a topic
>> nearly as dear to our hearts: SJIS & HTML. :0)

Why on earth would SJIS be dear to anyone's heart??

>> Has anyone seen anything written about problems of SJIS text confusing
>> HTML parsers?  I haven't had time to think it through, but it seems
>> likely that second bytes of SJIS could confuse naive HTML parsers.

Well I can quite imagine some Americo-centric programmer stumbling on
codes > 128. OTOH, do they really write parsers that could not handle the
ISO-8859-1 codes which are very widely used in Europe? These are the single
byte (128-255) codes for the letters with diacritic marks, etc. They
usually appear singly, and *cannot* be mixed with either SJIS or EUC.
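
The SJIS worry in the quoted question is easy to check. Here is a minimal
sketch, assuming Python 3 and its built-in shift_jis codec; the helper name
trail_bytes_in_ascii is made up for illustration. It lists the characters
whose second SJIS byte lands in the ASCII range:

```python
# Characters an HTML tokenizer treats specially, as raw byte values.
HTML_SPECIAL = b'<>&"'


def trail_bytes_in_ascii(text):
    """Return (character, trail byte) pairs whose SJIS trail byte is < 0x80."""
    hits = []
    for ch in text:
        encoded = ch.encode("shift_jis")
        # Two-byte SJIS characters have a trail byte in 0x40-0x7E or 0x80-0xFC.
        if len(encoded) == 2 and encoded[1] < 0x80:
            hits.append((ch, encoded[1]))
    return hits


if __name__ == "__main__":
    for ch, trail in trail_bytes_in_ascii("表ソあ"):
        print(ch, hex(trail), repr(chr(trail)))
```

The trail-byte range starts at 0x40, so `<`, `>`, `&` and `"` themselves never
turn up as second bytes. The troublemakers are values such as 0x5C (backslash)
and 0x7C (|), which matter to any parser that scans raw bytes for escapes or
delimiters.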

>> I'd expect that EUC is less troublesome, but worth checking as well.

If they have made the assumption that the only codes using the high-order
bit are those of EUC-J, in which both bytes have that bit set, they'll be
safe. (At least until they encounter some EUC-J coded JIS X 0212, which uses 3
bytes.)
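
To illustrate why the "pairs of high-bit bytes" assumption breaks, here is a
rough sketch of splitting EUC-J bytes into characters by looking at the lead
byte. It assumes Python 3; split_euc_jp is a hypothetical helper, not
something from a real library. The 0x8F prefix introduces the 3-byte
JIS X 0212 form, and 0x8E introduces half-width katakana:

```python
def split_euc_jp(data):
    """Split EUC-JP bytes into per-character chunks by inspecting the lead byte."""
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1  # plain ASCII
        elif lead == 0x8E:
            width = 2  # SS2 prefix: half-width katakana
        elif lead == 0x8F:
            width = 3  # SS3 prefix: JIS X 0212
        elif 0xA1 <= lead <= 0xFE:
            width = 2  # JIS X 0208
        else:
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (lead, i))
        chunk = data[i:i + width]
        # Every byte after the lead must have the high bit set (0xA1-0xFE).
        if len(chunk) != width or any(b < 0xA1 or b > 0xFE for b in chunk[1:]):
            raise ValueError("truncated or invalid sequence at offset %d" % i)
        chunks.append(chunk)
        i += width
    return chunks
```

A parser that blindly consumes high-bit bytes two at a time would split the
3-byte sequence in the wrong place and then mis-pair every byte after it.
Since all the bytes involved still have the MSB set, a byte-level scan for `<`
remains safe; it is the character boundaries that go wrong.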

Seriously, though, people writing parsers, etc, should be producing code
which is:
(a) configurable for a series of multi-byte codes, both with and without
the MSB set
(b) able to handle the UTF codings of Unicode/ISO10646
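
As a very rough sketch of what (a) and (b) could look like in practice,
assuming Python 3 and its standard codecs: make the encoding a parameter and
decode to characters before looking for markup at all. The function name
strip_tags and the regular expression are illustrative only; this is not a
real HTML parser.

```python
import re

# Deliberately simplistic tag matcher, for illustration only.
TAG = re.compile(r"<[^>]*>")


def strip_tags(raw, encoding):
    """Decode raw bytes with the configured encoding, then remove tags."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on bad input
    return TAG.sub("", text)


if __name__ == "__main__":
    for enc in ("shift_jis", "euc_jp", "utf-8"):
        print(enc, strip_tags("<p>表</p>".encode(enc), enc))
```

Because the markup scan runs on decoded characters rather than raw bytes, a
trail byte that happens to look like ASCII never reaches the tag matcher.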

Jim Breen
---------
jwb@example.com


