
Re: SJIS & HTML - potential trouble?



On Nov 20,  9:37am, Jim Breen wrote:
} Subject: Re: SJIS & HTML - potential trouble?

Earlier this morning I wrote:

>> Well I can quite imagine some Americo-centric programmer stumbling on
>> codes > 128. OTOH, do they really write parsers that could not handle the
>> ISO-8859-1 codes which are very widely used in Europe? These are the single
>> byte (128-255) codes for the letters with diacritic marks, etc. They
>> usually appear singly, and *cannot* be mixed with either SJIS or EUC.

Thinking about it some more, I can see that badly-handled SJIS is more of
a problem than Latin-1/2/etc., in that the second byte of a SJIS sequence
can be an ordinary ASCII-range byte (0x40-0x7E, e.g. `\' or `|'), which
could upset a parser badly if it wasn't tuned to 2-byte codes.
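
A minimal Python sketch of the trail-byte issue, assuming Python's built-in
shift_jis codec. Standard Shift_JIS trail bytes lie in 0x40-0x7E and
0x80-0xFC, so ASCII characters such as `\' and `|' can turn up as second
bytes (`<' at 0x3C falls just below that range). The function name
ascii_trail_bytes is made up for illustration.

```python
def ascii_trail_bytes(text):
    """Return (character, trail byte as ASCII) pairs for every character
    whose Shift_JIS encoding is two bytes with an ASCII-range second byte."""
    hits = []
    for ch in text:
        raw = ch.encode("shift_jis")
        # Two-byte sequence whose second byte looks like plain ASCII.
        if len(raw) == 2 and raw[1] < 0x80:
            hits.append((ch, chr(raw[1])))
    return hits


if __name__ == "__main__":
    # Both of the first two characters end in 0x5C, the ASCII backslash.
    for ch, trail in ascii_trail_bytes("表ソあ"):
        print(ch, repr(trail))
```

A parser that walks such text one byte at a time would see a stray
backslash inside each of those characters.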

"Proper" handling of SJIS (an oxymoron if there ever was one) involves a
lot of checking for valid/invalid sequences, as you have to cater for the
unspeakable hankaku katakana as well. Trying to scan backwards, e.g. in a
WP program, through some raw SJIS sends you grey. Usually developers do
something like holding everything as 16-bit codes internally.
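
The "hold everything as 16-bit codes" approach might look roughly like the
sketch below. It is an illustration only, under these assumptions: lead
bytes are 0x81-0x9F and 0xE0-0xFC (the upper part covers vendor
extensions), trail bytes are 0x40-0x7E and 0x80-0xFC, and hankaku katakana
are the single bytes 0xA1-0xDF. The name sjis_to_units is hypothetical.

```python
def sjis_to_units(data):
    """Scan raw Shift_JIS bytes forwards and return a list of integer
    code units: one per character, single bytes kept as-is and two-byte
    sequences packed as (lead << 8) | trail."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F or 0xA1 <= b <= 0xDF:
            # ASCII or single-byte hankaku katakana.
            units.append(b)
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte: a valid trail byte must follow.
            if i + 1 >= len(data):
                raise ValueError("truncated sequence at offset %d" % i)
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                raise ValueError("bad trail byte 0x%02X at offset %d" % (t, i + 1))
            units.append((b << 8) | t)
            i += 2
        else:
            # 0x80, 0xA0 and 0xFD-0xFF are neither lead nor single bytes here.
            raise ValueError("invalid byte 0x%02X at offset %d" % (b, i))
    return units
```

Once the text is a list of units, stepping backwards is just an index
decrement, and a search for `<' compares against whole units, so it can
never match the inside of a two-byte character.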

(No excuse for bad parsing, though.)

Jim