Mailing List Archive


Re: SJIS & HTML - potential trouble?



On Nov 20, 12:06pm, Stephen J. Turnbull wrote:
} Subject: Re: SJIS & HTML - potential trouble?
>> 
>>     Jim> And how. I got enthusiastic some years ago, and wrote a
>>     Jim> state-driven detector which could reliably tell SJIS, EUC and
>>     Jim> UTF-8 apart. "Normal" techniques fail because there is so
>>     Jim> much overlap, so I did it by elimination. I can't imagine
>>     Jim> trying it in lex.
>> 
>> Is this available publicly?

Not released, but I'll mail copies to anyone who asks.
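
For readers who want to experiment in the meantime, here is a minimal
sketch of the "by elimination" idea in Python. It is NOT the unreleased
detector mentioned above and it is not state-driven: it simply asks
Python's built-in strict decoders to reject each candidate in turn. The
names CANDIDATES and surviving_encodings are hypothetical, chosen only
for this illustration.

```python
# Sketch only: eliminate candidate encodings whose strict decoder rejects
# the input. Assumes Python's built-in "utf-8", "shift_jis" and "euc_jp"
# codecs; this is not the state-driven detector described in the thread.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")


def surviving_encodings(data: bytes) -> list[str]:
    """Return the candidates under which `data` decodes without error."""
    survivors = []
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict mode raises on an invalid sequence
        except UnicodeDecodeError:
            continue  # eliminated
        survivors.append(name)
    return survivors
```

Because the byte ranges overlap so much, more than one candidate can
survive, especially on short input; pure ASCII survives all three. Some
further tie-breaking would be needed before calling the result a
detection.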

Something was scratching the back of my mind earlier about SJIS, and I
just remembered it....

It concerns the comment that no 2nd-byte values for SJIS collide with
HTML special characters. Watch out. The "JIS X 0213" proposal appears
to be a hack aimed at filling up the "unused" space in the SJIS set
with extra kanji. In effect it is going to legitimize some of the
non-standard SJIS extensions already in circulation. I *HOPE* that they
draw them from JIS X 0212, and that there will be consistent mappings,
but it is quite possible that they might include popular allomorphs of
JIS X 0208 kanji as well 8-(}
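
The collision claim is easy to re-check against whatever byte ranges an
extension ends up using. The sketch below assumes the usual JIS X 0208
based Shift_JIS trail-byte ranges (0x40-0x7E and 0x80-0xFC) and a
hand-picked set of HTML special characters; both sets, and the function
name colliding_specials, are assumptions made for this illustration.

```python
# Sketch only: which HTML special characters fall inside a given set of
# second-byte (trail-byte) values? The default ranges are the commonly
# documented Shift_JIS ones and are an assumption of this example.
HTML_SPECIALS = b"<>&\"'"
TRAIL_BYTES = set(range(0x40, 0x7F)) | set(range(0x80, 0xFD))


def colliding_specials(trail_bytes=TRAIL_BYTES, specials=HTML_SPECIALS):
    """Return the special characters whose byte value is a legal trail byte."""
    return sorted(chr(b) for b in specials if b in trail_bytes)
```

With the default ranges the result is an empty list, since all five
characters sit below 0x40. Passing in a wider set of trail bytes shows
which characters would start to collide.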

Jim