Mailing List Archive

Re: tlug: Character Encodings Again



TLUG saves the day again!

J. David Beutel writes:

 > > 41377 94  8481
 > [...]
 > > 65185 94 32289
 > >
 > > Do these values ring a bell with anyone? (I've been told that one side 

 > You can see the pattern when you convert to hex.  41377 = A1A1,
 > and 8481 = 2121.  So, the right side is the JIS X 0208 kuten (1,1),
 > and the left side is the same thing in EUC.

Duhh! How did I manage to miss that? I did convert the numbers to hex
(or I thought I did ... maybe I was trying to convert them *from* hex
to decimal).
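
The arithmetic in the quoted explanation can be checked with a short Python sketch. The function names here are made up for illustration. It assumes each value in the left column is a two-byte EUC-JP code packed into one integer, and that the JIS X 0208 code is obtained by clearing the high bit of each byte:

```python
def euc_to_jis(code):
    """Clear the high bit of both bytes: EUC-JP -> JIS X 0208 (7-bit) code."""
    return code & 0x7F7F


def euc_to_kuten(code):
    """Return the (row, cell) kuten pair for a two-byte EUC-JP code."""
    hi, lo = code >> 8, code & 0xFF
    return hi - 0xA0, lo - 0xA0


# First and last rows from the table quoted above (left column, right column).
for euc, jis in [(41377, 8481), (65185, 32289)]:
    print(f"{euc} = {euc:04X}, {jis} = {jis:04X}, "
          f"kuten = {euc_to_kuten(euc)}, match = {euc_to_jis(euc) == jis}")
```

Both rows report a match: 41377 is A1A1 and maps to 2121 (kuten 1,1), and 65185 is FEA1 and maps to 7E21 (kuten 94,1).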

 > You may want the Unicode table, JIS0208.TXT, from unicode.org.

Got it, thanks.
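
For anyone wanting to use that table programmatically, here is a minimal sketch of a reader. It assumes the file's data lines hold three whitespace-separated hex columns (Shift-JIS, JIS X 0208, Unicode) with `#` starting a comment; check the header of your copy of JIS0208.TXT before relying on that layout. The function name `parse_jis0208` is hypothetical.

```python
def parse_jis0208(lines):
    """Map JIS X 0208 codes to Unicode characters.

    Assumed line layout: <sjis> <jis0208> <unicode> # comment
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = data.split()
        if len(fields) < 3:
            continue  # not a mapping line under the assumed layout
        _sjis, jis, uni = (int(f, 16) for f in fields[:3])
        table[jis] = chr(uni)
    return table


# Usage, assuming the file has been downloaded to the current directory:
# with open("JIS0208.TXT", encoding="ascii") as f:
#     table = parse_jis0208(f)
```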

Stephen J. Turnbull writes:

 >     Matt> In case you're wondering what all this is about: I'm trying
 >     Matt> to write an SGML declaration that will allow the use of
 >     Matt> kanji in markup (e.g., instead of <par></par>, you could
 >     Matt> have <段落></段落> ... and so on.

 > I'm not sure whether content characters and markup characters are kept
 > separate in the software; you may need to recompile nsgmls to handle
 > 16-bit character sets.

Actually, I'm a step ahead of you on that one. I already made sure to
compile it with -DSP_MULTIBYTE (I think that's the default when you
compile it from the Jade distribution, but I made sure of it anyway).

 > I'm not sure you're allowed to use any character sets in markup except
 > ASCII, ISO-8859-1, and Unicode (aka ISO-10646/UCS-2).

According to the original SGML standard, no. But there are Extended
Naming Rules which supposedly do allow you to use any character set,
provided that you specify it properly in the SGML declaration. Or so I 
gather from several very cursory sources. My searches of the Web and
Deja News give me the strong impression that very few people, even
among SGML wizards, really understand how it works.
 
 > Have you looked at the standard with respect to that?

Ha-ha. I really should one of these days, shouldn't I? Unfortunately,
SGML is another one of those standards you have to buy from ISO
... it's 200 bucks or so. So, one of these days ...

Thanks once again, guys!

Matt Gushee
Oshamanbe, Hokkaido
----------------------------------------------------------------
Next Nomikai: 20 November, 19:30   Tengu TokyoEkiMae 03-3275-3691
Next Technical Meeting: 12 December, 12:30 HSBC Securities Office
----------------------------------------------------------------
more info: http://tlug.linux.or.jp Sponsors: PHT, HSBC Securities
