tlug: LaTeX: CJK package: 0212 support



>>>>> "Martin" == Martin Minich <minich5@example.com> writes:

    Martin> weird enough, with the ordinary JIS X 0208 characters; one
    Martin> has to start the environment like \begin{CJK}{JIS}{song}
    Martin> (I believe JIS stands for the encoding, 'cos other
    Martin> possibilities are SJIS, or something else for Big5 etc, as
    Martin> I could read in the documentation), but what actually has
    Martin> to follow is Japanese text in the __EUC__ (!!!)
    Martin> encoding!!

Technically, according to ISO 2022, EUC uses the JIS encoding (a 7-bit
encoding), but loads it into the "GR register".  Whether the
"register" is "GL" (the default) or "GR" is indicated by whether the
high bit of the byte is set or not.  Shift JIS, although based on JIS,
actually uses byte values that could _never_ appear in a JIS or EUC
file (or any ISO-2022-conformant file), and thus is not really
JIS-encoded.  (The reason for ku-ten being 1--94 is precisely so that
JIS can be ISO-2022-conformant.)  Similarly, ISO-8859-2 can be
interpreted as not an 8-bit encoding, but two 7-bit encodings with
ASCII in GL and Latin-2 in GR.
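
To make the GL/GR relationship concrete, here is a minimal Python sketch.
The function names are my own, and it only covers plain two-byte JIS X 0208
characters (no escape sequences, no half-width kana, no JIS X 0212).  It
derives the 7-bit JIS bytes from a ku-ten position, sets the high bit to
get EUC, and asks Python's own codec what Shift JIS does with the same
character.

```python
def kuten_to_jis(ku, ten):
    """Return the 7-bit JIS (GL) byte pair for a ku-ten position."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must each be in 1..94")
    # 0x20 + (1..94) lands in 0x21..0x7E, the ISO 2022 graphic range.
    return bytes([ku + 0x20, ten + 0x20])


def jis_to_euc(jis):
    """Move a JIS byte pair from GL to GR by setting the high bit."""
    return bytes(b | 0x80 for b in jis)


# Hiragana "a" (U+3042) sits at ku 4, ten 2 in JIS X 0208.
jis = kuten_to_jis(4, 2)
euc = jis_to_euc(jis)
sjis = "\u3042".encode("shift_jis")

print(jis.hex(), euc.hex(), sjis.hex())  # 2422 a4a2 82a0

# Cross-check the hand-computed EUC bytes against Python's codec.
assert euc == "\u3042".encode("euc_jp")
```

Note the Shift JIS lead byte 0x82: it falls in 0x80..0x9F, which ISO 2022
sets aside for control codes rather than graphic characters.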

If that gives you a headache, then just be thankful that within a
couple of years Unicode will remove the need to worry about it in
multilingual applications.

What CJK does is to redefine those EUC characters as "active", which
means that the first of a pair can invoke a macro to read the second
of the pair and generate the `\CJKchar[JIS2]{80}{38}' commands.
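
The pairing idea can be simulated outside TeX.  The following Python
sketch is only an illustration of "the first byte reads the second"; it
is not how CJK's macros are actually written, the function name is
hypothetical, and it assumes plain two-byte EUC (it ignores the 0x8E and
0x8F single-shift prefixes).

```python
def pair_high_bytes(data):
    """Yield ASCII bytes as 1-char strings and high-bit bytes as pairs."""
    it = iter(data)
    for b in it:
        if b < 0x80:
            # Ordinary ASCII: passed through untouched.
            yield chr(b)
        else:
            # A high-bit byte acts like an "active" character: it
            # consumes the next byte as the second half of the pair.
            second = next(it, None)
            if second is None:
                raise ValueError("truncated two-byte character")
            yield (b, second)


text = b"ab" + "\u3042".encode("euc_jp") + b"c"
tokens = list(pair_high_bytes(text))
print(tokens)
```

In the real package the pair would then be turned into a font-selecting
command; here it is just left as a tuple of the two byte values.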

The reason for using EUC and not 7-bit JIS is that TeX would get very
confused if you redefined the ASCII characters, since they are used
for encoding TeX commands.  For example, `\' can start a TeX macro
because its category code says "read the following letters and invoke
the corresponding macro", and `$' has the category code that toggles
math mode.  If you wanted to use hiragana in 7-bit JIS, you would need
that `$' to mean "hiragana", not math mode!  But TeX never uses the
high-bit-set characters internally, only as text data.  So they are
safe to define as "active" characters.
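
The collision is easy to see by looking at the raw bytes.  This Python
sketch (my own helper name, and my own choice of which ASCII characters
to count as "special" under plain TeX/LaTeX defaults) encodes three
hiragana both ways and lists which TeX-special ASCII bytes show up.

```python
# ASCII characters with special meaning under the usual TeX defaults.
TEX_SPECIALS = set(b"\\{}$&#^_%~")


def tex_special_bytes(data):
    """Return the TeX-special ASCII characters that occur in data."""
    return sorted({chr(b) for b in data if b in TEX_SPECIALS})


hiragana = "\u3042\u3044\u3046"          # a, i, u
jis7 = hiragana.encode("iso2022_jp")     # 7-bit JIS with escape sequences
euc = hiragana.encode("euc_jp")          # same code points, high bit set

print(jis7)
print(tex_special_bytes(jis7))  # ['$', '&']
print(tex_special_bytes(euc))   # []
```

Every hiragana in row 4 has 0x24 (`$') as its first 7-bit JIS byte, so
the math-shift character turns up in every one of them, while the EUC
bytes stay entirely above 0x7F.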

    Martin> but alas, when i try it out and compare to the tables from
    Martin> the Ken Lunde's book, it seems to me, that <byte1> = ku +
    Martin> 31, <byte2> = ten + 32 are the correct formulae; but does
    Martin> it give any sense?

This doesn't make sense to me and is probably a bug in the definition
of the JIS2 encoding for CJK.

But if it works, use it that way (but remember that if it's a bug it
could get fixed at any time).  I would write to the CJK people if I
were you.


-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
_________________  _________________  _________________  _________________
What are those straight lines for?  "XEmacs rules."

