[tlug] Re: UTF-8 under Linux



"Stephen J. Turnbull" <stephen@example.com> writes:

> >>>>> "Mike" == Mike Fabian <mfabian@example.com> writes:
>
>     Mike> GNU Emacs 21.1.1 with Mule-UCS-0.84 can edit and display all
>     Mike> of them.
>
>     Mike> XEmacs 21.4.8 with Mule-UCS-0.84 can edit and display all of
>     Mike> them *except* U+2BB.
>
> That's right, it doesn't exist in our tables (mule-ucs/lisp/reldata).
> It doesn't exist anywhere except in Unicode, as far as I can tell;
> none of the mapping tables I have (offhand, I haven't gone dredging
> the web) has it.
>
> Where do you get the Mule-UCS?

For GNU Emacs I build a SuSE package from what I get here:

   ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-0.84.tar.gz

> Is it the same, or are you using the XEmacs package for one and the
> upstream for the other?

Since XEmacs now includes Mule-UCS in its packages, I have dropped
the extra mule-ucs-xemacs.rpm I used to build for SuSE Linux.
Currently I use what I get when I check out the XEmacs packages
from CVS with the following tag:

    cvs update -r xemacs-sumo-2002-05-22

Just for testing, I removed
/usr/share/xemacs/mule-packages/lisp/mule-ucs/ again and reinstalled
my old mule-ucs-xemacs.rpm, built from the upstream
Mule-UCS-0.84.tar.gz sources above. The result is the same: U+2BB
doesn't work.

> It's quite possible that either GNU or
> Miyashita-san has updated the character tables.

I can't find U+2BB in Miyashita-san's upstream sources of Mule-UCS either:

    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84$ grep -i -r 0x.*2bb *
    lisp/reldata/uethiopic.el:       (?"~ . "0x12BB") ;; ETHIOPIC SYLLABLE KXAA
    lisp/reldata/ubig5.el:       (?.5 . "0x52BB") ;; <CJK>
    lisp/reldata/ubig5.el:       (?9q . "0x82BB") ;; <CJK>
    lisp/reldata/ubig5.el:       (?QY . "0x92BB") ;; <CJK>
    lisp/reldata/ubig5.el:       (?&} . "0x62BB") ;; <CJK>
    lisp/reldata/ujisx0208.el:       (?Y; . "0x62BB")  ;; <CJK>
    lisp/reldata/ujisx0208.el:       (?gm . "0x82BB")  ;; <CJK>
    lisp/reldata/ujisx0212.el:       (?3c . "0x52BB") ;; <CJK>
    lisp/reldata/ujisx0212.el:       (?d. . "0x92BB") ;; <CJK>
    lisp/reldata/uksc5601.el:       (?uV . "0x82BB") ; <CJK>
    lisp/reldata/u-cns-1.el:       (?L4 . "0x52BB") ; <CJK>
    lisp/reldata/u-cns-1.el:       (?Wp . "0x82BB") ; <CJK>
    lisp/reldata/u-cns-1.el:       (?oY . "0x92BB") ; <CJK>
    lisp/reldata/u-cns-2.el:       (?&| . "0x62BB") ; <CJK>
    lisp/reldata/u-cns-3.el:       (?&f . "0x72BB") ; <CJK>
    lisp/reldata/ugb2312.el:       (?^S . "0x62BB") ;; <CJK>
    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84$
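
The grep pattern above is deliberately loose: `0x.*2bb` matches any
entry whose hex code merely ends in 2BB, which is why only 0x12BB,
0x52BB and so on show up. A stricter check for an entry that is
exactly U+02BB could look like the following Python sketch. It assumes
the table entries always quote the code as "0x...." the way the lines
above do; the third sample line is hypothetical and is not taken from
any Mule-UCS file.

```python
import re

# The loose pattern used in the grep command: anything after "0x" ending in 2bb.
LOOSE = re.compile(r'0x.*2bb', re.IGNORECASE)

# Stricter pattern: the quoted code must be exactly 2BB, allowing leading zeros
# ("0x2BB", "0x02BB", ...). Assumes codes are written as quoted hex strings.
EXACT_U02BB = re.compile(r'"0x0*2BB"', re.IGNORECASE)

sample_lines = [
    # Two lines copied from the grep output.
    'lisp/reldata/uethiopic.el:       (?"~ . "0x12BB") ;; ETHIOPIC SYLLABLE KXAA',
    'lisp/reldata/ubig5.el:       (?.5 . "0x52BB") ;; <CJK>',
    # Hypothetical line, showing what a real U+02BB entry might look like.
    'hypothetical.el:       (?x . "0x02BB") ;; MODIFIER LETTER TURNED COMMA',
]

loose_hits = [line for line in sample_lines if LOOSE.search(line)]
exact_hits = [line for line in sample_lines if EXACT_U02BB.search(line)]

print(len(loose_hits), len(exact_hits))  # 3 1
```

With the strict pattern, none of the lines in the real grep output
would match, which agrees with the conclusion that U+2BB is absent
from the upstream tables.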

The Mule-UCS from the XEmacs packages CVS has the same version number
as Miyashita-san's latest upstream version:

    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84/lisp$ lgrep -Ij -Ou8 "defconst.*mucs-version" *
    mucs.el:(defconst mucs-version "0.84 (KOUGETSUDAI:向月台)")
    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84/lisp$

    mfabian@example.com:/usr/share/xemacs/mule-packages/lisp/mule-ucs$ lgrep -Ij -Ou8 "defconst.*mucs-version" *
    mucs.el:(defconst mucs-version "0.84 (KOUGETSUDAI:向月台)")
    mfabian@example.com:/usr/share/xemacs/mule-packages/lisp/mule-ucs$

Nevertheless there are a few differences in the reldata files:

    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84/lisp/reldata$ diff -q /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/ .
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-1.el and ./u-cns-1.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-2.el and ./u-cns-2.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-3.el and ./u-cns-3.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-4.el and ./u-cns-4.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-5.el and ./u-cns-5.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-6.el and ./u-cns-6.el differ
    Files /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/u-cns-7.el and ./u-cns-7.el differ
    mfabian@example.com:~/suse-packages/Mule-UCS/Mule-UCS-0.84/lisp/reldata$

but only in CNS-related files, so I think they are unrelated to U+2BB.

> What does C-u C-x = tell you about that character under GNU Emacs?

  character: ? (01211133, 332379, 0x5125b)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 36 91
     syntax: word
   category: u:Mule unicode characters  
buffer code: 0x9C 0xF4 0xA4 0xDB
  file code: 0xCA 0xBB (encoded by coding system utf-8)
       font: -Misc-Fixed-Medium-R-Normal--15-140-75-75-C-90-ISO10646-1

(A question mark appears after 'character:' in this mail because
XEmacs, which I use for mail, can't handle this character; it displays
fine in GNU Emacs.)
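
The numbers in this output are consistent with each other, and a short
Python sketch can show how. It assumes the Emacs 21 internal layout:
mule-unicode-0100-24ff is a 96x96 charset starting at U+0100, the
buffer code is a leading byte, a charset ID byte, and two position
bytes with the high bit set, and the character code packs the charset
ID and the two positions into one integer. Those formulas are
assumptions about Emacs 21 internals rather than anything stated in
this thread, although they do reproduce every value shown above. The
function names are made up for illustration.

```python
def decode_mule_unicode_0100_24ff(c1: int, c2: int) -> int:
    """Map a 96x96 code point pair to a Unicode value (assumed layout)."""
    return (c1 - 32) * 96 + (c2 - 32) + 0x0100


def emacs21_char_code(charset_id: int, c1: int, c2: int) -> int:
    """Assumed Emacs 21 packing for a private two-dimensional charset."""
    return ((charset_id - 0xE0) << 14) | (c1 << 7) | c2


# "buffer code: 0x9C 0xF4 0xA4 0xDB" from the output above.
buffer_code = bytes([0x9C, 0xF4, 0xA4, 0xDB])
charset_id = buffer_code[1]          # 0xF4, assumed to identify the charset
c1 = buffer_code[2] & 0x7F           # strip the high bit
c2 = buffer_code[3] & 0x7F

ucs = decode_mule_unicode_0100_24ff(c1, c2)
utf8 = chr(ucs).encode("utf-8")

print((c1, c2))                                  # (36, 91)  -> "code point: 36 91"
print(f"U+{ucs:04X}")                            # U+02BB
print(" ".join(f"0x{b:02X}" for b in utf8))      # 0xCA 0xBB -> "file code"
code = emacs21_char_code(charset_id, c1, c2)
print(code, hex(code), oct(code))                # 332379 0x5125b 0o1211133
```

So the character GNU Emacs reports here is U+02BB, stored in the
mule-unicode-0100-24ff charset and written to the file as the two
UTF-8 bytes 0xCA 0xBB.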

> Also, what font are you using to display?  Does GNU Emacs support
> unicode fonts, or do they do the same hack we do (translating to the
> usual set of X registries)?

They do the same as XEmacs.

This may look ugly if you edit Chinese in UTF-8 because some Chinese
characters are taken from a Japanese font and some from a Chinese
font, for example:

    http://www.suse.de/~mfabian/screenshots/emacs-friedrich-dimmling-fontset-16.png

A customer had this problem and I tried to make this less ugly for him
by defining additional fontsets which use mainly Unicode fonts in
/usr/share/emacs/site-lisp/suse-start-Mule-UCS.el, which I attach to
this mail.

suse-start-Mule-UCS.el is automatically loaded on SuSE Linux when GNU
Emacs is started and Mule-UCS.rpm is installed. Using one of these
"Unicode fontsets", Chinese text in UTF-8 looks nicer because a
uniform style is used, for example:

    http://www.suse.de/~mfabian/screenshots/emacs-friedrich-dimmling-fontset-16_efont_unicode.png

suse-start-Mule-UCS.el uses some functions which exist only in GNU
Emacs, so it doesn't work as-is for XEmacs. Something similar can
probably be done for XEmacs, and it might be useful to some people,
but I haven't yet had time to investigate that.

Anyway, I can also display U+2BB using the "Unicode fontsets" I defined in
suse-start-Mule-UCS.el.  'C-u C-x =' then tells me for example:

      character: ? (01211133, 332379, 0x5125b)
        charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
     code point: 36 91
         syntax: word
       category: u:Mule unicode characters  
    buffer code: 0x9C 0xF4 0xA4 0xDB
      file code: 0xCA 0xBB (encoded by coding system utf-8)
           font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

or

      character: ? (01211133, 332379, 0x5125b)
        charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
     code point: 36 91
         syntax: word
       category: u:Mule unicode characters  
    buffer code: 0x9C 0xF4 0xA4 0xDB
      file code: 0xCA 0xBB (encoded by coding system utf-8)
           font: -gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1

or something similar for the other "Unicode fontsets".

> Either a URL for the source of the Mule-UCS you use for GNU Emacs or
> a diff between that tree and the XEmacs package sources would be どう
> もありがたい (like, literally "hard to get" ;-).

A complete diff between the upstream Mule-UCS source which I use for
GNU Emacs and the one from the XEmacs packages sources is here (as
soon as our web server syncs):

    http://www.suse.de/~mfabian/misc/xemacs-packages-mule-ucs-upstream.diff.bz2

Because the problem in XEmacs remains when I use Mule-UCS from the
unchanged upstream source, I guess the problem is somewhere else
though.

Attachment: suse-start-Mule-UCS.el
Description: application/emacs-lisp

-- 
Mike Fabian   <mfabian@example.com>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
