Re: [tlug] using eucjp on Linux



On Tue, Dec 24, 2013 at 01:22:08PM +0900, Kalin KOZHUHAROV wrote:
> On Tue, Dec 24, 2013 at 6:33 AM, Christian Horn <chorn@example.com> wrote:
> > I am trying to get a better understanding of encodings and am puzzled
> > by the following.
> >
> > In a utf8 xterm, displaying utf8 files works fine, and so does date:
> >
> >         [chris@hive ~]$ echo $LC_ALL
> >         ja_JP.utf8
> >         [chris@hive ~]$ cat test_utf8
> >         日本語
> >         [chris@hive ~]$ date
> >         2013年 12月 23日 月曜日 22:20:21 CET
> >
> so, works as expected.
> 
> Can you try the following in this terminal:
> `LC_ALL=ja_JP.eucjp date|iconv -f eucjp`
> `LC_ALL=ja_JP.eucjp date|xxd`

[chris@hive ~]$ LC_ALL=ja_JP.eucjp date|iconv -f eucjp
2013年 12月 24日 火曜日 08:15:31 CET
[chris@hive ~]$ LC_ALL=ja_JP.eucjp date|xxd
0000000: 3230 3133 c7af 2031 32b7 ee20 3234 c6fc  2013.. 12.. 24..
0000010: 20b2 d0cd cbc6 fc20 3038 3a31 353a 3438   ...... 08:15:48
0000020: 2043 4554 0a                              CET.

The xxd output is the same as for "LC_ALL=ja_JP.utf8 date|xxd",
so it seems that LC_ALL=ja_JP.eucjp has no effect?
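
One way to double-check whether two dumps really match is to compare a single character byte by byte. The sketch below assumes the script itself is saved as UTF-8 and that `od` and `tr` are available; `hex_of` is a hypothetical helper name. It compares the UTF-8 bytes of 年 with the two bytes that follow "2013" in the dump above.

```shell
# hex_of: hypothetical helper, prints the bytes of its argument as hex.
hex_of() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; }

utf8_year=$(hex_of '年')   # bytes of 年 as stored in this UTF-8 script
dump_year='c7af'           # bytes right after "2013" in the xxd output

echo "utf8: $utf8_year"
echo "dump: $dump_year"
if [ "$utf8_year" = "$dump_year" ]; then
    echo "bytes match"
else
    echo "bytes differ"
fi
```

If the two values differ, the dump is not UTF-8. Kalin's reply quoted later in this mail identifies the same byte sequence as EUC-JP.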


> > I converted the file to eucjp,
> >
> To make sure, just run `cat test_utf8| iconv -f utf8 -t eucjp` instead
> of converting off-site.

[chris@hive ~]$ cat test_utf8| iconv -f utf8 -t eucjp
F|K\8l
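
The string `F|K\8l` is what the EUC-JP bytes of 日本語 look like once the high bit of every byte is cleared. The sketch below reproduces that with `tr`. It is only an observation about the bytes, not a confirmed diagnosis of where the bit gets lost. The octal escapes spell the bytes c6 fc cb dc b8 ec, and `strip_high_bit` is a hypothetical helper name.

```shell
# strip_high_bit: hypothetical helper, clears bit 7 of every input byte
# by mapping bytes 0x80-0xff onto 0x00-0x7f.
strip_high_bit() { LC_ALL=C tr '\200-\377' '\000-\177'; }

# EUC-JP bytes of 日本語 (c6 fc cb dc b8 ec), written as octal escapes
# so that any POSIX printf understands them.
stripped=$(printf '\306\374\313\334\270\354' | strip_high_bit)
printf '%s\n' "$stripped"
```

The same mapping turns c7 af (年) into `G/`, which matches the garbled `date` output quoted further down.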


> > I think I have the locale,
> >
> Can you confirm by running `locale -a |grep -i euc` ?

[chris@hive ~]$ locale -a |grep -i euc
ja_JP.eucjp
japanese.euc
[...]


> Also what does `locale` show "after the switch" ?

[chris@hive ~]$ locale
LANG=ja_JP.utf8
LC_CTYPE="ja_JP.eucjp"
LC_NUMERIC="ja_JP.eucjp"
LC_TIME="ja_JP.eucjp"
LC_COLLATE="ja_JP.eucjp"
LC_MONETARY="ja_JP.eucjp"
LC_MESSAGES="ja_JP.eucjp"
LC_PAPER="ja_JP.eucjp"
LC_NAME="ja_JP.eucjp"
LC_ADDRESS="ja_JP.eucjp"
LC_TELEPHONE="ja_JP.eucjp"
LC_MEASUREMENT="ja_JP.eucjp"
LC_IDENTIFICATION="ja_JP.eucjp"
LC_ALL=ja_JP.eucjp
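
The output above follows the usual POSIX precedence: LC_ALL overrides every LC_* category, and LANG is only the fallback. That is why LANG still reads ja_JP.utf8 while every category reports ja_JP.eucjp. A minimal sketch of that rule, with `effective_locale` as a hypothetical helper that takes the three values as arguments instead of reading the environment:

```shell
# effective_locale: hypothetical helper.
# $1 = LC_ALL, $2 = LC_<category>, $3 = LANG (any of them may be empty).
# Prints the first non-empty value, or "C" if all are empty.
effective_locale() {
    for v in "$1" "$2" "$3"; do
        if [ -n "$v" ]; then
            printf '%s\n' "$v"
            return
        fi
    done
    echo C
}

# Values taken from the `locale` output above: LC_ALL set, LC_TIME unset.
effective_locale "ja_JP.eucjp" "" "ja_JP.utf8"   # prints ja_JP.eucjp
```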


> > but I
> > fail to get the eucjp-encoded file displayed.  Also the date output is not correct:
> >
> >         [chris@hive ~]$ LC_ALL=ja_JP.eucjp luit
> >         [chris@hive ~]$ locale charmap
> >         EUC-JP
> >         [chris@hive ~]$ cat test_eucjp
> >         F|K\8l
> >         [chris@hive ~]$ date
> >         2013G/ 127n 23F| 7nMKF| 22:21:45 CET
> >         [chris@hive ~]$ cat test_utf8
> >         f%f,h*
> >
> > Running these commands in a terminal "xterm -en eucjp".
> >
> > I think I am missing something.. any ideas?
> >
> quite a few things may be going on... try the above commands and let's see.
> Also for xterm, show the output of `xrdb -q|grep -i XTerm`

No output:
[chris@hive ~]$ xrdb -q|grep -i XTerm
[chris@hive ~]$ 


> These days I use mostly x11-terms/xfce4-terminal, but I just tried the
> following and it works fine:
> 
> $ LC_ALL=ja_JP.eucjp xterm
> $ date <-- in the new terminal
> $ date
> 2013年 12月 24日 火曜日 13:09:46 JST
> $ date |xxd
> 0000000: 3230 3133 c7af 2031 32b7 ee20 3234 c6fc  2013.. 12.. 24..
> 0000010: 20b2 d0cd cbc6 fc20 3133 3a30 393a 3437   ...... 13:09:47
> 0000020: 204a 5354 0a                              JST.
> 
> which is EUC-JP.

The "date" output here is not as expected, but "xxd" seems to show
that "date" is properly presented to the terminal/luit in eucjp.


> Finally after some tinkering, here is a unit test for you to check
> that most things around locales are fine (start from UTF8 locale):
> 
> for l in utf8 eucjp sjis; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done
> utf8     火曜日
> 0000000: e781 abe6 9b9c e697 a50a                 ..........
> 
> eucjp     火曜日
> 0000000: b2d0 cdcb c6fc 0a                        .......
> 
> sjis     火曜日
> 0000000: 89ce 976a 93fa 0a                        ...j...
> 
> If you run it today (Tuesday), you may check this as well:
> $ for l in utf8 eucjp sjis; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done|md5sum -c <(echo "bc2ec6dc8e941801ee37286c8b28c277  -")
> -: OK
> (it should print OK, be careful with spaces).

The Shift-JIS test fails because this Fedora installation is missing that locale.
The existing ja_JP.ujis locale turns out to be an alias for "eucjp".
LC_ALL=ja_JP.sjis falls back to "LC_ALL=C".

[chris@hive ~]$ for l in utf8 eucjp sjis ujis; do echo -e "$l\t $(LC_ALL=ja_JP.utf8 date +%A)"; LC_ALL=ja_JP.$l date +%A|xxd; echo; done
utf8     火曜日
0000000: e781 abe6 9b9c e697 a50a                 ..........

eucjp    火曜日
0000000: b2d0 cdcb c6fc 0a                        .......

sjis     火曜日
0000000: 5475 6573 6461 790a                      Tuesday.

ujis     火曜日
0000000: b2d0 cdcb c6fc 0a                        .......
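
Since a missing locale falls back to C, as the sjis line shows, it can help to check availability before switching. A minimal sketch, assuming `locale -a` lists names in the same spelling as used here (for example `ja_JP.eucjp`); `have_locale` is a hypothetical helper name:

```shell
# have_locale: hypothetical helper, succeeds if the name appears in
# `locale -a` (case-insensitive, whole-line, fixed-string match).
have_locale() { locale -a 2>/dev/null | grep -qixF -- "$1"; }

for l in utf8 eucjp sjis ujis; do
    if have_locale "ja_JP.$l"; then
        echo "ja_JP.$l: installed"
    else
        echo "ja_JP.$l: missing, LC_ALL=ja_JP.$l would fall back to C"
    fi
done
```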


> BTW, last time I checked (2-5 years ago) luit is not needed explicitly.

me-- for not double-reading the mails..
Seems like the issue is more in the terminal area?
So far I have only tried uxterm and gnome-terminal as alternatives;
both give the same results for the above commands.

Any ideas welcome..

Christian

