TLUG Mailing List

Re: [tlug] Generating Furigana in documents

Date: Sat, 30 Mar 2013 18:35:43 +0900

From: Curt Sampson <cjs@example.com>

Subject: Re: [tlug] Generating Furigana in documents

References: <20130329122527.GA30508@fluxcoil.net> <20130330072550.GA6687@skeptic.cynic.net> <20130330090122.GA32390@fluxcoil.net>

User-agent: Mutt/1.5.21 (2010-09-15)

On 2013-03-30 10:01 +0100 (Sat), Christian Horn wrote:

> Hm.. it seems to expect 'JIS x0208' Kanji characters that I am
> unable to produce.

Actually, JIS X 0208 is a character set; you also need to worry about
the encoding of that character set. (A character set is merely a list
of characters, so 僕 might for example be number 1728. How that number
1728 is encoded in a stream of bytes can vary. Note that Unicode is a
character set and UTF-8, UTF-16, et al. are encodings of Unicode.)
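
To make the character set/encoding distinction concrete, here is a
rough sketch that dumps the bytes one character turns into under
several encodings. The show_bytes helper is just a name I made up for
this example, and it assumes an iconv that knows these encoding names
(glibc's does) plus od and tr:

```shell
# Print the bytes of a UTF-8 string after converting it to another
# encoding. show_bytes is a hypothetical helper, not a standard tool.
show_bytes() {
    # $1 = text (UTF-8), $2 = target encoding name understood by iconv
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" | od -An -tx1 | tr -d ' \n'
    echo
}

# Same character (U+50D5 in Unicode), different byte streams.
show_bytes '僕' UTF-8        # Unicode, encoded as UTF-8
show_bytes '僕' UTF-16BE     # Unicode, encoded as big-endian UTF-16
show_bytes '僕' SHIFT_JIS    # JIS X 0208, encoded as Shift-JIS
show_bytes '僕' EUC-JP       # JIS X 0208, encoded as EUC-JP
```

The first two lines should differ even though both come from the same
Unicode character number, and the same goes for the last two and
JIS X 0208.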

> 		./kakasi -i jis -H; 

'jis' is the term they use for ISO-2022-JP encoding. I don't really
recommend using it; it's not that common outside of e-mail messages and
it's not very compact. Ideally you want to be using UTF-8 everywhere, of
course, but kakasi doesn't appear to support Unicode, unfortunately.
Typically I go with Shift-JIS when I can't use Unicode, and this seemed
to work well for me when I tried it just now.
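
If you want to see the compactness difference for yourself, something
like the following should do it. The encoded_size helper is a made-up
name for this sketch, and I'm assuming your iconv accepts these
encoding names. ISO-2022-JP needs escape sequences to switch character
sets, which is where its extra bytes come from:

```shell
# Byte count of the same UTF-8 text after conversion to each encoding.
# encoded_size is a hypothetical helper for this illustration.
encoded_size() {
    # $1 = text (UTF-8), $2 = target encoding name understood by iconv
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" | wc -c | tr -d ' '
}

for enc in UTF-8 SHIFT_JIS EUC-JP ISO-2022-JP; do
    printf '%-12s %s bytes\n' "$enc" "$(encoded_size '私は馬鹿です' "$enc")"
done
```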

Using vim, I created a file with your sample text in Unicode
(":set encoding=utf-8"), and this translated fine for me:

    $ cat input-file 
    私は馬鹿です
    $ iconv -f utf8 -t sjis input-file | kakasi -JK | iconv -f sjis -t utf8
    ワタシはバカです

I used a file for input just to make sure I was certain what the input
encoding was, but it also works fine just typing it directly into a
UTF-8 xterm:

    $ echo '私は馬鹿です' \
        | iconv -f utf8 -t sjis | kakasi -JK | iconv -f sjis -t utf8
    ワタシはバカです
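
If you end up doing this a lot, the two iconv calls can be wrapped up
in a small shell function. This is only a sketch: via_sjis is a name I
invented, and it assumes that whatever filter you give it reads and
writes Shift-JIS on stdin/stdout, as kakasi did for me above:

```shell
# Run a Shift-JIS-only filter on UTF-8 input and return UTF-8 output.
# via_sjis is a hypothetical wrapper; the filter command and its
# arguments are passed through unchanged.
via_sjis() {
    iconv -f UTF-8 -t SHIFT_JIS | "$@" | iconv -f SHIFT_JIS -t UTF-8
}

# Only try kakasi if it is actually installed.
if command -v kakasi >/dev/null 2>&1; then
    echo '私は馬鹿です' | via_sjis kakasi -JK
fi
```

Passing cat as the filter is a quick way to check that the text
survives the round trip through Shift-JIS before blaming kakasi for
anything.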

Incidentally, nkf (Network Kanji Filter, also available as an Ubuntu
package) can be useful as the final filter when working out encoding
issues because it's usually fairly good at guessing the input encoding
and translating it to whatever output encoding you know you need.
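
As a rough sketch of what I mean, assuming nkf is installed: as far as
I know -g reports nkf's guess at the input encoding and -w selects
UTF-8 output, but check the man page for your version. The to_utf8
name and the temporary sample file are just for illustration:

```shell
# Let nkf guess the input encoding and emit UTF-8.
# to_utf8 is a hypothetical convenience wrapper around "nkf -w".
to_utf8() {
    nkf -w "$@"
}

# Skip the demonstration entirely if nkf is not available.
if command -v nkf >/dev/null 2>&1; then
    sample=$(mktemp)
    # Create a Shift-JIS file and pretend we don't know its encoding.
    printf '私は馬鹿です\n' | iconv -f UTF-8 -t SHIFT_JIS > "$sample"
    nkf -g "$sample"     # print nkf's guess at the encoding
    to_utf8 "$sample"    # convert to UTF-8 based on that guess
    rm -f "$sample"
fi
```

The guess is only a heuristic, so very short or ambiguous input can
fool it.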

cjs
-- 
Curt Sampson         <cjs@example.com>         +81 90 7737 2974

To iterate is human, to recurse divine.
    - L Peter Deutsch