Mailing List Archive
Re: [tlug] Generating Furigana in documents
- Date: Sat, 30 Mar 2013 18:35:43 +0900
- From: Curt Sampson <cjs@example.com>
- Subject: Re: [tlug] Generating Furigana in documents
- References: <20130329122527.GA30508@fluxcoil.net> <20130330072550.GA6687@skeptic.cynic.net> <20130330090122.GA32390@fluxcoil.net>
- User-agent: Mutt/1.5.21 (2010-09-15)
On 2013-03-30 10:01 +0100 (Sat), Christian Horn wrote:

> Hm.. it seems to expect 'JIS x0208' Kanji characters that I am
> unable to produce.

Actually, JIS X 0208 is a character set; you also need to worry about
the encoding of that character set. (A character set is merely a list
of characters, so 僕 might for example be number 1728. How that number
1728 is encoded in a stream of bytes can vary. Note that Unicode is a
character set and UTF-8, UTF-16, et al. are encodings of Unicode.)

> ./kakasi -i jis -H;

'jis' is the term they use for the ISO-2022-JP encoding. I don't really
recommend using it; it's not that common outside of e-mail messages and
it's not very compact.

Ideally you want to be using UTF-8 everywhere, of course, but kakasi
doesn't appear to support Unicode, unfortunately. Typically I go with
Shift-JIS when I can't use Unicode, and this seemed to work well for me
when I tried it just now. Using vim, I created a file with your sample
text in Unicode (":set encoding=utf-8"), and this translated fine for me:

    $ cat input-file
    私は馬鹿です
    $ iconv -f utf8 -t sjis input-file | kakasi -JK | iconv -f sjis -t utf8
    ワタシはバカです

I used a file for input just to be certain of the input encoding, but it
also works fine when typed directly into a UTF-8 xterm:

    $ echo '私は馬鹿です' \
        | iconv -f utf8 -t sjis | kakasi -JK | iconv -f sjis -t utf8
    ワタシはバカです

Incidentally, nkf (Network Kanji Filter, also available as an Ubuntu
package) can be useful as the final filter when working out encoding
issues, because it's usually fairly good at guessing the input encoding
and translating it to whatever output encoding you know you need.

cjs
--
Curt Sampson <cjs@example.com> +81 90 7737 2974
To iterate is human, to recurse divine. - L Peter Deutsch
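The character set versus encoding distinction in the message above can be seen directly with a few lines of Python. This is only an illustrative sketch, not something from the thread: Python and its codec names (`shift_jis`, `euc_jp`, `iso2022_jp`) are assumptions chosen here for convenience, and nothing in it involves kakasi itself. It takes the single character 僕, prints its Unicode code point, and then prints the bytes it turns into under several encodings.

```python
# One character, one Unicode code point, several byte encodings.
text = "僕"

# The code point belongs to the character set (Unicode here),
# independent of how it is later written out as bytes.
code_point = f"U+{ord(text):04X}"

# Python codec names (an assumption of this sketch). "iso2022_jp" is the
# encoding the message says kakasi refers to as 'jis'.
encodings = ["utf-8", "utf-16-be", "shift_jis", "euc_jp", "iso2022_jp"]
encoded = {name: text.encode(name) for name in encodings}

print(code_point)
for name, data in encoded.items():
    # Show the length and the raw bytes in hex for each encoding.
    print(f"{name:<11} {len(data)} bytes  {data.hex(' ')}")
```

Running it should show the same character occupying different byte sequences under each encoding, with ISO-2022-JP longer than Shift-JIS for a short string because of its escape sequences, which fits the remark that it is not very compact.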