tlug.jp Mailing List Archive

RE: [tlug] Translating old to new kanji forms using tr
- Date: Wed, 29 Jun 2005 11:49:17 +0900
- From: "Danny Wilde" <fuzakenbo@example.com>
- Subject: RE: [tlug] Translating old to new kanji forms using tr
>From: David Riggs <dariggs@example.com>
>Reply-To: tlug@example.com
>To: tlug@example.com
>Subject: [tlug] Translating old to new kanji forms using tr
>Date: Tue, 28 Jun 2005 22:28:29 +0900
>
>I need to go back and forth between the old (旧字) and the modern kanji
>forms. I have a list of corresponding old and new forms, and the standard
>utility "tr" works fine for a test case:
>
>echo 仏 | tr 仏 佛
>
>Gives back 佛 just fine.
>
>But as soon as I get more than a handful of characters in the two
>translation pair lists I get random answers that make no sense. I am
>setting up a script that simply feeds "tr" the two long lists that I have
>stuffed into two variables. But I have tried testing "tr" outside of the
>script and get the same weird results. I am running very standard Debian
>Sarge 3.1, starting up X with
>
>export XMODIFIERS="@im=kinput2" LC_CTYPE=ja_JP.UTF-8

Using EUC encoding, two kanji in a row can come out as a byte sequence
like A2 A4 B3 A4 (kanji 1, then kanji 2). Now suppose your list has a
kanji A2 A4 and one B3 A4, but also one A4 B3. If tr doesn't understand
where one character begins and another ends, it might mistakenly match
the A4 B3 in the middle as that third kanji and foul everything up.
Your locale is UTF-8 rather than EUC, but the same applies: each kanji
is several bytes long, and if tr works on bytes rather than characters,
two kanji that share a byte cannot both be translated correctly.

>Which seems to make all the usual utilities work just fine with kanji
>inside the "konsole" (or plain old xterm as far as that goes).
>
>
>Anybody tried to do this kind of stuff with "tr"? Or have another solution?

I've done something like this once, some time ago. The solution I used
was to write a script in Perl. You could just write loads of
s/kanji1/kanji2/ substitutions (s/A/B/ means "replace A with B" in
Perl), or you could stuff all the kanji into an associative array and
match them using a regular expression. I think (not sure) that newer
versions of Perl have Unicode support, so a regular expression can match
one whole Japanese character at a time rather than one byte. If the
character overlap mentioned above is the cause of the problem, that
would solve it.
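A small Python sketch of both points above (Python standing in for the Perl described; the lists OLD and NEW and all function names are hypothetical). `tr_bytes` is only a rough model of a byte-oriented tr, not GNU tr itself: it maps each byte of the first list to the byte at the same position in the second, so once two kanji share a byte but need different targets, one of them must come out wrong. `translate_chars` is the associative-array-plus-regex approach, working on whole characters.

```python
import re

# Hypothetical sample lists standing in for the poster's two variables:
# OLD[i] is the old (kyujitai) form of NEW[i].
OLD = "佛國學體"
NEW = "仏国学体"


def tr_bytes(text, set1, set2, encoding="utf-8"):
    """Rough model of a byte-oriented tr (an assumption, not GNU tr itself).

    Each byte of set1 is mapped to the byte at the same position in set2.
    If a byte occurs more than once in set1, the last mapping wins.
    """
    frm = set1.encode(encoding)
    to = set2.encode(encoding)
    if len(frm) != len(to):
        raise ValueError("the two lists must encode to the same number of bytes")
    table = list(range(256))  # identity: bytes not listed pass through
    for a, b in zip(frm, to):
        table[a] = b
    out = text.encode(encoding).translate(bytes(table))
    return out.decode(encoding, errors="replace")


def conflicting_bytes(set1, set2, encoding="utf-8"):
    """Return the bytes of set1 that would need two different targets."""
    seen = {}
    clashes = set()
    for a, b in zip(set1.encode(encoding), set2.encode(encoding)):
        if seen.setdefault(a, b) != b:
            clashes.add(a)
    return clashes


def build_table(keys, values):
    """Pair two equal-length strings character by character.

    A dict is Python's associative array.
    """
    if len(keys) != len(values):
        raise ValueError("the two lists must have the same number of characters")
    return dict(zip(keys, values))


def translate_chars(text, table):
    """Replace every key of table found in text with its value.

    Python str works in code points, not bytes, so the alternation
    matches whole kanji and characters that share bytes are never confused.
    """
    if not table:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in table))
    return pattern.sub(lambda m: table[m.group(0)], text)


if __name__ == "__main__":
    print(tr_bytes("仏", "仏", "佛"))  # one pair works: 佛
    print(sorted(hex(b) for b in conflicting_bytes(NEW, OLD)))  # ['0xbd', '0xe4']
    print(tr_bytes("仏", NEW, OLD) == "佛")  # False: the longer list breaks it
    print(translate_chars("日本の仏教", build_table(NEW, OLD)))  # 日本の佛教
    print(translate_chars("國學體", build_table(OLD, NEW)))  # 国学体
```

With only the one pair the byte model reproduces the working `echo 仏 | tr 仏 佛` case, but with four pairs 仏 and 体 both begin with byte E4 in UTF-8 while their old forms 佛 and 體 begin with E4 and E9, which matches the "fine for a handful, nonsense with more" symptom. Going from new to old is also not always one-to-one: 弁, for example, is the new form of 辨, 辯 and 瓣, so a table built in one direction cannot always simply be inverted.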
This is all from a little faded memory, but I hope this is somewhat helpful.

Danny.
- Follow-Ups:
- Re: [tlug] Translating old to new kanji forms using tr
- From: Josh Glover
- References:
- [tlug] Translating old to new kanji forms using tr
- From: David Riggs