TLUG Mailing List

Re: [tlug] Translating old to new kanji forms using tr

Date: Wed, 29 Jun 2005 12:06:11 +0900

From: "Stephen J. Turnbull" <stephen@example.com>

Subject: Re: [tlug] Translating old to new kanji forms using tr

References: <42C1507D.5070608@example.com>

Organization: The XEmacs Project

User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (cilantro, linux)

>>>>> "David" == David Riggs <dariggs@example.com> writes:

    David> I need to go back and forth between the old (旧字) and the
    David> modern kanji forms. I have a list of corresponding old and
    David> new forms, and the standard utility "tr" works fine for a
    David> test case:

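tr(1) is byte-oriented, as far as I know, and any resemblance to
success is using up your good karma.  What is happening is that you
are feeding "E4 BB 8F" (仏) and "E4 BD 9B" (佛) to tr, and it is mapping
E4->E4, BB->BD, and 8F->9B for you byte-by-byte.  That mapping applies
to every BB and 8F byte in the input, so other kanji that happen to
contain those bytes get rewritten too.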
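
To make the byte-by-byte behaviour concrete, here is a minimal Python
sketch that emulates it with a byte translation table.  It is an
illustration only, not what tr does internally; the helper name byte_tr
and the extra character 付 are assumptions chosen for the example and
do not come from the original test case.

```python
# Emulate what a byte-oriented tr does when given UTF-8 input.
src = "仏".encode("utf-8")   # bytes E4 BB 8F
dst = "佛".encode("utf-8")   # bytes E4 BD 9B

# Byte-for-byte table: E4->E4, BB->BD, 8F->9B.
table = bytes.maketrans(src, dst)

def byte_tr(text: str) -> str:
    """Translate the UTF-8 bytes of text one byte at a time (hypothetical helper)."""
    raw = text.encode("utf-8").translate(table)
    return raw.decode("utf-8", errors="replace")

lucky = byte_tr("仏")     # the single test case happens to work
unlucky = byte_tr("付")   # E4 BB 98 becomes E4 BD 98, a different kanji

print(lucky, unlucky)
```

The second call shows the problem: 付 was never in the mapping, but it
shares the BB byte with 仏, so it is changed anyway.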

As far as I know, all of "the usual utilities" are byte-oriented,
except that cut(1) claims to know about characters now.

    David> Which seems to make all the usual utilities work just fine
    David> with kanji inside the "konsole" (or plain old xterm as far
    David> as that goes).

That's because usage like "grep '[あ-ん]' file" is probably relatively
unusual for us gaijin.

Your best bet is to use a language like Python or **** that supports
Unicode internally.  They generally have functions that emulate the
standard command line utilities but work on Unicode strings as well as
on unibyte strings.  With **** you can probably write a one-liner.
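
As a rough sketch of the Python route, str.translate works on whole
characters rather than bytes.  The three pairs below are illustrative
assumptions, not David's actual list, and the names PAIRS, to_new and
to_old are made up for this example.

```python
# Character-level old <-> new kanji translation (a sketch, not a full table).
# Each pair is (old form, new form); these pairs are illustrative only.
PAIRS = [("佛", "仏"), ("國", "国"), ("學", "学")]

# str.maketrans builds a table keyed by code point, so each kanji is
# treated as one unit no matter how many bytes it takes in UTF-8.
OLD_TO_NEW = str.maketrans({old: new for old, new in PAIRS})
NEW_TO_OLD = str.maketrans({new: old for old, new in PAIRS})

def to_new(text: str) -> str:
    """Replace old forms with their modern equivalents."""
    return text.translate(OLD_TO_NEW)

def to_old(text: str) -> str:
    """Replace modern forms with their old equivalents."""
    return text.translate(NEW_TO_OLD)

print(to_new("佛國學"))
print(to_old("仏国学"))
```

Characters that are not in the table pass through unchanged, which is
the property the byte-oriented approach cannot guarantee.  Going from
new to old is only safe where each new form corresponds to a single
old form in the list.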

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.