Mailing List Archive



Re: [tlug] Japanese encoding



[Brett Robson (Re: [tlug] Japanese encoding) writes:]
>In a loose moment Jim wrote:
>> > the details at:
>> http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#examp_tag
>> "the collection is in need of considerable editing."
>> 
>> I am going to have a very large amount of free time from next week, I'm
>> happy to help.

It's a little early to turn it loose for hand editing. I looked into 
possibly setting up a CVS system, but really I want Windblows, Mac, etc. people 
in on it too if they can contribute. What I think I'll do eventually is 
have it up for rsync collection (rsync is available for Windows), and have
a very standard way of submitting updates so I can run them through a utility.

In the meantime, I'd like to do some more reduction of duplicates using 
software. I have tracked down and eliminated the straight replications
in the Japanese text. What I'd like to do is zoom in on things like:

	私達はよくいっしょにお昼を食べます。
	私達はよく一緒にお昼を食べます。

and knock out the first because 一緒に/いっしょに are the same. At
present I'm doing this by eye, noting where the English sentences are the same.
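A rough sketch of how the by-eye check might be helped along with software: group the sentence pairs by their English text and list every group that carries more than one distinct Japanese sentence. The function name `candidate_duplicates`, the (japanese, english) tuple layout, and the whitespace/case folding of the English key are all assumptions for illustration; the collection's actual file format is not described here.

```python
from collections import defaultdict


def candidate_duplicates(pairs):
    """Return {english: [japanese, ...]} for English sentences that
    appear with more than one distinct Japanese sentence.

    `pairs` is an iterable of (japanese, english) tuples; this layout
    is assumed for the sketch, not taken from the collection itself.
    """
    by_english = defaultdict(list)
    for japanese, english in pairs:
        # Fold case and collapse whitespace so trivial differences in
        # the English text do not hide a match.
        key = " ".join(english.split()).lower()
        if japanese not in by_english[key]:
            by_english[key].append(japanese)
    # Keep only the groups that still need a human decision.
    return {eng: jpn for eng, jpn in by_english.items() if len(jpn) > 1}


if __name__ == "__main__":
    # The English gloss below is illustrative, not copied from the collection.
    sample = [
        ("私達はよくいっしょにお昼を食べます。", "We often eat lunch together."),
        ("私達はよく一緒にお昼を食べます。", "We often eat lunch together."),
    ]
    for english, japanese_variants in candidate_duplicates(sample).items():
        print(english)
        for sentence in japanese_variants:
            print("\t" + sentence)
```

This only narrows the list down; deciding which variant to keep (here, the kana or the kanji spelling) would still be a manual call.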

Examples like:

	私達はカヌーを借りた。
	私達はカヌ−を借りた。

are horrors to pin down.
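The two sentences above differ only in the character after ヌ: one has the katakana long-vowel mark ー (U+30FC), the other a look-alike dash. A minimal sketch of a normalizer that might catch such cases before comparison follows. The set of dash-like characters in `DASH_LIKE` and the rule "a dash directly after katakana is meant as ー" are assumptions, and a genuine hyphen or minus sign following katakana would be rewritten too, so the output wants a review rather than blind trust.

```python
import re

# Characters that tend to be confused with the long-vowel mark.
# This list is an assumption for the sketch, not an exhaustive set.
DASH_LIKE = (
    "\u002D"  # hyphen-minus
    "\u2010"  # hyphen
    "\u2013"  # en dash
    "\u2014"  # em dash
    "\u2015"  # horizontal bar
    "\u2212"  # minus sign
    "\uFF0D"  # fullwidth hyphen-minus
    "\uFF70"  # halfwidth katakana prolonged sound mark
)

# A katakana letter (or ー itself) followed by a run of dash-like characters.
_AFTER_KATAKANA = re.compile(
    "([\u30A1-\u30FA\u30FC])([" + re.escape(DASH_LIKE) + "]+)"
)


def normalize_long_vowel(text):
    """Replace dash look-alikes that directly follow katakana with ー."""
    return _AFTER_KATAKANA.sub(
        lambda m: m.group(1) + "\u30FC" * len(m.group(2)), text
    )


if __name__ == "__main__":
    a = "私達はカヌ\u30FCを借りた。"
    b = "私達はカヌ\u2212を借りた。"
    print(a == b)
    print(normalize_long_vowel(a) == normalize_long_vowel(b))
```

Running the normalizer over the Japanese side before the straight-replication check would let pairs like these fall out as exact duplicates.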

Then there are the Japanese bloopers:

	私達は皆事態は深刻だと考えた。We all regarded the situation as serious.
	私達は皆自体は深刻だと考えた。We all regarded the situation as serious.

Way to go.

Jim

-- 
Jim Breen  (j.breen@example.com  http://www.csse.monash.edu.au/~jwb/)
Computer Science & Software Engineering,                Tel: +61 3 9905 3298
P.O Box 26, Monash University,                          Fax: +61 3 9905 5146
Clayton VIC 3800, Australia      ジム・ブリーン@モナシュ大学
