Mailing List Archive
Re: [tlug] [OT] Strip Kanji from a document for study purposes
- Date: Wed, 19 Jul 2006 00:06:01 +0900
- From: Nikolay Elenkov <goibniu@example.com>
- Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes
- References: <44BCAFF3.6030604@example.com>
- User-agent: Thunderbird 1.5.0.2 (X11/20060501)
Dave M G wrote:
> There may be existing software that does what I'm looking for, but I haven't seen it. If you know of a suitable Linux-based application, please let me know.
>
> What I'd like to do is take a Japanese document and convert it into a list of the kanji included, and a list of words. Ideally repetitions would be removed, as would particles and other grammatical inflections. Hiragana and katakana words could be dropped too.

Try Juman: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html

Here's a CGI to try it out: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman-form.html

It doesn't do everything you want out of the box, but it's pretty powerful, and with a bit of scripting and piping you should be able to get what you want. (It has a Perl module, I think.)
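For the kanji-list half of the request, a rough sketch in plain Python might look like the following. It does not use Juman at all. The function name `unique_kanji` is made up for illustration, and "kanji" is assumed to mean the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters in the extension blocks and the iteration mark 々 are skipped. The word list, with particles and inflections removed, is a different problem that needs a morphological analyzer such as Juman and is not attempted here.

```python
def unique_kanji(text):
    """Return the distinct kanji in text, in order of first appearance.

    Assumption: a kanji is any character in the basic CJK Unified
    Ideographs block, U+4E00..U+9FFF. Hiragana, katakana, punctuation
    and Latin text fall outside that range and are dropped.
    """
    seen = set()
    result = []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff" and ch not in seen:
            seen.add(ch)
            result.append(ch)
    return result


# Sample input; the second sentence only repeats kanji already seen.
sample = "日本語の文書から漢字を抜き出す。日本の漢字。"
print("".join(unique_kanji(sample)))
```

To run it on a real document, read the file as UTF-8 and pass its contents to the function in place of the sample string.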
- References:
- [tlug] [OT] Strip Kanji from a document for study purposes
- From: Dave M G