Mailing List Archive
Re: [tlug] [OT] Strip Kanji from a document for study purposes
- Date: Tue, 18 Jul 2006 11:24:42 +0200
- From: Botond Botyanszki <tlug@example.com>
- Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes
- References: <44BCAFF3.6030604@example.com>
Hi!

On Tue, 18 Jul 2006 18:54:59 +0900 Dave M G <martin@example.com> wrote:
> What I'd like to do is take a Japanese document and convert it into a
> list of the kanji included, and a list of words. Ideally repetitions
> would be removed, as would particles and other grammatical
> inflections. Hiragana and katakana words could be dropped too.
> ...
> Any thoughts or comments on how to achieve this would be appreciated.

You will need to tokenize the Japanese text. kakasi is said to be able to do this, though I have never used it. I doubt there is existing software for the task you describe, so you will probably need to do some scripting or programming yourself.
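To give an idea of what such scripting might look like, here is a minimal Python sketch. It does not do real tokenization: it only collects the unique kanji and the unique runs of consecutive kanji, in order of first appearance, and drops everything else (kana, punctuation, particles). The function names and the choice of the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) are my own assumptions, so rarer kanji in the extension blocks and the repetition mark 々 are not covered. A proper word list would still need a tokenizer.

```python
import re

# Assumption: "kanji" means the basic CJK Unified Ideographs block only.
KANJI = re.compile(r"[\u4e00-\u9fff]")
# Crude stand-in for words: runs of consecutive kanji (okurigana are dropped).
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]+")


def unique_in_order(items):
    """Remove repetitions while keeping the order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def kanji_list(text):
    """Return each distinct kanji found in text, in order of first appearance."""
    return unique_in_order(KANJI.findall(text))


def kanji_runs(text):
    """Return each distinct run of consecutive kanji found in text."""
    return unique_in_order(KANJI_RUN.findall(text))


if __name__ == "__main__":
    sample = "私は日本語を勉強します。日本の本を読みます。"
    print(kanji_list(sample))
    print(kanji_runs(sample))
```

Since the runs are cut at every kana character, a verb such as 読みます shows up only as 読. That may be good enough for a kanji study list, but not for a vocabulary list.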
- References:
- [tlug] [OT] Strip Kanji from a document for study purposes
- From: Dave M G