Mailing List Archive
Re: [tlug] [OT] Strip Kanji from a document for study purposes
- Date: Tue, 18 Jul 2006 12:28:03 -0400
- From: Jim <jep200404@example.com>
- Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes
- References: <44BCAFF3.6030604@example.com>
Dave wrote:

> What I'd like to do is take a Japanese document and convert it into a
> list of the kanji included, and a list of words. Ideally repetitions
> would be removed, as would particles and other grammatical inflections.
> Hiragana and katakana words could be dropped too.

Here are a few crumbs of ideas.

    tr ' \t\r' '\n\n\n' <document | grep <kanjiregex> | sort | uniq

I don't know how to craft a regex to pass only kanji.
Removing particles and other grammatical inflections
might be a significant project in itself.
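For the missing kanji regex, one possible approach is to match on Unicode code point ranges. The sketch below is in Python rather than shell, and it assumes that "kanji" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF). The function names are made up for illustration. Runs of consecutive kanji are only a rough stand-in for "words"; they do nothing about particles or inflections.

```python
import re

# Assumption: "kanji" = CJK Unified Ideographs (U+4E00-U+9FFF) plus
# Extension A (U+3400-U+4DBF). Hiragana, katakana, punctuation, and
# the iteration mark (U+3005) fall outside these ranges and are dropped.
KANJI_CLASS = r"[\u3400-\u4dbf\u4e00-\u9fff]"
KANJI = re.compile(KANJI_CLASS)
KANJI_RUN = re.compile(KANJI_CLASS + "+")


def unique_kanji(text):
    """Return the distinct kanji in text, sorted by code point."""
    return sorted(set(KANJI.findall(text)))


def kanji_runs(text):
    """Return the distinct runs of consecutive kanji in text, sorted."""
    return sorted(set(KANJI_RUN.findall(text)))


sample = "日本語を勉強する。日本のカタカナ。"
print(unique_kanji(sample))  # ['勉', '強', '日', '本', '語']
print(kanji_runs(sample))    # ['勉強', '日本', '日本語']
```

The set removes repetitions, which covers the "sort | uniq" step of the pipeline. Sorting by code point is arbitrary; sorting by frequency or by school grade would need extra data.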
- Follow-Ups:
- Re: [tlug] [OT] Strip Kanji from a document for study purposes
- From: Botond Botyanszki
- References:
- [tlug] [OT] Strip Kanji from a document for study purposes
- From: Dave M G