
Mailing List Archive
[tlug] How to make a current running kanji compound list from the news
Martin G writes:
> In the course of my studies, I came across mention of this book, which
> lists 1000 kanji compounds useful for reading the news:
> http://www.amazon.com/dp/0804809194/
(wget or other spider) + (FreeWAIS or Xapian or other full-text indexer)
is probably overkill, but they already produce the kind of statistics
needed, at least internally. Cross-referencing Wa-Ei dictionaries
should be easy to do. I'm pretty sure Jim provides several different
libraries for accessing EDICT files, as well as AJAX access or similar
to the WWWJDIC site. I think there are also web spiders written in
Python, which I mention because I know there are Python bindings for
libxapian (and because I don't like PHP ;-). WAIS is old enough
technology that I doubt there are bindings for many modern languages
(besides C), but I could be wrong. Probably similar facilities are
available for your choice of poison, though.
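
For a sense of how small the counting step is without a full indexer, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample sentences are made up, the function names are hypothetical, "compound" is approximated as any run of two or more kanji, and the dictionary line is assumed to follow the classic EDICT layout of WORD [READING] /gloss/gloss/. It uses only the standard library, not Xapian or any of Jim's libraries.

```python
import re
from collections import Counter

# Assumption: a "compound" is any run of two or more characters from the
# CJK Unified Ideographs block. This is a crude stand-in for real word
# segmentation: it merges adjacent compounds and cuts okurigana off.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]{2,}")

# Assumed classic EDICT line layout: WORD [READING] /gloss/gloss/
EDICT_LINE = re.compile(r"^(\S+)\s+(?:\[([^\]]*)\]\s+)?/(.*)/\s*$")


def count_compounds(texts):
    """Count kanji runs across an iterable of article texts."""
    counts = Counter()
    for text in texts:
        counts.update(KANJI_RUN.findall(text))
    return counts


def parse_edict_line(line):
    """Return (word, reading, glosses) for one EDICT-style line, or None."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    word, reading, glosses = match.groups()
    return word, reading, [g for g in glosses.split("/") if g]


# Made-up sample input standing in for spidered news articles.
articles = [
    "政府は経済対策を発表した。",
    "経済の見通しについて政府が説明した。",
]
counts = count_compounds(articles)

# Made-up one-line dictionary standing in for a real EDICT file.
entry = parse_edict_line("政府 [せいふ] /(n) government/administration/")
dictionary = {entry[0]: entry}

# Print the running list, most frequent first, with glosses where known.
for word, n in counts.most_common():
    glosses = dictionary[word][2] if word in dictionary else []
    print(n, word, "; ".join(glosses))
```

Note that the kanji-run shortcut treats 経済対策 as a single item and truncates 見通し to 見通, so a real list would want proper segmentation before counting.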