Mailing List Archive



[tlug] How to make a current running kanji compound list from the news



Martin G writes:

 > In the course of my studies, I came across mention of this book, which
 > lists 1000 kanji compounds useful for reading the news:
 > http://www.amazon.com/dp/0804809194/

(wget or other spider) + (FreeWAIS or Xapian or other full-text indexer)

is probably overkill, but they already produce the kind of statistics
needed, at least internally.  Cross-referencing Wa-Ei dictionaries
should be easy to do.  I'm pretty sure Jim Breen provides several different
libraries for accessing EDICT files, as well as AJAX access or similar
to the WWWJDIC site.  I think there are also web spiders written in
Python, which I mention because I know there are Python bindings for
libxapian (and because I don't like PHP ;-).  WAIS is old enough
technology that I doubt there are bindings for many modern languages
(besides C), but I could be wrong.  Probably similar facilities are
available for your choice of poison, though.
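
If a spider plus a full-text indexer really is overkill, the counting
step on its own can be roughed out in a few lines of Python.  This is
only a sketch under some loud assumptions: the articles have already
been fetched and stripped to plain text, a "compound" is approximated
as any run of two or more kanji (so a longer run such as 経済政策 is
counted as one item rather than split into words), and the dictionary
lines follow the usual EDICT layout of "KANJI [KANA] /gloss/gloss/".
The function names are made up for illustration and are not part of
any existing EDICT library.

```python
import re
from collections import Counter

# Runs of two or more characters from the CJK Unified Ideographs block.
# Assumption: this crude pattern stands in for real word segmentation.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]{2,}")


def count_compounds(texts):
    """Count kanji runs across an iterable of plain-text articles."""
    counts = Counter()
    for text in texts:
        counts.update(KANJI_RUN.findall(text))
    return counts


def parse_edict_line(line):
    """Split one EDICT-style line into (headword, glosses).

    Returns None for lines that do not match the assumed layout.
    """
    m = re.match(r"^(\S+)\s+(?:\[[^\]]*\]\s+)?/(.*)/\s*$", line)
    if m is None:
        return None
    return m.group(1), [g for g in m.group(2).split("/") if g]


def gloss_top_compounds(texts, edict_lines, n=1000):
    """Pair the n most frequent kanji runs with their glosses, if any."""
    glosses = {}
    for line in edict_lines:
        parsed = parse_edict_line(line)
        if parsed is not None:
            glosses.setdefault(parsed[0], parsed[1])
    return [(word, freq, glosses.get(word))
            for word, freq in count_compounds(texts).most_common(n)]
```

Feeding it the saved article texts and the lines of an EDICT file
gives a frequency-ordered list to study from.  For anything beyond a
rough list, a proper morphological analyzer would do a better job of
finding word boundaries than the regular expression above.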


