[tlug] How to make a current running kanji compound list from the news


In the course of my studies, I came across mention of this book, which
lists 1000 kanji compounds useful for reading the news:

I'm thinking about getting it, even though it's out of print and
hasn't been updated in thirty years.

However, it got me thinking: with all the modern tools available, and
almost all news now online, surely (hopefully) there is a way to scan
news sites for the most common compounds and make a spreadsheet of
them.
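As a rough first pass at the counting part, here is a Python sketch that assumes the article text has already been downloaded. It treats any run of two or more consecutive kanji as a "compound" and tallies the runs with `collections.Counter`. That definition is an assumption, and a crude one: it glues adjacent words together (経済対策 is counted as one unit rather than 経済 + 対策), so a real morphological analyzer such as MeCab would give cleaner results. `count_compounds` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: a "compound" is approximated as a run of two or more
# consecutive kanji (CJK Unified Ideographs U+4E00..U+9FFF, plus the
# iteration mark 々, U+3005). Kana, punctuation and Latin text break runs.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff\u3005]{2,}")


def count_compounds(articles):
    """Count kanji runs across an iterable of article texts (hypothetical helper)."""
    counts = Counter()
    for text in articles:
        counts.update(KANJI_RUN.findall(text))
    return counts


if __name__ == "__main__":
    sample = [
        "政府は経済対策を発表した。",
        "経済対策について首相が説明した。",
    ]
    # most_common() lists ties in the order they were first seen.
    for compound, n in count_compounds(sample).most_common(3):
        print(compound, n)
```

On the two sample sentences this prints `経済対策 2`, then `政府 1` and `発表 1`. Single kanji such as 日 are skipped on purpose, since the goal is a list of compounds.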

I know Google lets you graph trends in search terms and, I think, list
the most popular ones. That is why I have a vague notion that something
similar could be built out of existing tools, if it doesn't exist
already (I searched but came up with nothing, though I may not be
describing it right).

Anyway, I think TLUG are the go-to guys for this, sitting at the nexus
of the internet, coding, and Japanese knowledge.

Would there be a way to:
1. Select a site, a set of sites, or possibly an aggregator site to use
as source material.
2. Set a start and end time to define the span of time from which to
select news articles.
3. Create a list of the most frequently used compounds within those
search criteria.
4. This step might be a doozy: cross-reference that list with WWWJDIC
to get readings and definitions.
5. Output a CSV or text file (or something similar) with the compounds,
readings, and definitions in three columns.
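Steps 2, 4 and 5 could be sketched roughly like this in Python, under several assumptions. The articles are assumed to be already fetched (step 1 depends too much on each site to sketch generically) and stored as (publication date, text) pairs. Compound counts are assumed to come from a separate counting step. And instead of querying WWWJDIC over the web, readings and definitions are looked up in a local copy of the EDICT dictionary data that WWWJDIC draws on. The parser only understands the simple `KANJI [KANA] /gloss/gloss/` line layout; the fuller EDICT2 file adds extra fields (for example several headwords separated by `;`) that this sketch does not handle. All function names are hypothetical.

```python
import csv
import io
from datetime import date


def in_window(articles, start, end):
    """Step 2: keep texts whose publication date is in [start, end].

    Assumes `articles` is a list of (datetime.date, text) pairs.
    """
    return [text for published, text in articles if start <= published <= end]


def load_edict(lines):
    """Step 4: parse simple EDICT-style lines 'KANJI [KANA] /gloss1/gloss2/'.

    Returns {headword: (reading, definitions)}; the first entry for a
    headword wins. Kana-only entries ('KANA /gloss/') use the headword
    itself as the reading.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, _, rest = line.partition(" /")
        if "[" in head:
            word, _, reading = head.partition(" [")
            reading = reading.rstrip("]")
        else:
            word, reading = head, head
        glosses = [g for g in rest.strip("/").split("/") if g]
        table.setdefault(word.strip(), (reading, "; ".join(glosses)))
    return table


def write_study_list(counts, dictionary, out, top_n=1000):
    """Step 5: write compound, reading, definition rows as CSV.

    Compounds are ranked by count (ties broken alphabetically); words not
    found in the dictionary get empty reading/definition columns.
    """
    writer = csv.writer(out)
    writer.writerow(["compound", "reading", "definition"])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    for compound, _count in ranked:
        reading, definition = dictionary.get(compound, ("", ""))
        writer.writerow([compound, reading, definition])


if __name__ == "__main__":
    articles = [
        (date(2012, 3, 1), "政府は経済対策を発表した。"),
        (date(2011, 1, 1), "去年の記事"),
    ]
    print(in_window(articles, date(2012, 1, 1), date(2012, 12, 31)))
    edict = load_edict([
        "経済 [けいざい] /(n) economics/economy/",
        "政府 [せいふ] /(n) government/administration/",
    ])
    # Counts would normally come from the counting step; hard-coded here.
    counts = {"政府": 3, "経済": 5, "未知語": 1}
    buf = io.StringIO()
    write_study_list(counts, edict, buf)
    print(buf.getvalue())
```

Compounds missing from the dictionary (here 未知語) still get a row with empty reading and definition columns, which makes gaps easy to spot once the file is opened in a spreadsheet. The default `top_n=1000` simply mirrors the size of the book's list and is otherwise arbitrary.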

I once had some PHP code that would search a single body of text, pull
out words, and create a study list. It's been years since I touched
it, so I'd have to look for it, but if it might help, I'll see if I can
dig it up.

What do you guys think? Could be a powerful learning aid.

Any advice would be much appreciated.

Dave M G