Mailing List Archive

[tlug] [OT] Calling for volunteers to mark possible dictionary entries


[This has already been posted on a couple of other lists.]

As some of you know, I am carrying out research into ways of
automatically identifying neologisms and other not-yet-in-dictionary
terms. As part of this, I am experimenting with a machine learning
system which I am training to recognize the sorts of words and terms
that are included in dictionaries.

What I need is text in which such words have been identified by
people, so I can test the ML system and compare results.

I have selected a batch of 2,000 sentences from past issues of the
Mainichi Shimbun and Nikkei Shimbun. Half of these contain recent
JMdict additions (which the ML system doesn't know about) and the
other half are randomly selected and may or may not contain
unrecorded words.

What I need are volunteers to look at the sentences, see if they
contain unrecorded words which could be candidates to go in a
dictionary, mark any that they see, and indicate if there are none
or no more to mark. I need people who are reasonably comfortable
reading Japanese newspaper text.

I have put together a simple WWW system for displaying the
sentences (one at a time) and enabling terms to be marked,
comments added, etc. The system is at:

Please help out by looking at some sentences and marking
them. If each person on this list did 20 or 30 sentences, the job
would be done quickly.

Looking forward to lots of activity.



Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne
