
Mailing List Archive
Re: [tlug] oneliners, Was: Moving on from xterm
At Wed, 24 Aug 2016 12:11:58 +1000,
Jim Breen wrote:
> Apart from its age, IPADIC also had/has problems with release permissions
> dating back to its ICOT source. For that reason the people at NAIST built
> a replacement "NAIST DIC". (https://en.osdn.jp/projects/naist-jdic/)
Oh... This "IPADIC/ICOT license issue" was started by me...
This problem was discussed on the debian-legal mailing list, and a
summary is here:
https://wiki.debian.org/IpadicLicense
Debian now treats ipadic as DFSG-free. BTW, this was only discussed within
the Debian Project. Other distributions (like Fedora and openSUSE) don't care
about it.
I think this problem is overrated in the public mind.
> > On the other hand, Toshinori Sato said that mecab-ipadic-neologd
> > performs better than plain ipadic on a text classification task.
> > It's a really hard problem...
>
> "Text classification task"? What do you mean? For getting the right yomikata (aka furigana)
> on a proper name longer sequences can be useful, but there's a lot of
> text analysis where the stuff Sato has added would cause quite some grief.
> His addition of "中居正広のミになる図書館" as an entry is a hoot.
I meant using mecab-ipadic-neologd only for word segmentation, without
using the feature field (part of speech, reading, etc.). Word segmentation
is widely used for text classification tasks. Sorry I didn't make that clear.
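To illustrate "segmentation only, no features": MeCab's default output prints one token per line as `surface<TAB>feature,...` and ends each sentence with `EOS`. The minimal sketch below (my own illustration, not Toshinori's evaluation code) keeps only the surface column and turns it into bag-of-words counts, which is the kind of input a text classifier would see. The function names `surfaces` and `bag_of_words` are hypothetical.

```python
from collections import Counter


def surfaces(mecab_output: str) -> list[str]:
    """Keep only surface forms from MeCab's default output.

    Each token line is 'surface<TAB>feature,feature,...' and a sentence
    ends with a line 'EOS'. The feature column is discarded, so only
    the word segmentation is used.
    """
    tokens = []
    for line in mecab_output.splitlines():
        if line == "EOS" or not line.strip():
            continue  # sentence terminator or blank line
        surface, _, _feature = line.partition("\t")
        tokens.append(surface)
    return tokens


def bag_of_words(tokens: list[str]) -> Counter:
    """Count tokens; a typical feature vector for text classification."""
    return Counter(tokens)


# Illustrative MeCab/ipadic-style output (hand-written sample).
sample = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)
print(surfaces(sample))
print(bag_of_words(surfaces(sample)))
```

Swapping the dictionary (ipadic vs. mecab-ipadic-neologd) changes only how the text is split into tokens. The classifier downstream stays the same.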
Toshinori said he used the text classification task as a quantitative
evaluation of the dictionary. I heard this from him at a public event, but
there is no presentation material, so I don't know the details.
For general natural language processing, mecab-ipadic-neologd is not a
good fit. I agree with you.
By the way, I wrote a script to convert SKKJISYO to kakasidict.
I think it may be useful to everyone.
http://www.namazu.org/gitweb/?p=dictconv.git;a=tree
The original kakasidict is also based on a very old SKKJISYO, but
SKKJISYO itself has since been updated.
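As a rough idea of what such a conversion involves, here is a minimal sketch. It is not the dictconv script linked above. It reads SKK-JISYO lines of the form `reading /cand1/cand2;annotation/` and emits one `reading candidate` line per candidate. The output layout is an assumption about the kakasidict format. Annotations after `;` are dropped, and Lisp-expression candidates such as `(concat ...)` are skipped. Character encoding (SKK-JISYO files are traditionally EUC-JP) is left to the caller. The function names are hypothetical.

```python
def skk_line_to_kakasi(line: str) -> list[str]:
    """Convert one SKK-JISYO line into 'reading candidate' lines.

    Assumed output format: reading, a space, then the kanji candidate.
    """
    line = line.rstrip("\n")
    if not line or line.startswith(";"):
        return []  # blank line, comment, or okuri-ari/okuri-nasi marker
    reading, sep, rest = line.partition(" /")
    if not sep:
        return []  # not an entry line
    out = []
    for cand in rest.split("/"):
        cand = cand.split(";", 1)[0]  # drop the annotation after ';'
        if not cand or cand.startswith("("):
            continue  # empty slot or Lisp expression candidate
        out.append(f"{reading} {cand}")
    return out


def convert(lines) -> list[str]:
    """Convert an iterable of SKK-JISYO lines."""
    result = []
    for line in lines:
        result.extend(skk_line_to_kakasi(line))
    return result


for entry in convert(["かんじ /漢字/感じ;feeling/", "おくr /送/贈/"]):
    print(entry)
```

Okuri-ari readings (such as `おくr`) are passed through unchanged here. Whether kakasidict expects exactly that form is also an assumption.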