Mailing List Archive

Re: [tlug] oneliners, Was: Moving on from xterm

At Wed, 24 Aug 2016 12:11:58 +1000,
Jim Breen wrote:
> Apart from its age, IPADIC also had/has problems with release permissions
> dating back to its ICOT source. For that reason the people at NAIST built
> a replacement "NAIST DIC".

Oh... This "IPADIC/ICOT license issue" was caused by me...

This problem was discussed on the debian-legal mailing list, and the
following is the summary:

Debian now treats ipadic as DFSG-free. BTW, this was only discussed
within the Debian Project. Other distributions (like Fedora and
openSUSE) don't care about it.

I think this problem is overrated in the public mind.

> > On the other hand, Toshinori Sato said that mecab-ipadic-neologd is
> > better performance than plain ipadic on text classification task.
> > It's really hard problem...
> What do you mean by "text classification task"? For getting the right
> yomikata (aka furigana) on a proper name longer sequences can be
> useful, but there's a lot of
> text analysis where the stuff Sato has added would cause quite some grief.
> His addition of "中居正広のミになる図書館" as an entry is a hoot.

I meant using mecab-ipadic-neologd only for word segmentation, and not
using its features. Word segmentation is widely used in text
classification tasks. I didn't make that clear.

Toshinori said that the text classification task is used for
quantitative evaluation of the dictionary. I heard this from him at a
public event, but there is no presentation material, so I don't know
the details.

For natural language processing in general, mecab-ipadic-neologd is
not good. I agree with you.

By the way, I made a script to convert from SKKJISYO to kakasidict.
I think it is also useful for everyone.

The original kakasidict is also based on a very old SKKJISYO, but
SKKJISYO itself has since been updated.
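
For readers who have not seen the two formats, here is a rough sketch of
what such a conversion might look like. This is not the script mentioned
above, only an illustration under stated assumptions: SKK-JISYO entry
lines look like "reading /cand1/cand2;annotation/", lines starting with
";" are comments, and the kakasidict source format is one "reading word"
pair per line. The function names skk_to_pairs and to_kakasidict are
hypothetical.

```python
def skk_to_pairs(lines):
    """Yield (reading, word) pairs from SKK-JISYO formatted lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and ";"-prefixed comment/header lines.
        if not line or line.startswith(";"):
            continue
        # An entry is "reading /cand1/cand2/"; anything else is ignored.
        reading, sep, rest = line.partition(" /")
        if not sep:
            continue
        for candidate in rest.split("/"):
            # Drop the optional ";annotation" suffix of a candidate.
            word = candidate.split(";", 1)[0]
            # Skip empty fields and Lisp-expression candidates such as
            # (concat ...), which this simple sketch does not evaluate.
            if word and not word.startswith("("):
                yield reading, word


def to_kakasidict(lines):
    """Return a list of "reading word" lines (assumed kakasidict layout)."""
    return [f"{reading} {word}" for reading, word in skk_to_pairs(lines)]


# Hand-written sample entries, not taken from a real dictionary file.
sample = [
    ";; okuri-nasi entries.",
    "あい /愛/哀;sorrow/",
    "とうきょう /東京/",
]
print(to_kakasidict(sample))
# ['あい 愛', 'あい 哀', 'とうきょう 東京']
```

Real SKK-JISYO files are usually distributed in EUC-JP, so reading one
from disk would probably need open(path, encoding="euc_jp"). Whether
okuri-ari entries (readings ending in a roman letter) should be kept or
dropped depends on what the target dictionary expects; the sketch passes
them through unchanged.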
