TLUG Mailing List

Re: [tlug] Japanese morphological analyzers (Was: Places where to apply to for a technical internship?)

Date: Thu, 11 Sep 2014 16:39:30 +1000

From: Jim Breen <jimbreen@example.com>

Subject: Re: [tlug] Japanese morphological analyzers (Was: Places where to apply to for a technical internship?)

References: <20140602081056.GA7953@camelia.2ion.de> <C6DD121F-0509-4139-A85A-80880BF46F53@transcomjapan.com> <53A13B29.8020305@simon-cozens.org> <C90892C4-3B8B-474A-BC45-76C319B16F1A@transcomjapan.com> <CABHGxq5pSZ4+Fi83k=z7-Exf6BEgJTFS+9qiRTrr6YAD6cRs=Q@mail.gmail.com> <5485A350-2840-48D3-BD4A-4768F5099A08@transcomjapan.com>
On 11 September 2014 15:25, Drew Poulin <poulin@example.com> wrote:
> On Jun 20, 2014, at 2:58 PM, Jim Breen <jimbreen@example.com> wrote:
>> Looks interesting, but the biggest problem for me is they are
>> using the old IPADIC morpheme dictionary. Until they get past
>> saying that Unidic support is "experimental", I'll stick to MeCab.
[...]
> Pardon my dredging up this old thread, but I thought this might be worth passing along.
>
> Apparently the Unidic license prohibits redistribution, so it probably won’t be used with Kuromoji/Lucene/Solr:
>
> https://issues.apache.org/jira/browse/LUCENE-4056

A couple of questions come to mind:

- I wonder whether they asked the UniDic people if it was OK to use
it in Lucene.

- According to the Kuromoji WWW page, their main morphological dictionary is
IPADIC. Fair enough, but IPADIC had/has redistribution problems too,
dating from its very early days. That's one of the reasons NAIST built
the NAIST-JDIC, as they wanted a less restricted lexicon.

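> The license also prohibits commercial use without the permission of the copyright holders (営利を目的として，UniDic ver.1.3.12 を利用する場合は，事前に著作権者と協議すること。 In English: "If UniDic ver.1.3.12 is to be used for commercial purposes, consult the copyright holders in advance.")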

Again, I wonder if they asked.

> I’m curious about how others use open-source Japanese morphological analyzers with open-source databases.

I charge on with MeCab/UniDic, but then I'm neither redistributing nor
running a commercial operation.

> From what I have read, the possibilities include Kuromoji with Solr/Lucene and MeCab with PostgreSQL (via textsearch_ja, http://textsearch-ja.projects.pgfoundry.org/textsearch_ja.html).
>
> Is there some widely preferred combination that I haven’t found yet?  I know the big boys like Google and Yahoo use Basis Technology’s Rosette, but that’s a bit rich for my blood.

Errm. Dunno about Yahoo, but Google dropped use of Basis's
morphological analyzer in favour of an in-house developed system
about 7-8 years ago.

Cheers

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University