Mailing List Archive
[tlug] Re: Japanese morphological analyzers (Was: Places where to apply to for a technical internship?)
- Date: Thu, 11 Sep 2014 14:25:41 +0900
- From: Drew Poulin <poulin@example.com>
- Subject: [tlug] Re: Japanese morphological analyzers (Was: Places where to apply to for a technical internship?)
- References: <20140602081056.GA7953@camelia.2ion.de> <C6DD121F-0509-4139-A85A-80880BF46F53@transcomjapan.com> <53A13B29.8020305@simon-cozens.org> <C90892C4-3B8B-474A-BC45-76C319B16F1A@transcomjapan.com> <CABHGxq5pSZ4+Fi83k=z7-Exf6BEgJTFS+9qiRTrr6YAD6cRs=Q@mail.gmail.com>

On Jun 20, 2014, at 2:58 PM, Jim Breen <jimbreen@example.com> wrote:

> Looks interesting, but the biggest problem for me is they are
> using the old IPADIC morpheme dictionary. Until they get past
> saying that Unidic support is "experimental", I'll stick to MeCab.
>
> Looks like they are using CRFs, as does MeCab. I guess Kuromoji
> is the way to go if you want Java. MeCab is C/C++. I'd like to do
> a side-by-side comparison some day, but they need to support
> Unidic first (the people who built IPADIC at NAIST advise you to
> use Unidic…)

Pardon my dredging up this old thread, but I thought this might be worth passing along. Apparently the Unidic license prohibits redistribution, so it probably won’t be used with Kuromoji/Lucene/Solr:

https://issues.apache.org/jira/browse/LUCENE-4056

The license also prohibits commercial use without the permission of the copyright holders (営利を目的として,UniDic ver.1.3.12 を利用する場合は,事前に著作権者と協議すること。 In English: "Anyone using UniDic ver. 1.3.12 for profit must consult the copyright holders in advance.")

I’m curious about how others use open-source Japanese morphological analyzers with open-source databases. From what I have read, the possibilities include Kuromoji with Solr/Lucene and MeCab with PostgreSQL (via textsearch_ja: http://textsearch-ja.projects.pgfoundry.org/textsearch_ja.html). Is there some widely preferred combination that I haven’t found yet?

I know the big boys like Google and Yahoo use Basis Technology’s Rosette, but that’s a bit rich for my blood.

Drew