Mailing List Archive



Re: [tlug] Places where to apply to for a technical internship?



On 20 June 2014 14:58, Drew Poulin <poulin@example.com> wrote:
> On Jun 18, 2014, at 4:09 PM, Simon Cozens wrote:
>> On 17/06/2014 18:38, Drew Poulin wrote:
>>> They developed Kuromoji, the Japanese morphological analyzer.
>>
>> How did I miss this thing? Looks very interesting...
>
> I thought so too.  I stumbled across it (and Atilika) in my old job while looking for a cheaper alternative to Basis Technology's morphological analyzer (Rosette).  I hope to start testing Kuromoji soon for use in my own Web site.

It looks interesting, but the biggest problem for me is that they
are using the old IPADIC morpheme dictionary. Until they get past
saying that Unidic support is "experimental", I'll stick to MeCab.

> Here's a nice video of Atilika founder Christian Moen explaining the statistical approach that Kuromoji takes to morphological analysis:
>
> http://vimeo.com/42657763

It looks like they are using CRFs, as MeCab does. I guess Kuromoji
is the way to go if you want Java; MeCab is C/C++. I'd like to do
a side-by-side comparison some day, but they need to support
Unidic first (the people who built IPADIC at NAIST advise you to
use Unidic...).

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University

