Mailing List Archive



Re: [tlug] web search software



At Fri, 28 Sep 2007 09:46:09 +0900,
Darren Cook wrote:
> I'm just catching up on this thread. HyperEstraier looks good (*) and it
> has MeCab support (for breaking Japanese sentences into words) built-in.
> Namazu also. Is anyone using Lucene, Ferret, Nutch or something else
> with Japanese text, and can comment on what they use, if anything?

In my experience:

* Lucene

  It has a nice architecture because the document analyzer is a
  separate component. NutchDocumentAnalyzer (from Nutch) provides a
  simple uni-gram search engine with Japanese support.
  OTOH, JapaneseAnalyzer provides a word-based search engine that
  chunks text using Sen, a Java reimplementation of ChaSen.

  http://ultimania.org/sen/
  http://tidus.ultimania.org/wiki/index.php?Lucene

* Ferret

  Ferret is an implementation of a bi-gram search engine, and it
  supports UTF-8. I have never used it, but I have heard of someone
  using it for Japanese documents.

* Senna

  Senna is another major search engine with Japanese support. Senna
  itself is a library, but many bindings are available. Tritonn is
  especially useful because it adds Japanese full-text search
  support to MySQL. Senna supports both MeCab and bi-gram indexing.

  http://qwik.jp/senna/FrontPage.html (English)
  http://qwik.jp/tritonn/
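
To make the uni-gram / bi-gram distinction above concrete, here is a
minimal Python sketch of character n-gram indexing. It is not how
Lucene, Ferret or Senna implement it internally; the function names
(char_ngrams, build_index, search) and the sample sentences are made up
for illustration only.

```python
def char_ngrams(text, n):
    """Return overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    # An input shorter than n yields an empty list.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n):
    """Map each n-gram to {doc_id: [positions]} (hypothetical layout)."""
    index = {}
    for doc_id, text in docs.items():
        for pos, gram in enumerate(char_ngrams(text, n)):
            index.setdefault(gram, {}).setdefault(doc_id, []).append(pos)
    return index


def search(index, query, n):
    """Return ids of documents containing the query's n-grams consecutively."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    hits = set()
    for doc_id, starts in index.get(grams[0], {}).items():
        for start in starts:
            # Every following n-gram must appear at the next position.
            if all(start + k in index.get(g, {}).get(doc_id, ())
                   for k, g in enumerate(grams)):
                hits.add(doc_id)
                break
    return hits


if __name__ == "__main__":
    docs = {1: "東京都に住む", 2: "京都に行く"}  # sample data, not from the thread
    index = build_index(docs, 2)                 # n=1 would be uni-gram
    print(search(index, "京都", 2))
    print(search(index, "東京", 2))
```

In this sketch a query for 京都 (Kyoto) also matches document 1, because
東京都 contains the bi-gram 京都. A word-based analyzer (Sen, MeCab)
would normally keep 東京都 from matching here, at the cost of depending
on a dictionary. That is the usual trade-off between the two approaches.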
-- 
NOKUBI Takatsugu
E-mail: knok@example.com

