Mailing List Archive
Re: [tlug] web search software
- Date: Tue, 02 Oct 2007 10:27:09 +0900
- From: NOKUBI Takatsugu <knok@example.com>
- Subject: Re: [tlug] web search software
- References: <46E4B1ED.7080904@ldp.jp> <46E4BB4D.4040703@dcook.org> <46E4C006.7040501@ldp.jp> <87fy1m13s8.wl%knok@daionet.gr.jp> <46E5F9A3.2060103@samsara.bebear.net> <87ejh522qb.wl%knok@daionet.gr.jp> <46FC4ED1.6020603@dcook.org>
- User-agent: Wanderlust/2.15.4 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)
At Fri, 28 Sep 2007 09:46:09 +0900,
Darren Cook wrote:
> I'm just catching up on this thread. HyperEstraier looks good (*) and it
> has MeCab support (for breaking Japanese sentences into words) built-in.
> Namazu also. Is anyone using Lucene, Ferret, Nutch or something else
> with Japanese text, and can comment on what they use, if anything?

In my experience:

* Lucene

It has a nice architecture because the document analyzer is separated
from the rest of the engine. NutchDocumentAnalyzer (from Nutch) provides
a simple uni-gram search engine with Japanese support. OTOH,
JapaneseAnalyzer provides a search engine based on word-level text
chunking, using Sen, a Java clone of ChaSen.

http://ultimania.org/sen/
http://tidus.ultimania.org/wiki/index.php?Lucene

* Ferret

Ferret is an implementation of a bi-gram search engine, and it supports
UTF-8. I have never used it, but I heard someone used it for Japanese
documents.

* Senna

Senna is another major search engine with Japanese support. Senna itself
is a library, but there are many bindings. In particular, Tritonn is very
useful because it adds Japanese full-text search support to MySQL. Senna
also supports MeCab and bi-gram.

http://qwik.jp/senna/FrontPage.html (English)
http://qwik.jp/tritonn/

-- 
NOKUBI Takatsugu
E-mail: knok@example.com
        knok@example.com / knok@example.com
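To illustrate the uni-gram and bi-gram indexing mentioned in the message, here is a minimal sketch in Python. It is not taken from Lucene, Ferret, or Senna; the function names (char_ngrams, build_index, search) and the sample documents are made up for illustration, and a real engine would add normalization, ranking, and on-disk storage.

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n=2):
    """Map each n-gram to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in enumerate(char_ngrams(text, n)):
            index[gram][doc_id].append(pos)
    return index


def search(index, query, n=2):
    """Return ids of documents containing query as a contiguous character run.

    Queries shorter than n characters produce no n-grams and match nothing.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    hits = set()
    # Try every position of the first n-gram and check that the
    # remaining n-grams follow at consecutive positions.
    for doc_id, starts in index.get(grams[0], {}).items():
        for start in starts:
            if all(start + i in index.get(g, {}).get(doc_id, ())
                   for i, g in enumerate(grams)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents.
docs = {1: "東京都に住んでいます", 2: "京都は古い町です"}
index = build_index(docs, n=2)

print(char_ngrams("東京都", 1))  # ['東', '京', '都']
print(char_ngrams("東京都", 2))  # ['東京', '京都']
print(search(index, "京都"))     # {1, 2}
print(search(index, "東京"))     # {1}
```

The query "京都" (Kyoto) also matches document 1 because "東京都" (Tokyo-to) contains the same two characters. An n-gram index needs no dictionary and never misses a substring, but it cannot tell these apart; a word-based analyzer (MeCab, ChaSen, Sen) would normally split "東京都" into different tokens and avoid the false match.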
- Follow-Ups:
- Re: [tlug] web search software
- From: Darren Cook