tlug Mailing List Archive (tlug.jp)
Re: [tlug] web search software
- Date: Fri, 28 Sep 2007 09:46:09 +0900
- From: Darren Cook <darren@example.com>
- Subject: Re: [tlug] web search software
- References: <46E4B1ED.7080904@ldp.jp> <46E4BB4D.4040703@dcook.org> <46E4C006.7040501@ldp.jp> <87fy1m13s8.wl%knok@daionet.gr.jp> <46E5F9A3.2060103@samsara.bebear.net> <87ejh522qb.wl%knok@daionet.gr.jp>
- User-agent: Thunderbird 1.5.0.12 (X11/20070530)
>>> I suggest you to use Hyper Estraier instead of Namazu. It has a web
>>> crawler so it would be useful such purpose.
>>>
>>> http://hyperestraier.sourceforge.net/index.html
>> Lucene[1] and Ferret[2] are somewhat similar.
>> 1. http://lucene.apache.org/java/docs/
>> 2. http://ferret.davebalmain.com/trac/
> Seconded, but unfortunately they don't have any web crawler. I think
> Nutch is more appropriate. http://lucene.apache.org/nutch/

I'm just catching up on this thread. HyperEstraier looks good (*) and it
has MeCab support (for breaking Japanese sentences into words) built-in.
Namazu also.

Is anyone using Lucene, Ferret, Nutch or something else with Japanese
text, and can comment on what they use, if anything?

Wikia Search (http://search.wikia.com/wiki/Search_Wikia) uses Nutch and
Lucene apparently, but they only aim to do English search initially.

Darren

*: Having said that, I'm not sure it is using MeCab properly. E.g. a
search for 日本 finds hits for 日本語, which is like searching for "path"
and getting hits on "pathetic":
http://rbbs.sourceforge.jp/cgi-bin/estdemo-ja/estseek.cgi?phrase=%E6%97%A5%E6%9C%AC&perpage=10&attr=&order=&clip=-1

Ah, that may just be how HyperEstraier works, as "cat" finds hits for
"applications" and "category":
http://rbbs.sourceforge.jp/cgi-bin/estdemo-en/estseek.cgi?phrase=cat&perpage=10&attr=&order=&clip=7

Hhhmmm, looking less good now.
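
To make the footnote concrete, here is a minimal Python sketch of the two matching behaviours: substring-style matching (roughly what the demo results look like) versus whole-word matching (what one might expect from a MeCab-segmented index). This is only an illustration, not how HyperEstraier is actually implemented. `DOCS`, `JA_TOKENS`, `substring_hits` and `word_hits` are made-up names, and the Japanese token list is a hand-written stand-in for segmenter output, not real MeCab output.

```python
import re

# Hypothetical sample documents, chosen to mirror the "cat" example.
DOCS = [
    "a list of applications",
    "browse by category",
    "the cat sat on the mat",
]


def substring_hits(query, docs):
    """Return docs that contain the query anywhere, even inside a longer word."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


def word_hits(query, docs):
    """Return docs in which the query equals a whole word token."""
    q = query.lower()
    return [d for d in docs if q in re.findall(r"\w+", d.lower())]


print(substring_hits("cat", DOCS))  # matches "applications" and "category" too
print(word_hits("cat", DOCS))       # matches only the doc with the word "cat"

# Assumption: a segmenter splits the sentence into these tokens,
# keeping 日本語 as a single word.
JA_TOKENS = ["日本語", "を", "勉強", "する"]

print("日本" in "".join(JA_TOKENS))  # substring match on the raw text
print("日本" in JA_TOKENS)           # whole-token match
```

If the index really matched on word tokens, the 日本 query would behave like the last line and not hit 日本語.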
- References:
  - [tlug] web search software
    - From: Brett Robson
  - Re: [tlug] web search software
    - From: Darren Cook
  - Re: [tlug] web search software
    - From: Brett Robson
  - Re: [tlug] web search software
    - From: NOKUBI Takatsugu
  - Re: [tlug] web search software
    - From: emiddleton@example.com
  - Re: [tlug] web search software
    - From: NOKUBI Takatsugu