Mailing List Archive



Re: [tlug] web search software



>>> I suggest you use Hyper Estraier instead of Namazu. It has a web
>>> crawler, so it would be useful for such a purpose.
>>> 
>>> http://hyperestraier.sourceforge.net/index.html

>> Lucene[1] and Ferret[2] are somewhat similar.
>> 1. http://lucene.apache.org/java/docs/
>> 2. http://ferret.davebalmain.com/trac/

> Seconded, but unfortunately they don't have any web crawler. I think
> Nutch is more appropriate.  http://lucene.apache.org/nutch/

I'm just catching up on this thread. HyperEstraier looks good (*) and it
has MeCab support (for breaking Japanese sentences into words) built in.
Namazu does too. Is anyone using Lucene, Ferret, Nutch or something else
with Japanese text? If so, can you comment on what you use for word
segmentation, if anything?

Wikia Search (http://search.wikia.com/wiki/Search_Wikia) uses Nutch and
Lucene apparently, but they only aim to do English search initially.

Darren

*: Having said that, I'm not sure it is using MeCab properly. E.g. a
search for 日本 finds hits for 日本語, which is like searching for
"path" and getting hits on "pathetic":
http://rbbs.sourceforge.jp/cgi-bin/estdemo-ja/estseek.cgi?phrase=%E6%97%A5%E6%9C%AC&perpage=10&attr=&order=&clip=-1

Ah, that may just be how HyperEstraier works, as "cat" finds hits for
"applications" and "category":
http://rbbs.sourceforge.jp/cgi-bin/estdemo-en/estseek.cgi?phrase=cat&perpage=10&attr=&order=&clip=7

Hhhmmm, looking less good now.
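
To illustrate the difference I mean, here is a minimal Python sketch
contrasting substring-style matching with whole-word matching. The
function names and sample strings are made up for illustration; this is
not HyperEstraier's actual code, and the tokenizer is a plain regular
expression, not MeCab.

```python
import re


def substring_match(query, text):
    """Character-level match: hit whenever the query occurs anywhere in the text."""
    return query.lower() in text.lower()


def token_match(query, text):
    """Word-level match: hit only when the query equals a whole token."""
    # Hypothetical tokenizer: split on non-word characters. Japanese text
    # would need a real segmenter here, since it has no spaces between words.
    tokens = re.findall(r"\w+", text.lower())
    return query.lower() in tokens


if __name__ == "__main__":
    for doc in ["category list", "applications", "my cat sleeps"]:
        print(doc, "| substring:", substring_match("cat", doc),
              "| token:", token_match("cat", doc))
```

The demo results above look like the first function; what I was hoping
for is behaviour closer to the second.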

