Mailing List Archive

Re: Japanese search engines

On Sun, Dec 17, 2000 at 06:14:13PM +0900, YAMAGATA Hiroo wrote:
> At 18:07 00/12/17 +0900, you wrote:
> >I stick to Namazu + Kakashi. It's a well-designed Japanese search engine, if
> >you install it properly. I have not had experience handling a data archive
> >as huge as the one you mentioned; however, it is worth testing.
> Maybe better to use ChaSen rather than Kakashi. Those archaic Kanpo 
> languages may not score well with Kakashi... but you need to test.

Honda-san, Yamagata-san, thank you.  I will definitely look at ChaSen,
and once things settle down, I will take a stab at running Namazu
over the sources to see how it performs.

I did sit down and write syntax-checking code in Python for freeWAIS-sf-jp
over the weekend.  The attractions of freeWAIS-sf are its support for
free-text parsing of the target document (so we can define a date field,
keyword fields, etc.), and its support for proximity operators (for
example, "Prime w/2 Mori" would find documents containing "Prime Minister
Mori", as well as "George Mori likes prime rib", but not "Mori, it must
be said, is a poor excuse for a Prime Minister").
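For readers unfamiliar with Lexis-style proximity searching, here is a minimal Python sketch of the "w/N" semantics described above. It is not freeWAIS-sf's implementation; it assumes that "w/N" means the two words occur at most N word positions apart, in either order, ignoring case. The names `tokenize`, `within` and `parse_query` are hypothetical, and the whitespace/punctuation tokenizer would not work on unsegmented Japanese text, which is exactly why Japanese indexers lean on segmenters such as Kakasi or ChaSen.

```python
import re

# Hypothetical query syntax: "<word> w/<N> <word>", e.g. "Prime w/2 Mori".
QUERY = re.compile(r"^\s*(\w+)\s+w/(\d+)\s+(\w+)\s*$", re.IGNORECASE)

def tokenize(text):
    # Lowercase, then keep runs of word characters (drops punctuation).
    return re.findall(r"\w+", text.lower())

def parse_query(query):
    # Minimal syntax check: reject anything that is not a single w/N query.
    m = QUERY.match(query)
    if m is None:
        raise ValueError(f"not a proximity query: {query!r}")
    return m.group(1), m.group(3), int(m.group(2))

def within(text, a, b, n):
    # True if words a and b appear at most n positions apart, in either order.
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

if __name__ == "__main__":
    a, b, n = parse_query("Prime w/2 Mori")
    for doc in ["Prime Minister Mori",
                "George Mori likes prime rib",
                "Mori, it must be said, is a poor excuse for a Prime Minister"]:
        print(within(doc, a, b, n), doc)
```

Run as a script, this reports a match for the first two sentences and no match for the third, mirroring the examples in the text. A real engine would answer the same question from a positional index rather than rescanning each document.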

I absolutely need the first feature, because of the way my data set is
built.  The second is nice, because it looks and feels like the Lexis
service, with which most legal practitioners are familiar.  With Honda-san's
encouragement, I'll follow this one up myself.  Many thanks.

Oh, and I should mention that the archive I'm working on _will_ be thrown
open for general access in due course.  More news in a few more days :-)

