TLUG Mailing List

Re: Japanese search engines

To: tlug@example.com

Subject: Re: Japanese search engines

From: "Frank BENNETT (フランク　ベネット)" <bennett@example.com>

Date: Sun, 17 Dec 2000 21:48:35 +0900

Content-Transfer-Encoding: 7bit

Content-Type: text/plain; charset=iso-2022-jp

In-Reply-To: <4.2.0.58.J.20001217181239.02a4a028@example.com>; from YAMAGATA Hiroo on Sun, Dec 17, 2000 at 06:14:13PM +0900

References: <20001216192102.A2171@example.com> <20001216192102.A2171@example.com> <200012170902.SAA05208@example.com> <4.2.0.58.J.20001217181239.02a4a028@example.com>

Reply-To: tlug@example.com

Resent-From: tlug@example.com

Resent-Message-ID: <9uCVkB.A.Q6E.lYLP6@example.com>

Resent-Sender: tlug-request@example.com
On Sun, Dec 17, 2000 at 06:14:13PM +0900, YAMAGATA Hiroo wrote:
> At 18:07 00/12/17 +0900, you wrote:
> >I stick to Namazu + Kakashi. It's a well-designed Japanese search engine if
> >you install it properly. I have no experience handling an archive as huge
> >as the one you mentioned, but it is worth testing.
> 
> Maybe better to use ChaSen rather than Kakashi. Those archaic Kanpo 
> languages may not score well with Kakashi... but you need to test.

Honda-san, Yamagata-san, thank you.  I will definitely look at ChaSen,
and once things settle down, I will take a stab at running Namazu
over the sources, to see how it performs.

I did sit down and write syntax-checking code in Python for freeWAIS-sf-jp
during the weekend.  The attractions of freeWAIS-sf are its support for
free-text parsing of the target document (so we can define a date field,
keyword fields, etc), and its support for proximity operators (for
example, "Prime w/2 Mori" would find documents containing "Prime Minister
Mori", as well as "George Mori likes prime rib", but not "Mori, it must
be said, is a poor excuse for a Prime Minister").
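
To make the proximity semantics concrete, here is a minimal Python sketch
of a "w/N" test. It is not freeWAIS-sf's implementation; the function
names tokenize() and within() are made up for illustration, and it assumes
that "w/2" means the two words sit at most two word positions apart, in
either order. The tokenizer only handles space-delimited text, so Japanese
input would first have to be segmented by some other tool.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


def within(text, a, b, n):
    """Return True if words a and b occur at most n positions apart.

    Assumption: order does not matter, and distance is counted in tokens.
    """
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == a.lower()]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b.lower()]
    # i != j guards against a word trivially matching itself when a == b.
    return any(i != j and abs(i - j) <= n for i in pos_a for j in pos_b)


examples = [
    "Prime Minister Mori",
    "George Mori likes prime rib",
    "Mori, it must be said, is a poor excuse for a Prime Minister",
]
for sentence in examples:
    print(within(sentence, "Prime", "Mori", 2), "-", sentence)
```

Under these assumptions the first two sentences match and the third does
not, which agrees with the behaviour described above.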

I absolutely need the first feature, because of the way my data set is
built.  The second is nice, because it looks and feels like the Lexis
service, with which most legal practitioners are familiar.  With Honda-san's
encouragement, I'll follow this one up myself.  Many thanks.

Oh, and I should mention that the archive I'm working on _will_ be thrown
open for general access in due course.  More news in a few more days :-)

Cheers,
Frank