Mailing List Archive

Japanese search engines



Does anyone have any suggestions for Japanese search engines? I would also
welcome pointers to sources of information.

The task at hand is to provide a full-text search interface for a mirror of
Kanpo, the official gazette of the Japanese government (yes, I did finally
finish the PDF site ripper I was working on!).  The amount of data is large --
the archive grows by about 1.5 megabytes per day -- so scalability is VERY
important.
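
To put that growth rate in perspective, here is a back-of-the-envelope projection, assuming the ~1.5 MB/day rate stays constant.  The constant `DAILY_GROWTH_MB` and the function `projected_size_mb` are illustrative names, not part of any tool mentioned in this post.

```python
# Rough projection of archive size, ASSUMING the ~1.5 MB/day growth rate
# quoted above stays constant (an assumption, not a measurement).
DAILY_GROWTH_MB = 1.5


def projected_size_mb(days: int, start_mb: float = 0.0) -> float:
    """Return the estimated archive size in megabytes after `days` days."""
    return start_mb + DAILY_GROWTH_MB * days


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"{years:>2} year(s): ~{projected_size_mb(365 * years):,.0f} MB")
```

At that rate the archive passes half a gigabyte within the first year, which is the scale any candidate engine has to index and search.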

I am most familiar with freeWAIS-sf, and I think the Japan-patched version is
what I need for this task, but it will take some work to set up an interface
to it.  So before I get too far into that:

  o Is there something better for very large full-text databases?  (I'm
    leery of Namazu, despite its popularity -- I did try to look up
    earlier discussions of it on TLUG, but the TLUG search engine
    is, ah, broken, as in "Alert!: HTTP/1.1 500 Internal Server Error".)

  o freeWAIS-sf will blithely accept bound-to-fail queries and queries
    in broken syntax, simply returning no items found.  I've started
    working on a syntax checker.  I've settled on a model that will
    handle the problem, but ... has anyone already created such
    a thing?

  o Has anyone embedded nkf and kakasi or an equivalent into
    freeWAIS-sf-jp?  This site will see a lot of traffic when it goes
    live, and it seems wasteful to be spawning collateral processes for
    every instance of the WAIS client.

  o Is there a WAIS client module for Python?  For that matter, an
    nkf module?  A kakasi module?  A fastcgi module?
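
For the syntax-checker question, here is a minimal sketch of a recursive-descent checker in Python.  The grammar it enforces (binary `and`/`or`/`not`, parentheses, double-quoted phrases, adjacent terms combined implicitly) is a hypothetical stand-in, not freeWAIS-sf's documented query syntax, and `check_query` and `QuerySyntaxError` are illustrative names.

```python
import re
from typing import List, Optional

# HYPOTHETICAL grammar, loosely modelled on boolean WAIS-style queries.
# It is an assumption for illustration, not freeWAIS-sf's documented syntax:
#   expr := term ((OP)? term)*    OP is a binary "and" / "or" / "not"
#   term := WORD | "quoted phrase" | "(" expr ")"
OPERATORS = {"and", "or", "not"}

# Parentheses, a closed "phrase", a stray opening quote, or a bare word.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|"|[^\s()"]+')


class QuerySyntaxError(ValueError):
    """Raised internally when a query cannot match the grammar above."""


def check_query(query: str) -> Optional[str]:
    """Return None if `query` parses, otherwise a human-readable error."""
    tokens: List[str] = TOKEN_RE.findall(query)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def parse_term() -> None:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise QuerySyntaxError("query ends where a search term was expected")
        if tok == "(":
            pos += 1
            parse_expr()
            if peek() != ")":
                raise QuerySyntaxError("unbalanced '('")
            pos += 1
        elif tok == ")":
            raise QuerySyntaxError("unexpected ')'")
        elif tok == '"':
            raise QuerySyntaxError("unterminated quoted phrase")
        elif tok == '""':
            raise QuerySyntaxError("empty quoted phrase")
        elif tok.lower() in OPERATORS:
            raise QuerySyntaxError(f"operator '{tok}' where a search term was expected")
        else:
            pos += 1  # an ordinary word or a closed phrase

    def parse_expr() -> None:
        nonlocal pos
        parse_term()
        # Keep consuming "[OP] term" pairs until ")" or end of input.
        while peek() not in (None, ")"):
            if peek().lower() in OPERATORS:
                pos += 1
            parse_term()

    try:
        parse_expr()
        if pos != len(tokens):
            raise QuerySyntaxError(f"unexpected '{tokens[pos]}'")
    except QuerySyntaxError as err:
        return str(err)
    return None
```

Returning a message rather than a bare pass/fail lets the front end explain why a query was rejected instead of reporting "no items found".  It only checks structure: unspaced Japanese text is still a single token, so word segmentation remains a separate problem.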
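
On the nkf question, one way to avoid spawning a process per query is to do the code conversion in-process.  The sketch below swaps nkf for Python's built-in Japanese codecs (`iso2022_jp`, `euc_jp`, `shift_jis`).  It is not an nkf binding, its detection order is a crude heuristic, and the assumption that the index is stored in EUC-JP is hypothetical; `guess_decode` and `to_index_encoding` are illustrative names.

```python
from typing import Tuple

# Encodings tried in order.  This ordering is a crude heuristic (an
# assumption) and much simpler than nkf's own guessing; EUC-JP and
# Shift_JIS byte ranges overlap, so short inputs can be misread.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_decode(raw: bytes) -> Tuple[str, str]:
    """Decode `raw` with Python's built-in codecs; return (text, codec name)."""
    codecs_to_try = CANDIDATES
    # ISO-2022-JP is 7-bit and switches character sets with escape
    # sequences such as ESC $ B, so their presence is a strong hint.
    if b"\x1b$" in raw or b"\x1b(" in raw:
        codecs_to_try = ("iso2022_jp",) + CANDIDATES
    for codec in codecs_to_try:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to EUC-JP with replacement characters.
    return raw.decode("euc_jp", errors="replace"), "euc_jp"


def to_index_encoding(raw: bytes, target: str = "euc_jp") -> bytes:
    """Re-encode a query into `target`, ASSUMED to be the index's encoding."""
    text, _ = guess_decode(raw)
    # Characters with no mapping in `target` become "?" rather than failing.
    return text.encode(target, errors="replace")
```

Because everything runs inside the server process, a long-lived front end (for example one kept resident under FastCGI) pays no fork/exec cost per query for this step.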

Sorry for the number of questions, but if you're going to ask one, might as
well ask 'em all.

Cheers,
Frank