Mailing List Archive


RE: tlug: namazu



>>>>> "FB" == Frank Bennett <bennett@example.com> writes:

    FB> The size of indexes is pretty startling, but WAIS-sf
    FB> routinely produces indexes just a little larger than the
    FB> original data.  Something is obviously awry with Namazu, but
    FB> it looks like it's an order-of-magnitude, not a
    FB> many-orders-of-magnitude thing.  :-)

Remember that repeating the same key in several files will result in
much more index data: each file containing the key adds a file
reference, and file names are probably longer than keys, even if
somehow compressed.  I suspect your application has a different
distribution of keywords across files than Tony's.
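
To make the point concrete, here is a rough sketch in Python.  It is
not how Namazu or WAIS-sf actually store their indexes; the file names
and the crude byte count (each key once, plus one uncompressed file
name per reference) are assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(files):
    """Map each keyword to the sorted list of file names containing it.

    `files` is a dict of file name -> text (hypothetical toy corpus).
    """
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def index_bytes(index):
    """Crude size estimate: each key once, plus one file name per reference."""
    return sum(len(word) + sum(len(name) for name in names)
               for word, names in index.items())


def data_bytes(files):
    """Total size of the original text."""
    return sum(len(text) for text in files.values())


# Each key repeated inside a single file: one file reference per key.
concentrated = {
    "notes/linux.txt": "linux linux linux linux",
    "notes/emacs.txt": "emacs emacs emacs emacs",
}

# The same keys spread over several files: one reference per key per file.
spread = {"notes/f%d.txt" % i: "linux emacs" for i in range(4)}

for label, corpus in (("concentrated", concentrated), ("spread", spread)):
    print(label, "data:", data_bytes(corpus),
          "index:", index_bytes(build_index(corpus)))
```

With roughly the same amount of text, the spread-out corpus needs the
larger index, because every key drags along one file name per file it
appears in.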

    FB> Both are binaries; one wonders whether better performance
    FB> couldn't be obtained by incorporating them fully into the
    FB> modified WAIS binary itself.

Pipes are pretty efficient; there's not necessarily a need to run the
filters multiple times.  Just keep feeding more and more data into
them.  There would be some problem with synchronizing file references,
I guess, but that could probably be solved by using a sentinel to
delimit files.
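
Something like the following Python sketch shows the sentinel idea.
The filter here is a stand-in (a child Python process that upper-cases
its input), not one of the real filters, and the sentinel string is a
made-up marker.  The sketch assumes the sentinel never occurs in the
data and that the filter passes the sentinel line through unchanged.

```python
import subprocess
import sys

# Hypothetical marker line; it must never appear in the documents and the
# filter must emit it unchanged.
SENTINEL = "@@END-OF-FILE@@"

# Stand-in filter: copies stdin to stdout, upper-casing each line.
FILTER_CMD = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def filter_documents(docs, cmd=FILTER_CMD):
    """Run ONE filter process over all documents.

    `docs` maps file name -> text.  Returns file name -> filtered text,
    using the sentinel lines to re-associate output with file names.
    """
    names = list(docs)
    chunks = []
    for name in names:
        text = docs[name]
        if not text.endswith("\n"):
            text += "\n"          # keep the sentinel on a line of its own
        chunks.append(text + SENTINEL + "\n")

    result = subprocess.run(cmd, input="".join(chunks),
                            capture_output=True, text=True, check=True)

    # Splitting on the sentinel yields one piece per document, plus an
    # empty trailing piece after the last sentinel; zip() drops the latter.
    pieces = result.stdout.split(SENTINEL + "\n")
    return dict(zip(names, pieces))


if __name__ == "__main__":
    out = filter_documents({"a.txt": "hello\nworld\n", "b.txt": "namazu"})
    for name, text in out.items():
        print(name, repr(text))
```

For brevity this hands the whole input to the filter at once; a real
indexer would more likely keep the pipe open and read the filter's
output while still writing to it.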

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
_________________  _________________  _________________  _________________
What are those straight lines for?  "XEmacs rules."
--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting:  March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan

