
Re: tlug: namazu





On Mon, 7 Feb 2000, Stephen J. Turnbull wrote:

>    >>>>> "Selva" == Selva Nair <selva@example.com> writes:
>    
>        Selva> On Mon, 7 Feb 2000, Tony Laszlo wrote:
>    
>        >> Namazu finished indexing 2,644 files (111M).
>    
>        >> It took 43.4 hours and created 368M of index
>    
>        Selva> Was thinking of installing namazu, but this sounds scary.
>        Selva> But wait a minute, a 368M index from a 111M source? Can't
>        Selva> be 368K either as you have over 836K keywords. Some
>        Selva> misprint somewhere?
>    
>    No.  A dictionary with 50,000 words at 10 bytes/word should fill
>    maybe 0.5MB?  Uh-uh, a dictionary with no definitions is useless.
>    

Of course. That's why I thought it couldn't be 368K. But, IMHO, 368M is a
bloated index: almost 450 bytes per keyword!
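
For anyone who wants to check the arithmetic, here is a quick sketch using
the figures Tony reported (368M of index, 111M of source, 2,644 files,
836,439 keywords). I am assuming "M" means 10**6 bytes; with 2**20 the
per-keyword figure comes out a little higher.

```python
# Figures as reported earlier in the thread.
# Assumption: "M" means 10**6 bytes (not 2**20).
index_bytes = 368 * 10**6
source_bytes = 111 * 10**6
keywords = 836_439
files = 2_644

# Average index space consumed per keyword.
bytes_per_keyword = index_bytes / keywords

# How much larger the index is than the text it indexes.
index_to_source = index_bytes / source_bytes

# Average number of keywords contributed per file.
keywords_per_file = keywords / files

print(f"{bytes_per_keyword:.0f} bytes per keyword")
print(f"index is {index_to_source:.1f}x the size of the source")
print(f"{keywords_per_file:.0f} keywords per file")
```

Either way it lands in the 440-460 bytes per keyword range, which is the
number that looks too big to me.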

>        >> files with 836,439 keywords.
>    
>        Selva> Wow ! Didn't know there could be so many keywords, let
>        Selva> alone *key*words in this whole world :)
>    
>    Actually, they're key phrases, and they probably also include
>    information that allows partial matches or fuzzy matches, and multiple
>    criteria.  No?
>    

    Yeah, including data for partial matches could be why the number
of keywords is so huge. But why store phrases? Storing a reference to
the preceding and the following word with each entry should be enough
to reconstruct phrase info. No? Info like file names shouldn't take much
space at all, as 2000-odd filenames can be indexed by a two-byte word. I
would buy 100 bytes per keyword, but not 400. Or am I underestimating
the information content of an index? Or is it that tightly packed
indices won't give good performance at search time?
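
To make the idea concrete, here is a rough sketch of the kind of compact
index I have in mind. It is NOT how namazu stores anything (I have not
looked at its format). Instead of explicit links to the preceding and
following words, it records each word's position within the file, which
gives the same adjacency information. The 6-byte posting layout (two-byte
file ID plus four-byte position) and all the names below are made up for
illustration.

```python
import struct
from collections import defaultdict

# Hypothetical posting layout: 2-byte file ID + 4-byte word position.
# "<" means little-endian with no padding, so each posting is 6 bytes.
POSTING = struct.Struct("<HI")

def build_index(docs):
    """Map each word to a list of (file_id, position) postings."""
    index = defaultdict(list)
    for file_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word].append((file_id, pos))
    return index

def find_phrase(index, phrase):
    """Return sorted file IDs where the words of `phrase` occur consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    # Candidate phrase starts come from the first word's postings.
    candidates = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        postings = set(index.get(word, []))
        # Keep a start only if the next word sits `offset` positions later.
        candidates = {(f, p) for (f, p) in candidates if (f, p + offset) in postings}
    return sorted({f for f, _ in candidates})

def packed_size(index):
    """Bytes needed for the postings alone (keyword strings not counted)."""
    return sum(len(postings) for postings in index.values()) * POSTING.size

docs = [
    "namazu finished indexing the files",
    "the index is bigger than the files",
    "indexing the files took a long time",
]
index = build_index(docs)
print(find_phrase(index, "indexing the files"))
print(packed_size(index), "bytes of postings for", len(index), "distinct words")
```

In this layout the cost is per occurrence rather than per keyword, so a
word that appears in many files still gets expensive. Whether something
this tightly packed would search fast enough is exactly what I am unsure
about.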

Selva

--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting:  March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan

