Re: tlug: namazu
- To: tlug@example.com
- Subject: Re: tlug: namazu
- From: Selva Nair <selva@example.com>
- Date: Mon, 7 Feb 2000 14:48:44 +0900 (JST)
- Content-Type: TEXT/PLAIN; charset=US-ASCII
- In-Reply-To: <14494.19709.87713.38966@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
On Mon, 7 Feb 2000, Stephen J. Turnbull wrote:

> >>>>> "Selva" == Selva Nair <selva@example.com> writes:
>
>     Selva> On Mon, 7 Feb 2000, Tony Laszlo wrote:
>
>     >> Namazu finished indexing 2,644 files (111M).
>     >> It took 43.4 hours and created 368M of index
>
>     Selva> Was thinking of installing namazu, but this sounds scary.
>     Selva> But wait a minute, a 368M index from a 111M source? Can't
>     Selva> be 368K either as you have over 836K keywords. Some
>     Selva> misprint somewhere?
>
> No.  A dictionary with 50,000 words at 10 bytes/word should fill
> maybe 0.5MB?  Uh-uh, a dictionary with no definitions is useless.
>

Of course. That's why I thought it can't be 368K, but, IMHO, 368M is a
bloated index. Almost 450 bytes per keyword, it is!

>     >> files with 836,439 keywords.
>
>     Selva> Wow! Didn't know there could be so many keywords, let
>     Selva> alone *key*words in this whole world :)
>
> Actually, they're key phrases, and they probably also include
> information that allows partial matches or fuzzy matches, and multiple
> criteria.  No?
>

Yeah, inclusion of data for partial matches could be why the number of
keywords is so huge. But why store phrases? Storing, for each
occurrence, a reference to the preceding and following words should be
enough to reconstruct phrase information, no? Information like file
names shouldn't take much space at all, as the 2,644 filenames can be
indexed by a two-byte word. I would buy 100 bytes per keyword, but not
400.

Or am I underestimating the information content of an index? Or is it
that tightly stored indices won't give good performance at search time?

Selva

--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting: March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan
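Selva's back-of-envelope arithmetic and his idea of reconstructing phrases from word adjacency can be sketched in Python. This is a hypothetical illustration, not Namazu's actual index format, which the thread does not describe. It assumes a positional inverted index, a standard technique in which each word maps to the documents and word positions where it occurs, so a phrase query checks for consecutive positions instead of storing phrases. It also assumes "M" means 2**20 bytes and that document IDs fit in an unsigned 16-bit integer. The helper names build_index, phrase_search, and encode_doc_id are made up for this sketch.

```python
from collections import defaultdict
import struct


def build_index(docs):
    """Positional inverted index: word -> {doc_id: [positions of the word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc IDs containing the words of `phrase` at consecutive positions."""
    words = phrase.lower().split()
    # `in` does not create entries in a defaultdict, so this check is side-effect free.
    if not words or any(w not in index for w in words):
        return []
    # Only documents that contain every word can contain the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in sorted(candidates):
        # Candidate start positions: wherever the first word occurs...
        starts = set(index[words[0]][doc_id])
        # ...kept only if word k also occurs at start + k for every later word.
        for offset, w in enumerate(words[1:], start=1):
            starts &= {p - offset for p in index[w][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


def encode_doc_id(doc_id):
    """Pack a document ID into 2 bytes (unsigned 16-bit, little-endian).

    struct.error is raised for IDs outside 0..65535.
    """
    return struct.pack("<H", doc_id)


# Figures quoted in the thread: 368M of index for 836,439 keywords.
INDEX_BYTES = 368 * 2**20  # assumption: "M" means 2**20 bytes
KEYWORDS = 836_439
BYTES_PER_KEYWORD = INDEX_BYTES / KEYWORDS

if __name__ == "__main__":
    print(f"about {BYTES_PER_KEYWORD:.0f} bytes per keyword")
    # Two bytes are enough to number all 2,644 files (IDs 0..2643).
    print(len(encode_doc_id(2643)), "bytes per file ID")

    docs = [
        "namazu builds a full text index",
        "a full text search engine",
        "text full of words",
    ]
    index = build_index(docs)
    print(phrase_search(index, "full text"))
    print(phrase_search(index, "text full"))
```

With these three toy documents, "full text" matches documents 0 and 1, while "text full" matches only document 2. Note that positions are stored per occurrence rather than per distinct keyword, so an index of this kind grows with the total word count of the corpus, not just with the number of keywords.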
- Follow-Ups:
- Re: tlug: namazu
- From: "Stephen J. Turnbull" <turnbull@example.com>
- References:
- Re: tlug: namazu
- From: "Stephen J. Turnbull" <turnbull@example.com>