Mailing List Archive
Re: [tlug] Database frontend in Linux
- Date: Mon, 01 Jun 2009 16:19:58 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Database frontend in Linux
- References: <mailman.1.1243047601.6031.tlug@example.com> <BAY108-W32CB26AA19B9A7AE47072DA2570@example.com> <20090530102723.GA7204@example.com> <d8fcc0800905310503r27865f5ex3672fe533c7e724a@example.com> <4A233C4A.70700@example.com>
Edward Middleton writes:
 > Josh Glover wrote:
 > > 2009/5/30 Christian Horn <chorn@example.com>
 > >> On Sat, May 23, 2009 at 06:05:13PM +0900, Raedwolf Summoner wrote:
 > >>> Pardon my greenhorn status, Christian, but I'm afraid I don't
 > >>> understand the difference [between a search engine and a
 > >>> database].
 > >>>
 > >> A database would move or copy the data like soundfiles inside of it,
 > >> making the data harder to backup etc.
 > >
 > > Yes, this is it in a nutshell. Databases, especially relational ones,
 > > are great for storing data that is related somehow. Search engines are
 > > better at dealing with data that is not itself related, but with
 > > related *metadata*.
 >
 > I think structured vs unstructured data is the major difference.

A search engine is a *type* of database or, perhaps better put, a
front-end for adding unstructured data to a database. The big
difference between Ask Sam and a web search engine is that the actual
documents are stored internally by Ask Sam, whereas a conventional
search engine just stores URLs.

Of course the commercial search engines like Google and Amazon long
ago started caching the result documents, although more recently both
have dropped all pretense of being interested in caching (which is
probably fair use under U.S. copyright law). Instead they are
creating KWIC-indexed[1] document repositories (which is a pretty good
name for Ask Sam, as I understand Ask Sam).

 > Databases are better at finding things like "the title of songs on album
 > x". A search engine is better at finding "all things related to x".

I tend to disagree. A search engine (as currently visible at Google
and friends) is an automatically updated KWIC database. They are
generally pretty good now at suggesting corrections for typos, but
AFAICT none of them track synonyms. Surely that's the obvious first
step for "all things related".

 > [snip]

 > > But there is another problem that is harder to solve, and that is
 > > relevance. PageRank (Google's algorithm for determining which results
 > > bubble up to the top for any given search) is all about relevance. [1]

You want semantic web, I guess. But that can be spammed, too; in fact
it is likely *easier* to spam that, since evaluations are part of the
"semantic" part of links.

 > The problem with page rank is that it doesn't solve the difficult
 > problem of finding relevance ,

It does solve that problem, assuming honest linking. This makes a lot
of sense in academic publication (where it's called "citation
indexing" rather than "page rank"), because it's expensive to Google
bomb: the editors pick the documents containing the "links", so you
have to construct a rather interesting document to be able to add your
links to the database.

The problem you're referring to is that "fuzaketeru" (frivolous,
not-in-earnest) documents get indexed, too, so link data can be
spoofed. But that's not the fault of the page rank algorithm per se;
that's a problem for input filtering.

Footnotes:
[1] KWIC = key word in context (like WAIS). Here I'm using it in a
more general sense to include fuzzy algorithms such as those used by
Xapian, as well.
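To make the footnote's term concrete, here is a minimal sketch of a
KWIC index in Python. It is an illustration only, not how Ask Sam,
WAIS, Xapian, or any search engine actually works: the function name
`build_kwic_index`, the `width` parameter, and the two sample documents
are all made up for the example. It also does exact word matching only,
with no fuzzy matching and no synonyms, which is exactly the gap
mentioned above.

```python
import re
from collections import defaultdict


def build_kwic_index(docs, width=3):
    """Map each lowercased word to a list of (doc_id, context) entries.

    `docs` is a dict of {doc_id: text}. The context is the keyword in
    brackets with up to `width` words on either side. (Hypothetical
    helper for illustration; not taken from any real indexer.)
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = re.findall(r"\w+", text)
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            context = f"{left} [{word}] {right}".strip()
            index[word.lower()].append((doc_id, context))
    return index


# Made-up sample documents; only the index refers back to them by id,
# much as a conventional search engine stores URLs rather than files.
docs = {
    "a.txt": "PageRank assumes honest linking between documents",
    "b.txt": "Citation indexing counts links between papers",
}

index = build_kwic_index(docs)
for doc_id, context in index["between"]:
    print(doc_id, context)
```

Looking up "between" lists each document id alongside the words
surrounding the hit. A lookup for "link" finds nothing even though
"links" and "linking" are indexed, so anything like "all things
related to x" would need stemming or synonym tracking on top of this.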
- Follow-Ups:
- Re: [tlug] Database frontend in Linux
- From: Josh Glover
- Re: [tlug] Database frontend in Linux
- From: Edward Middleton
- References:
- Re: [tlug] Database frontend in Linux
- From: Edward Middleton