Re: [tlug] Database frontend in Linux

Date: Mon, 01 Jun 2009 11:26:18 +0900
From: Edward Middleton <emiddleton@example.com>
Subject: Re: [tlug] Database frontend in Linux
References: <mailman.1.1243047601.6031.tlug@example.com> <BAY108-W32CB26AA19B9A7AE47072DA2570@example.com> <20090530102723.GA7204@example.com> <d8fcc0800905310503r27865f5ex3672fe533c7e724a@example.com>
User-agent: Thunderbird 2.0.0.21 (X11/20090323)

Josh Glover wrote:
> 2009/5/30 Christian Horn <chorn@example.com>
>> On Sat, May 23, 2009 at 06:05:13PM +0900, Raedwolf Summoner wrote:
>>     
>>> Pardon my greenhorn status, Christian, but I'm afraid I don't
>>> understand the difference [between a search engine and a
>>> database].
>>>       
>> A database would move or copy the data like soundfiles inside of it,
>> making the data harder to backup etc.
>>     
>
> Yes, this is it in a nutshell. Databases, especially relational ones,
> are great for storing data that is related somehow. Search engines are
> better at dealing with data that is not itself related, but with
> related *metadata*.
>   

I think structured vs. unstructured data is the major difference.
Databases are better at finding things like "the titles of songs on album
x".  A search engine is better at finding "all things related to x".
The other major difference is that databases are generally better for
closed-world systems (i.e. where there is a finite dataset and no result
means the thing doesn't exist).  Search engines are better for
open-world situations like the web, where no result means "I don't know".
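
A minimal sketch of the structured, closed-world case, using Python's
built-in sqlite3 module. The table name, column names, and rows are
hypothetical and exist only for illustration. Under a closed-world
reading, the empty result for an unknown album is taken to mean that
no such songs exist.

```python
import sqlite3

# Hypothetical schema and rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, album TEXT)")
conn.executemany(
    "INSERT INTO songs (title, album) VALUES (?, ?)",
    [("Song A", "Album X"), ("Song B", "Album X"), ("Song C", "Album Y")],
)

def titles_on_album(album):
    """Structured query: the titles of songs on a given album."""
    rows = conn.execute(
        "SELECT title FROM songs WHERE album = ? ORDER BY title", (album,)
    )
    return [title for (title,) in rows]

on_x = titles_on_album("Album X")  # exact answer drawn from stored rows
on_z = titles_on_album("Album Z")  # empty list: the database holds no such songs
```

The query either matches stored rows or it does not; there is no notion
of a "probably related" answer.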

An important distinction between a search engine and a database is that
a database returns facts[1], whereas a search engine returns what appear
to be relationships based on data mining (i.e. statistics).  A database
result tells you what the database knows to be factually correct; a search
result tells you what is statistically likely to be relevant.
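
To illustrate the statistical side, here is a rough sketch of a keyword
search that scores documents with a simple TF-IDF weighting. The corpus,
the document ids, and the exact weighting formula are assumptions chosen
for the example, not how any particular search engine works. The point
is that the output is a ranked list of guesses rather than a set of
stated facts.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; document ids and text are made up.
docs = {
    "a": "tokyo linux users group meeting notes",
    "b": "linux kernel release notes",
    "c": "tokyo restaurant and cafe guide",
}

def tokenize(text):
    return text.lower().split()

tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}

def idf(term):
    """Rarer terms get a higher weight (smoothed inverse document frequency)."""
    df = sum(1 for words in tokenized.values() if term in words)
    return math.log((1 + len(tokenized)) / (1 + df)) + 1.0

def search(query):
    """Return (doc_id, score) pairs, best first. Scores are estimates, not facts."""
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        # Term frequency (normalised by document length) times term rarity.
        score = sum(counts[t] / len(words) * idf(t) for t in tokenize(query))
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

results = search("tokyo linux")
```

Every document that shares a term with the query comes back with some
score, so a partial match still appears, just lower in the list.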

[snip]
> But there is another problem that is harder to solve, and that is
> relevance. PageRank (Google's algorithm for determining which results
> bubble up to the top for any given search) is all about relevance. [1]
> It cares a lot about how popular a document is, which is determined by
> static analysis such as building massive graphs that show how well
> linked-to a document is, and feedback loops that ensure that documents
> that are clicked on a lot for a given search term move up the result
> list. This is why I don't *have* to do anything more than the
> following search to get stuff about the Tokyo Linux Users Group:
>   

The problem with PageRank is that it doesn't solve the difficult
problem of finding relevance; it solves the easier problem of finding
popularity.  This makes it susceptible to SEO and Google bombing[2].
It also means that unpopular but relevant topics aren't ranked highly.
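
The popularity point can be sketched with the simplified textbook form
of PageRank (power iteration with a damping factor). This is not
Google's production ranking, and it leaves out the click feedback
mentioned in the quoted text. The link graph and page names below are
hypothetical. Two pages start with identical link structure, then one
of them receives a batch of throwaway inbound links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: "relevant" and "popular" are linked identically.
web = {
    "relevant": ["hub"],
    "popular": ["hub"],
    "hub": ["relevant", "popular"],
}
before = pagerank(web)

# A link bomb: many throwaway pages that all link to "popular".
bombed = dict(web)
for i in range(10):
    bombed[f"spam{i}"] = ["popular"]
after = pagerank(bombed)
```

Nothing in the computation looks at page content, so the extra links
alone push "popular" above "relevant" in the second run.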

Edward

1. The facts could be wrong, but they are explicitly stated.
2. http://en.wikipedia.org/wiki/Google_bomb
