Mailing List Archive



Re: [tlug] Database frontend in Linux



Josh Glover wrote:
> 2009/5/30 Christian Horn <chorn@example.com>
>> On Sat, May 23, 2009 at 06:05:13PM +0900, Raedwolf Summoner wrote:
>>     
>>> Pardon my greenhorn status, Christian, but I'm afraid I don't
>>> understand the difference [between a search engine and a
>>> database].
>>>       
>> A database would move or copy the data, like sound files, inside of it,
>> making the data harder to back up etc.
>>     
>
> Yes, this is it in a nutshell. Databases, especially relational ones,
> are great for storing data that is related somehow. Search engines are
> better at dealing with data that is not itself related, but with
> related *metadata*.
>   

I think structured vs. unstructured data is the major difference.
Databases are better at finding things like "the titles of the songs on
album x". A search engine is better at finding "all things related to x".
The other major difference is that databases are generally better for
closed-world systems (i.e. where there is a finite dataset and no result
means the thing doesn't exist). Search engines are better for open-world
situations like the web, where no result means "I don't know".

An important distinction between a search engine and a database is that
a database returns facts[1], whereas a search engine returns what appear
to be relationships based on data mining (i.e. statistics). A database
result tells you what the database knows to be factually correct; a
search result tells you what is statistically likely to be relevant.
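
A rough sketch of that contrast, in Python. Everything in it is made up
for illustration: the table layout, the album and song names, the
documents, and the helper names make_db, titles_on_album and search. The
"search engine" is only a toy term-count scorer, not how any real engine
ranks results.

```python
import sqlite3

# Made-up catalogue rows (album, title); every name here is hypothetical.
SONGS = [("Album X", "Song A"), ("Album X", "Song B"), ("Album Y", "Song C")]

# Made-up free-text documents for the search side.
DOCS = {
    "review.txt": "album x is a great album and song a opens album x",
    "blog.txt": "i heard song c on the radio",
    "notes.txt": "x marks the spot",
}


def make_db():
    """Build an in-memory database holding the structured song data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (album TEXT, title TEXT)")
    conn.executemany("INSERT INTO songs VALUES (?, ?)", SONGS)
    return conn


def titles_on_album(conn, album):
    """Closed world: return exactly the stored titles; [] means 'none exist'."""
    rows = conn.execute(
        "SELECT title FROM songs WHERE album = ? ORDER BY title", (album,)
    )
    return [title for (title,) in rows]


def search(docs, query):
    """Open world: rank documents by how often the query terms occur.

    The score is a statistic, not a fact; an empty result only means
    that nothing matched in the documents we happened to index.
    """
    terms = query.lower().split()
    scored = []
    for name, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken by file name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored]


if __name__ == "__main__":
    conn = make_db()
    print(titles_on_album(conn, "Album X"))  # stated facts
    print(titles_on_album(conn, "Album Z"))  # empty: no such album stored
    print(search(DOCS, "album x"))           # ranked guesses
```

Note that the toy search also returns notes.txt for "album x", simply
because it contains an "x". That is the "statistically likely" part: the
match is plausible on the numbers without being a stated fact.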

[snip]
> But there is another problem that is harder to solve, and that is
> relevance. PageRank (Google's algorithm for determining which results
> bubble up to the top for any given search) is all about relevance. [1]
> It cares a lot about how popular a document is, which is determined by
> static analysis such as building massive graphs that show how well
> linked-to a document is, and feedback loops that ensure that documents
> that are clicked on a lot for a given search term move up the result
> list. This is why I don't *have* to do anything more than the
> following search to get stuff about the Tokyo Linux Users Group:
>   

The problem with PageRank is that it doesn't solve the difficult
problem of finding relevance; it solves the easier problem of finding
popularity. This makes it susceptible to SEO and Google bombing[2].
It also means that unpopular but relevant topics aren't ranked highly.
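
To make the popularity point concrete, here is a minimal sketch of the
textbook power-iteration form of PageRank. It is not Google's production
ranking; the link graph, the page names and the damping value of 0.85
are assumptions chosen for illustration. The thing to notice is that the
query never appears anywhere in the computation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "niche" could be the most relevant page for
# some query, but nothing links to it.
LINKS = {
    "popular": ["hub"],
    "hub": ["popular"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "niche": ["popular"],
}

if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 3))
```

With no inbound links, "niche" only ever gets the random-jump share, so
it sits at the bottom whatever it actually contains. Adding more pages
that link to "popular" pushes it up further, which is roughly the lever
that a Google bomb pulls.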

Edward

1. The facts could be wrong, but they are explicitly stated.
2. http://en.wikipedia.org/wiki/Google_bomb

