tlug.jp Mailing List Archive
Re: [tlug] Database frontend in Linux
- Date: Mon, 01 Jun 2009 18:56:22 +0900
- From: Edward Middleton <emiddleton@example.com>
- Subject: Re: [tlug] Database frontend in Linux
- References: <mailman.1.1243047601.6031.tlug@example.com> <BAY108-W32CB26AA19B9A7AE47072DA2570@example.com> <20090530102723.GA7204@example.com> <d8fcc0800905310503r27865f5ex3672fe533c7e724a@example.com> <4A233C4A.70700@example.com> <87my8sbeu9.fsf@example.com>
- User-agent: Thunderbird 2.0.0.21 (X11/20090323)
Stephen J. Turnbull wrote:
> Edward Middleton writes:
> > Josh Glover wrote:
> > > 2009/5/30 Christian Horn <chorn@example.com>
> > >> On Sat, May 23, 2009 at 06:05:13PM +0900, Raedwolf Summoner wrote:
> > >>> Pardon my greenhorn status, Christian, but I'm afraid I don't
> > >>> understand the difference [between a search engine and a
> > >>> database].
> > >>>
> > >> A database would move or copy the data like soundfiles inside of it,
> > >> making the data harder to backup etc.
> > >
> > > Yes, this is it in a nutshell. Databases, especially relational ones,
> > > are great for storing data that is related somehow. Search engines are
> > > better at dealing with data that is not itself related, but with
> > > related *metadata*.
> >
> > I think structured vs unstructured data is the major difference.
>
> A search engine is a *type* of database, or perhaps a better way to
> put it, it is a front-end for adding unstructured data to a database.
>

The difference is that a database deals directly with structured, finite
data (closed world assumption[1]). A search engine generates structured
data in the form of statistics about the content of documents, and
queries that statistical data. As a result it can only make factual
statements about the statistics (and only the statistics for documents
it knows about); it can't make factual claims about the source
documents. In situations like the web, where you have an open world[2]
and unstructured data, this is the best you can do, but being able to
make factual claims is obviously more powerful.

> > Databases are better at finding things like "the title of songs on album
> > x". A search engine is better at finding "all things related to x".
>
> I tend to disagree. A search engine (as currently visible at Google
> and friends) is an automatically updated KWIC database. They
> generally are pretty good now at suggesting typos, but AFAICT none of
> them track synonyms. Surely that's the obvious first step for "all
> things related".
>

x being a hash key like the word "tlug", i.e. any possible usage of that
four-letter string of characters, not the meaning of x that the user
intended.

> > [snip]
> > > But there is another problem that is harder to solve, and that is
> > > relevance. PageRank (Google's algorithm for determining which results
> > > bubble up to the top for any given search) is all about relevance. [1]
>
> You want semantic web, I guess. But that can be spammed, too, in fact
> it is likely *easier* to spam that, since evaluations are part of the
> "semantic" part of links.
>

I would argue that the semantic web makes it harder for spammers. They
have to make their content specific enough to trick the search engine
into thinking it is relevant, while at the same time making more false
representations in order to reach a sufficient number of users. Or, put
another way, they need to tell more lies. Obviously, the more lies they
have to tell, the greater the risk of them contradicting themselves and
being caught out.

> > The problem with page rank is that it doesn't solve the difficult
> > problem of finding relevance,
>
> It does solve that problem assuming honest linking. This makes a lot
> of sense in academic publication (where it's called "citation indexing"
> rather than "page rank"), because it's expensive to Google bomb (the
> editors pick the documents containing the "links", so you have to
> construct a rather interesting document to be able to add your links
> to the database).

Well, your assumption is that popularity is equivalent to relevance,
i.e. an interesting[3] document is more relevant and, conversely, lack
of popularity equates to irrelevance, because an unpopular article will
rank low in page rank.

Edward

1. http://en.wikipedia.org/wiki/Closed_world_assumption
2. http://en.wikipedia.org/wiki/Open_world_assumption
3. as determined by the number of times it is cited.
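
To make the database vs search engine distinction in the message above a bit more concrete, here is a minimal Python sketch. It is not code from the thread: the album and track names, the sample documents, and the function names titles_on_album, build_index and search are all made up for illustration, and a real search engine keeps far richer statistics than raw word counts. The first half answers "the title of songs on album x" from explicit rows; the second half treats x as a hash key and can only report how often that string occurs in the documents it has indexed.

```python
from collections import Counter, defaultdict

# Structured data (closed world): every fact is an explicit row, so a
# query returns a definite answer and a missing row means "no".
# The album and track names are invented for this sketch.
tracks = [
    {"album": "Album X", "title": "First Song"},
    {"album": "Album X", "title": "Second Song"},
    {"album": "Album Y", "title": "Other Song"},
]


def titles_on_album(album):
    """Database-style query: the titles of songs on one album."""
    return [row["title"] for row in tracks if row["album"] == album]


# Unstructured data: the engine only keeps statistics about the
# documents it has seen, keyed on the literal string of characters.
# The documents are invented for this sketch.
documents = {
    "doc1": "tlug meets in Tokyo and tlug has a mailing list",
    "doc2": "a tlug is not a word in most dictionaries",
    "doc3": "databases store structured data",
}


def build_index(docs):
    """Map each lowercased word to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index


def search(index, term):
    """Return (doc_id, count) pairs for term, most occurrences first."""
    return index.get(term.lower(), Counter()).most_common()


index = build_index(documents)
print(titles_on_album("Album X"))  # a factual answer about the data
print(search(index, "tlug"))       # only statistics about a string
```

Note that search() has no idea which sense of "tlug" the user meant, and an empty result says nothing about documents outside the index, which is the open world caveat from the message.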