Mailing List Archive



Re: [tlug] Database frontend in Linux



Edward Middleton writes:

 > The difference is that a database deals with structured finite data
 > (closed world assumption[1]) directly.

But so does the search engine.  The difference is that the search
engine acquires its structured data by parsing documents, while the
database gets its data from more structured systems of sensors or
input screens.  That doesn't mean that the database has complete,
correct, or consistent data in it, although you can obviously do a lot
of filtering if you can control the input format.
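
To make that concrete, here's a toy sketch (purely illustrative, nothing
like how a real engine is implemented) of what "acquires its structured
data by parsing documents" means: the only structure the engine has is
the term statistics it extracted from the text, and queries are answered
from those statistics, never from the documents themselves.

    from collections import defaultdict

    documents = {
        "doc1": "TLUG meeting on database frontends in Linux",
        "doc2": "notes from a BSD installfest last month",
    }

    # term -> {doc_id: occurrence count}: structure derived by parsing.
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1

    def search(term):
        # Answers come from the derived statistics, not the documents.
        hits = index.get(term.lower(), {})
        return sorted(hits, key=hits.get, reverse=True)

    print(search("tlug"))      # ['doc1']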

 > A search engine generates structured data in the form of statistics
 > about the content of documents and queries the statistical data.
 > As a result it can only make factual statements about the
 > statistics (and only statistics for documents it knows about); it
 > can't make factual claims about the source documents.  In situations
 > like the web, where you have an open world[2] and unstructured data,
 > this is the best you can do, but being able to make factual claims
 > is obviously more powerful.

Not really.  CEOs, for example, generally don't give a damn where
customer Joe Tiny's shipment is; they want statistical information
about trends in demand, etc.  I really don't think there's a hard and
fast difference here, especially when you start talking about data
mining (which technically is quite different from statistics).

 > > I tend to disagree.  A search engine (as currently visible at Google
 > > and friends) is an automatically updated KWIC database.  They
 > > generally are pretty good now at suggesting typos, but AFAICT none of
 > > them track synonyms.  Surely that's the obvious first step for "all
 > > things related".
 > 
 > x being a hash key like the word "tlug", i.e. any possible usage of
 > that four-letter string of characters, not the user-intended meaning
 > of x.

Yes, I understand that.  The point is that it's a keyword, a text
string, not a semantic entity.  You won't pull up *BSD installfests if
you use TLUG as the key, but they are arguably related.
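
Here's a toy sketch of the difference (the RELATED table is hypothetical
and hand-maintained; as far as I can tell no current engine has anything
like it):

    # Each keyword is just a text string mapped to the pages that
    # literally contain it.
    index = {
        "tlug": {"tlug-meeting.html"},
        "installfest": {"bsd-installfest.html"},
    }

    # Hypothetical relation table: the "track synonyms and related
    # things" step that current engines don't take.
    RELATED = {"tlug": {"installfest", "linux"}}

    def keyword_search(key):
        # Literal string match only: "tlug" never finds the *BSD page.
        return index.get(key, set())

    def related_search(key):
        # Expand the query with the declared relations first.
        keys = {key} | RELATED.get(key, set())
        return set().union(*(index.get(k, set()) for k in keys))

    print(keyword_search("tlug"))    # {'tlug-meeting.html'}
    print(related_search("tlug"))    # both pages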

 > I would argue that the semantic web makes it harder for spammers because
 > they are faced with the problem of making their content specific enough
 > to trick the search engine into thinking they are relevant while
 > conversely needing to make more false representations in order to get
 > access to a sufficient number of users.  Or put another way, they need
 > to tell more lies.  Obviously the more lies they have to tell the
 > greater the risk of them contradicting themselves and being caught out.

Only if you've got near-human intelligence reading the pages.  It's
trivial for a human to recognize spam (with a few exceptions, such as
Unix-related job postings to TLUG, some of which are spam but most of
which are at worst misguided).  It is not so easy for machines.
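
For illustration, this is roughly the level of check a machine can
actually run, and how cheaply keyword stuffing defeats it (a
deliberately naive sketch, not a claim about any real filter):

    def looks_relevant(declared_keywords, page_text):
        # The kind of check a machine can perform mechanically: do the
        # claimed keywords appear in the page at all?
        text = page_text.lower()
        return all(k.lower() in text for k in declared_keywords)

    # A human sees the keyword stuffing immediately; the check does not.
    spam_page = "Cheap pills here!  linux database tlug linux database"
    print(looks_relevant(["linux", "database", "tlug"], spam_page))   # True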



