Mailing List Archive


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlug] array duplicate check [C] [ Solved ]



Darren Cook writes:

 > > BTW, this is the first time I've actually seen an application for a
 > > Bloom filter.  I'm surprised nobody has called me on my N + M claim. :-)

 > Bloom filters ( http://en.wikipedia.org/wiki/Bloom_filter ) are not
 > reliable (they can return false positives). If I've understood
 > correctly, in terms of the original question it means it might say an
 > element in b[] is in a[] when in fact it isn't.

You are correct.  The probability can be made arbitrarily low, though.

We'll have to ask Nguyen whether a rare false positive matters.  That
means the main task does not get run, even though the data is valid.
On the other hand, Bloom filters guarantee no false negatives, so the
main task never gets run on invalid data.

BTW, as I imagined Nguyen's application, he is expecting that there
will be zero positives, but needs to check.  In that case, positives
of either kind are "rare", so doing a full scan on M (< N) for
equality if you do observe a positive is "cheap".  But of course this
is probably O(M*N) in the worst case.  (It may be possible to choose
the Bloom filter to do better.)



Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links