Mailing List Archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlug] array duplicate check [C] [ Solved ]

Josh Glover writes:

 > On 18/04/07, Stephen J. Turnbull <> wrote:

 > > Cost of a sort is M log M.

 > Assuming you have quicksort, and assuming non worst-case behaviour.

No, assuming you have anything like a reasonable sorting algorithm,
and non-worst-case behavior.  Sorting is just an M log M problem, and
you have to go out of your way to find an algorithm that takes M^2.
(Not very far, I'm sure everybody has used a bubble sort at some
time. :-)  You could use a different M log M sort every day of the
week, maybe even for a whole month.  And if Nguyen doesn't have
quicksort, then he can't even do #include <stdlib>, no?  (OK, maybe
ISO doesn't specify quicksort, but who'd use anything else?)  And
somebody stole his copy of Sedgewick.

Furthermore there are algorithms that guarantee M log M in the worst
case (with a bigger step coefficient than quicksort).  Python uses one
that is stable and in-place (ie, uses a constant amount of extra
space).  I don't actually know what the algorithm used is ;-), that's
advertisement from the guy who implemented it, but Tim Peters is one
of the great programmers of our day -- FWIW I trust him, even blowing
his own horn.  He claims that it's almost as fast as a stable
quicksort on average, and of course worst-case behavior is better.

BTW, this is the first time I've actually seen an application for a
Bloom filter.  I'm surprised nobody has called me on my N + M claim. :-)

Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links