Mailing List Archive


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlug] Are ordered hashes useful?



On 15 February 2011 17:47, Marty Pauley <marty.pauley@example.com> wrote:

> Hashes don't cope well with a large amount of data (unless it's known
> in advance).

Just for the benefit of the list:

A "hash" (associative array / dictionary) is usually implemented as an
array of b number of buckets, with each bucket corresponding to one
output of the hashing function. Each bucket typically contains a
linked list of length l, where l is the number of objects inserted
that hashed to the same output value. As you insert more and more
data, you get more collisions and thus longer lists in each bucket.
Though lookup time is still constant time--O(1)--the actual time taken
does increase, because traversing a linked list is O(n). So if you
have 1 million values in your hash table in bucket 3, it is going to
take you a million and one operations to fetch an object if you are so
unlucky as to need the last one in bucket 3.

There are more sophisticated implementations I'm sure, but that is how
it works in theory.

Some hash implementations let you specify an expected data size when
constructing a new hash, which is what Marty was alluding to there at
the end.

Cheers,
Josh


Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links