Mailing List Archive


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlug] Are ordered hashes useful?




Hi,


On 16/02/11 01:47, Marty Pauley wrote:
On Tue, Feb 15, 2011 at 11:59 AM, Raymond Wan<rwan.kyoto@example.com>  wrote:

Yes, I don't agree that dictionary order and ordered list
are the same.  Ordered list access could be provided by a
tree, but to enable dictionary access, you either need to
augment the tree with a hash table or build two trees in
parallel.

Don't say that too loudly.  All the databases in the world may stop working!!!

Maybe by "ordered list" you mean the insertion order, which is where
this thread started.  I'm talking about sorted key order, or
"dictionary order" as you mentioned.


Yes, I guess this discussion has branched off into many things -- which is good to see. I've been talking about the order in which data is inserted and I presume that is also what you meant by "ordered list".

Quite frankly, I don't know the term "dictionary order". I've heard of it in the context of people talking about Perl, etc. but when I learned about algorithms (way way back :-) ), I don't recall anyone using the phrase "dictionary" [unless in some applications of algorithms...]. As far as I can remember :-), not in general algorithms.

But when I studied CS, the word "Web" was only in research papers and not in newspapers. Don't get me started on Facebook. :-D


But my main point is that trees do provide dictionary access.  That's
what they're for.

In pseudocode:

   t = new Tree
   t.insert("banana" =>  "yellow")
   t.insert("carrot" =>  "orange")
   t.insert("apple" =>  "red")
   t.insert("durian" =>  "smelly")
   t.keys == {"apple","banana","carrot","durian"} # ordered access
   t.get("carrot") == "orange" # fast lookup

Most large dictionaries use trees, not hashes.  The whole point of a
tree is that it provides fast lookup O(log n) and ordered access.
Hashes don't cope well with a large amount of data (unless it's known
in advance).


Yes, you are correct but let's clarify things a bit further.

Trees give you O(log n) access but to the keys (and thus their records) only. This is true for databases as well. Most people use some unique ID like a "social security number" or passport number as the unique ID for records. If you wanted to access anything else in the record (i.e., that is not this unique ID), then you will need to build a secondary index on top of the data or do a linear search through the data. I guess in SQL, it would be the "create index" command -- but it's been a while since I used SQL.

Even in databases, I guess this secondary index would be a tree; but if you knew of the properties of the data, I suppose a hash could work...

As for hashes, they "can" work for large amount of data provided you have a way to deal with collisions (i.e., two different things that hash to the same location). What matters, I guess, is that hashing your inputs will give a roughly uniform distribution of hash locations. Sometimes, we don't know this beforehand and yes, trees are better then.

Of course, you can combine the two and have a "hash table of trees". i.e., when two different things collide, you put them in the tree at that location in the hash table... Perhaps you could also have a tree of hash tables...I'm not so sure that would be useful, though...

Ray




Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links