I am currently reading the CLRS book and looking at how hash tables are defined there. When it talks about open hashing and using chaining for collision resolution, there is a piece of text that reads:
If the hash table supports deletion, then its linked lists should be doubly linked
so that we can delete an item quickly. If the lists were only singly linked, then to
delete element x, we would first have to find x in the list T[h(x.key)] so that we
could update the next attribute of x’s predecessor. With singly linked lists, both
deletion and searching would have the same asymptotic running times.
What I don't understand is why (or more specifically, how) this is the case. To be clear, this is not a question about why deleting a known element from a doubly linked list is faster than from a singly linked one; I thought I should make that explicit.
This is related to how it is possible to actually know the element's location. Looking at the hash delete pseudocode (below), the key is used to generate the hash, which leads to the correct array index where the linked list can be found. But how exactly can that be translated into the actual position of the item within the linked list, such that a doubly linked list makes a difference?
CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(x.key)]
It looks to me like hashing the key leads only to the linked list, so in either case the list would still need to be searched to find the actual item being deleted?
I'm sure there is a simple answer, but it is just not clear to me!
It looks to me like hashing the key leads only to the linked list, so in either case the list would still need to be searched to find the actual item being deleted?
That's right: when someone asks to erase(key), there's no benefit from a doubly-linked list, as it is easy to erase while traversing even a singly-linked list.
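For illustration, here's a minimal C++ sketch of erasing while traversing a singly-linked list; the Node type and erase_while_traversing function are hypothetical (a hand-rolled list of int keys), not from any particular library. By walking a pointer to the link you arrived through, the unlink itself is O(1) once the element is found:

// Hypothetical singly-linked node type for illustration.
struct Node {
    int key;
    Node* next;
};

// Erase the first node with the given key while traversing. By keeping a
// pointer to the *link* we came through (the head pointer or a next field),
// no doubly-linked "prev" pointer is needed: the unlink is O(1) once found.
void erase_while_traversing(Node** head, int key) {
    for (Node** link = head; *link != nullptr; link = &(*link)->next) {
        if ((*link)->key == key) {
            Node* victim = *link;
            *link = victim->next;  // bypass the victim node
            delete victim;
            return;
        }
    }
}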
Still, it's quite common to have code like (pseudo-code)...
if (iterator i = hash_table.find(key)) {
    do_something_with(*i);
    if (should_erase(*i))
        erase(i);
}
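For a concrete, runnable version of that pattern, here's a sketch using std::unordered_map (just one possible container choice; the keys, values, and erase condition are made up for the example):

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> table{{"a", 1}, {"b", 2}};

    // find() hashes the key once and returns an iterator to the node.
    if (auto i = table.find("a"); i != table.end()) {
        std::cout << i->first << " -> " << i->second << '\n';  // use *i freely
        if (i->second < 10)      // stand-in for should_erase(*i)
            table.erase(i);      // erase by iterator: no re-search
    }
}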
Here, the iterator i can be a pointer to a node in the doubly-linked list, so each dereferencing operation *i to access the element/value associated with that node doesn't have to repeat any part of the hash table bucket lookup or doubly-linked list search. If the code then decides to erase(i), the iterator identifies the node to erase without needing another search.
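To connect this back to the CLRS passage, here's a minimal sketch of a chained hash table with doubly-linked buckets; the DNode/Table names, int keys, and the fixed bucket count are assumptions for brevity, not CLRS's code. Given a pointer to the node itself (which is effectively what the iterator holds), erasing is O(1) with no search:

#include <cstddef>

// Illustrative chained hash table with doubly-linked buckets.
struct DNode {
    int key;
    DNode* prev;
    DNode* next;
};

struct Table {
    static constexpr std::size_t kBuckets = 8;  // fixed size for brevity
    DNode* bucket[kBuckets] = {};               // T in the CLRS pseudocode

    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % kBuckets;  // stand-in for h()
    }

    // CHAINED-HASH-DELETE given a pointer to the node itself: no search,
    // because prev/next let us unlink x directly.
    void erase(DNode* x) {
        if (x->prev)
            x->prev->next = x->next;
        else
            bucket[hash(x->key)] = x->next;  // x was the head of its bucket
        if (x->next)
            x->next->prev = x->prev;
        delete x;
    }
};

Note how the only use of the hash function during deletion is the head-of-bucket case; the search CHAINED-HASH-DELETE seems to imply has already been paid for by whoever handed us the node pointer.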
If you only used a singly-linked list, then for erase(iterator) to avoid re-searching, the iterator would need to store a pointer to the previous element's next pointer (which is actually what GCC's C++ hash table containers do).

If it only stores that (to keep iterators smaller), then to access the element the iterator logically addresses, it has to first look up the previous element's next pointer, then follow that to the logically addressed node: that's not super efficient. If it stores both the address of the logically-addressed element and the previous element (or more specifically, its next pointer), then iterators become larger objects, with potential memory usage and performance impacts of their own. Still, you're saving a "previous" pointer per link in the list, and there are more likely to be lots of elements than lots of iterators.