data-structureshashtablelinear-probing

How does linear probing handle deletions without breaking lookups?


Here is my understanding of linear probing.

For insertion: - We hash to a certain position. If that position already has a value, we linearly increment to the next position, until we encounter an empty position, then we insert there. That makes sense.

My question revolves around lookup. From descriptions I have read, I believe lookup works like this:

So how does this work when we delete an item from a hash? Wouldn't that mess up this lookup? Say two items hash to the same position. We add both items, then delete the first one we added. So now, the expected position of the second item (which had to be moved to a different position, since the first item originally occupied it) is empty. Does deleting handle this in some way?


Solution

  • Great question! You are absolutely right that just removing an item from a linear probing table would cause problems in exactly the circumstance that you are reporting.

    There are a couple of solutions to this. One is to use tombstone deletion. In tombstone deletion, to remove an element, you replace the element with a marker called a tombstone that indicates "an element used to be here, but has since been removed." Then, when you do a lookup, you use the same procedure as before: jump to the hash location, then keep stepping forward until you find a blank spot. The idea here is that a tombstone doesn't count as a blank spot, so you'll keep scanning over it to find what you're searching for.

    To keep the number of tombstones low, there are nice techniques you can use like overwriting tombstones during insertions or globally rebuilding the table if the number of tombstones becomes too great.

    Another option is to use Robin Hood hashing and backward-shift deletion. In Robin Hood hashing, you store the elements in the table in a way that essentially keeps them sorted by their hash code (wraparound at the front of the table makes things a bit more complex than this, but that's the general idea). From there, when doing a deletion, you can shift elements backwards one spot to fill the gap from the removed element until you either hit a blank or an element that's already in the right place and doesn't need to be moved.

    For more information on this, check out these lecture slides on linear probing and Robin Hood hashing.

    Hope this helps!