rustiteratorhashset

Creating an iterator over the `OccupiedEntry`s of a `HashSet` in Rust


Edit

I'm realizing that since Entrys own mutable borrows on their parent HashSets, it is impossible for multiple Entrys in a single HashSet to exist simultaneously. Therefore, Entrys will probably not be able to solve my problem.

For instance, if I had two OccupiedEntrys to the same value in the same set, I could use the first one to remove the value, and that would make the second one invalid.

But I am not going to do anything silly like that – so maybe there is some unsafe hack I could use to accomplish the kind of shortcut I'm looking for of avoiding rehashing the items?

Original post

Setup

I have set: HashSet<Foo>. I'd like to build several (disjoint) Vec<&Foo>s from some of the items of set, do some things with the Vecs, and then possibly remove all of the items appearing in the Vecs from set. One problem is obvious: each Vec has an immutable borrow on set, so as long as any of the Vecs exist, I cannot mutate set by removing the items referenced by any of the other Vecs.

I know how to achieve the output that I am looking for: some approaches are described below. My issue is that performance a high priority and I am trying to take all of the shortcuts I can.

Approaches I've considered

I could clone the relevant items from set into vec instead, but that has two drawbacks:

  1. The cloning is unnecessarily resource-intensive
  2. The clones do not know where their values originally lived in set, so to remove them from set, I need to rehash them, which I'd rather not do.

I could use an extract_if to pre-emptively remove the values and place them in the Vecs. This avoids the issue of cloning, but the inverse version of problem 2 above remains: sometimes I will want to put the values back in set, in which case I will need to rehash them.

I could replace the HashSet with a Vec. Naturally, there is a reason I'm using a HashSet instead of a Vec here, but maybe a Vec would be faster given the hashing bottleneck I'm facing.

When I first made this post, I had hoped to be able to use OccupiedEntrys into set as special references with remote-detonate buttons I could use to remove the referenced values from set when I was done with them, but I now recognize why that cannot work.

Current plan

What I'm realizing now is that I need to be keeping track of the hashes along with the items. Which pretty much adds up to: I need to be using a HashTable<(u64, Rc<Foo>)> instead of a HashSet<Foo>. I'll store the hashes of my Foos along with the Rc<Foo>s themselves in the HashTable and then when I iterate over the HashTable, I'll have the hashes already computed for me. Once I'm done using my Foo, I can do whatever I want with its entry in the HashTable using the hash that I will have held onto – no need to ever hash the same Foo more than once, which was the goal all along. If I get this working, I'll write an answer below and mark the question resolved.


Solution

  • This used to be possible with unsafe code in older versions of hashbrown (0.14) using the raw table API (https://docs.rs/hashbrown/0.14.5/hashbrown/raw/struct.RawTable.html), but unfortunately it was removed. Basically, it allows you to have an entry without reference to the map (Bucket). The plan is to iter() then remove() if needed. Still, if you agree to depend on an old version, it can work.