c++boostmultimapbucketboost-unordered

boost::unordered_multimap: get all elements in bucket efficiently?


I can get all elements in a single bucket with this code:

typedef boost::unordered_multimap< key, myClass*, MyHash<key> >
                                                HashMMap;
HashMMap::iterator it;
it = hashMMap_.find( someKey);
int bucketIndex = hashMMap_.bucket( someKey);
int bucketSize = hashMMap_.bucket_size( bucketIndex);

qDebug() << "index of bucket with key:" << someKey << " is:"
         << bucketIndex;
qDebug() << "number of elements in bucket with index:" << bucketIndex << " is:"
         << bucketSize;

HashMMap::local_iterator lit;
/* begin of bucket with index bucketIndex */
lit = hashMMap_.begin( bucketIndex);

for ( ; lit != sender_.hashMMap_.end( bucketIndex); ++lit) {
    qDebug() << "(*lit).first:" << (*lit).first << ", (*lit).second:" <<
               (*lit).second << ", (*lit).second->something_:" <<
               (*lit).second->something_;
}

I would like to get a local_iterator to the first element in a bucket and iterate over it till the bucket end, so if there is only one value for a given index in hash table (where index is the Hash(key)) I will iterate just through a single element and receive bucket end(), and in case of many elements I will iterate whole bucket (all values with equal hash). is this possible without bucketIndex, hashMMap_.begin( bucketIndex) and hashMMap_.end( bucketIndex) ?

so basically I would like to get a local_iterator like this:

HashMMap::local_iterator lit = hashMMap_.find_bucket_if_present( someKey);

Additional question is: do I have to test first if find() returns an iterator to element before calling int bucketIndex = hashMMap_.bucket( someKey) ? This is what I think because explanation of bucket() function from boost site is:

Returns: The index of the bucket which would contain an element with key k.

                                 ^^^

I think this means I have first to find(key) in the multimap to know if key is present, because a call to bucket(key) will return an index which is not a hash but modulo of hash (bucket_from_hash) in the hash table under which key is stored if it is present. So because of the modulo which is done with bucket_count, if key was not inserted I will iterate over a virtual bucket in which it would be under current circumstances, and what is most important for me: also different hashes could be there as the bucket_count might be less than my hash (I use 16-bit MyHash<key> of 32-bit key as a hash function provided into multimap constructor). Is this correct?


Solution

  • I would start working with ranges, like so:

    template<typename BoostUnorderedMap, typename Key>
    boost::iterator_range< typename BoostUnorderedMap::local_iterator > get_bucket_range( BoostUnorderedMap& myMap, Key const& k ) {
      int bucketIndex = myMap.bucket( k );
      return boost::iterator_range< typename BoostUnorderedMap::local_iterator >(
        myMap.begin(bucketIndex),
        myMap.end(bucketIndex)
      }
    }
    template<typename BoostUnorderedMap, typename Key>
    boost::iterator_range< typename BoostUnorderedMap::local_const_iterator > get_bucket_range( BoostUnorderedMap const& myMap, Key const& k ) {
      int bucketIndex = myMap.bucket( k );
      return boost::iterator_range< typename BoostUnorderedMap::local_const_iterator >(
        myMap.begin(bucketIndex),
        myMap.end(bucketIndex)
      }
    }
    

    then, at least in C++11, you can do the following:

    for (auto && entry : get_bucket_range( some_map, "bob" ) )
    

    and it iterates over everything in the "bob" bucket.

    While this does use bucketIndex, it hides these details from the end consumer, and simply gives you a boost::range instead.