Given any character, what's the fastest way for me to determine if that character belongs to a set (not the container-type) of known characters.
In other words, what's the fastest elegant way to implement the conditional:
char c = 'a';
if(c == ch1 || c == ch2 || c == ch3 ...) // Do something...
Is there an STL container (I'm thinking it might be unordered_set?) that I can just pass the character as a key to and it'll return true if the key exists?
Anything with an O(1) lookup time will work for me.
I went a little further and wrote two versions, one based on a lookup array, the other on a set using an underlying hash.
class CharLookup {
public:
CharLookup(const std::string & set) : lookup(*std::max_element(set.begin(), set.end()) + 1) {
for ( auto c : set) lookup[c] = true;
}
inline bool has(const unsigned char c) const {
return c > lookup.size() ? false : lookup[c];
}
private:
std::vector<bool> lookup;
};
class CharSet {
public:
CharSet(const std::string & cset) {
for ( auto c : cset) set.insert(c);
}
inline bool has(const unsigned char c) const {
return set.contains(c);
}
private:
QSet<unsigned char> set;
};
Then wrote a little benchmark, added a few more containers for the sake of comparison. Lower is better, the data points are for "character set size / text size":
Seems like for short char sets and text, std::string::find_first_of
is fastest, even faster than using a lookup array, but quickly dwindles as the test size increases. std::vector<bool>
seems like the "golden mean", QBitArray
probably has a little different implementation because it pulls ahead as the test size increases, at the largest test QVector<bool>
is fastest, presumably because it doesn't have the overhead of bit access. The two hash sets are close, trading places, last and least there is the std::set
.
Tested on an i7-3770k Win7 x64 box, using MinGW 4.9.1 x32 with -O3.