I am facing a cache penetration problem with Redis: too many meaningless queries from clients reach the database because their keys are not found in Redis. I would like to use a Bloom filter to screen out these queries, but I don't know how to use it together with Redis or the database.
When should a key be added to the Bloom filter? Just before the key is added to the Redis cache? Are keys in the Bloom filter deleted when the key in Redis expires?
Or should I read all the keys from the database and put them in the Bloom filter? But if a key is deleted from the database, can it be deleted from the Bloom filter as well?
How to avoid cache penetration
Your best strategy of course would be to implement some logic (e.g., IP range filtering) before checking the cache.
On top of this, if the same unacceptable addresses are queried repeatedly, you may consider storing these addresses in Redis with an empty string value.
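The idea of caching invalid lookups as an empty string could be sketched roughly as follows. This is only an illustration under assumptions: `client` is taken to be a redis-py `redis.Redis` instance (anything with compatible `get` and `set` methods works), and `get_with_negative_cache`, `load_from_db`, and both TTL values are hypothetical names and numbers that do not come from the question.

```python
# Hypothetical TTLs (seconds); tune them for your workload.
VALUE_TTL = 3600      # how long to cache real values
NEGATIVE_TTL = 60     # how long to remember that a key is invalid

def get_with_negative_cache(client, key, load_from_db):
    """Return the value for key, or None if the key is invalid.

    client       -- assumed to be a redis-py client (uses get / set with ex=)
    load_from_db -- hypothetical callable: key -> value, or None if missing
    """
    cached = client.get(key)
    if cached is not None:
        # An empty string marks a key already known to be invalid,
        # so the database is not queried again.
        if cached in (b"", ""):
            return None
        return cached

    value = load_from_db(key)
    if value is None:
        # Remember the miss for a short time only, in case the key
        # becomes valid later.
        client.set(key, "", ex=NEGATIVE_TTL)
        return None

    client.set(key, value, ex=VALUE_TTL)
    return value
```

With a real server the call might look like `get_with_negative_cache(redis.Redis(), "user:999", my_db_lookup)`. Note that redis-py returns `bytes` from `get` unless the client was created with `decode_responses=True`.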
If you need to store millions of invalid keys, indeed, you may consider using a Bloom filter.
RedisBloom is a Redis module, developed by Redis Inc., that adds probabilistic data structures (including a Bloom filter) to Redis. You can install RedisBloom on top of an existing Redis, or switch to Redis Stack, which includes RedisBloom to start with.
RedisBloom commands are documented here.
Simply create a Bloom filter using BF.RESERVE and add invalid addresses using BF.ADD. To determine if an invalid address has been seen before, use BF.EXISTS (the answer "1" means that, with high probability, the value has been seen before, and a "0" means that it definitely wasn't seen before).
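As a rough sketch, the three commands above can be issued from Python through redis-py's generic `execute_command`. The helper names, the filter name passed in by the caller, and the default error rate and capacity are all assumptions made for illustration; `client` is assumed to be a `redis.Redis` instance connected to a server with RedisBloom loaded.

```python
def create_filter(client, name, error_rate=0.01, capacity=1_000_000):
    # BF.RESERVE <key> <error_rate> <capacity>
    # error_rate and capacity here are placeholder values.
    return client.execute_command("BF.RESERVE", name, error_rate, capacity)

def remember_invalid(client, name, address):
    # BF.ADD replies 1 if the item was newly added,
    # 0 if it may have been present already.
    return client.execute_command("BF.ADD", name, address) == 1

def probably_seen(client, name, address):
    # BF.EXISTS replies 1 for "probably seen before" (false positives
    # are possible) and 0 for "definitely not seen before".
    return client.execute_command("BF.EXISTS", name, address) == 1
```

A caller might run `create_filter(client, "invalid_addresses")` once, then call `remember_invalid` whenever a lookup turns out to be invalid and `probably_seen` before touching the cache or the database.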
How to handle incoming requests
Since false positive matches are possible with a Bloom Filter (BF), you have several options:
Store all valid keys in a BF upfront
Store valid keys in a BF on-the-fly
Store invalid keys in a BF
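For the variants that keep valid keys in the filter, one possible request flow is sketched below. It relies only on the Bloom filter property that a negative answer is definite while a positive answer may be a false positive. All names (`handle_request`, `might_be_valid`, `cache_get`, `cache_set`, `db_get`) are hypothetical callables introduced for this illustration, not part of any library.

```python
def handle_request(key, might_be_valid, cache_get, cache_set, db_get):
    """Look up key, using a Bloom filter of valid keys as a first gate.

    might_be_valid -- hypothetical callable wrapping BF.EXISTS on valid keys
    cache_get/set  -- hypothetical callables wrapping the Redis cache
    db_get         -- hypothetical callable: key -> value, or None if missing
    """
    # "Not in the filter" is certain, so reject without touching
    # the cache or the database.
    if not might_be_valid(key):
        return None

    value = cache_get(key)
    if value is not None:
        return value

    # "In the filter" may be a false positive, so the database can
    # still answer None here; only real values are cached.
    value = db_get(key)
    if value is not None:
        cache_set(key, value)
    return value
```

With this arrangement, false positives cost an occasional unnecessary database query but never produce a wrong answer.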
Some notes: