redisbloom-filter

How to use BloomFilter with Redis


I faced an issue with Redis Penetration. Too many meaningless queries from the clients go to the database because no keys are matched in Redis. Now I hope to use a BloomFilter to filter some meaningless queries, but I don't know how to use it with Redis or DB.

When will the key be added in Bloom Filter ? just before the key is added in Redis cache ? Are Keys in Bloom Filter deleted if the key in Redis expires?

Or read all the keys from Database and put them in Blomm filter? but if the key is deleted from DB, can we delete the key from BloomFilter?


Solution

  • How to avoid cache penetration

    Your best strategy of course would be to implement some logic (e.g., IP range filtering) before checking the cache.

    On top of this, if the same unacceptable addresses are queried repeatedly, you may consider storing these addresses in Redis with an empty string value.

    If you need to store millions of invalid keys, indeed, you may consider using a Bloom filter.

    RedisBloom is a Redis module, developed by Redis Inc., that adds probabilistic data structures (including a Bloom filter) to Redis. You can install RedisBloom on top of an existing Redis, or switch to Redis Stack, which includes RedisBloom to start with.

    RedisBloom commands are documented here.

    Simply create a Bloom filter using BF.RESERVE and add invalid addresses using BF.ADD. To determine if an invalid address has been seen before, use BF.EXISTS (the answer "1" means that, with high probability, the value has been seen before, and a "0" means that it definitely wasn't seen before).

    How to handle incoming requests

    Since false positive matches are possible with a Bloom Filter (BF), you have several options:

    1. Store all valid keys in a BF upfront

      • Add all valid keys to the BF
      • When a request is received, search in the Bloom filter.
      • If found in the BF - it is, with high probability, a valid key. Try to fetch it from the DB. If not found in the DB (low probability) it was a false positive.
      • If not found in the BF: it is necessarily an invalid key.
    2. Store valid keys in a BF on-the-fly

      • When a request is received, search in the Bloom filter.
      • If found in the BF - it is, with high probability, a valid key that was already seen. Try to fetch it from the DB. If not found in the DB (low probability) it was a false positive.
      • If not found in the BF - it is either a first-time valid key or an invalid key. Check, and if valid - add to the BF.
    3. Store invalid keys in a BF

      • When a request is received, search in the Bloom filter.
      • If found in the BF - it is, with high probability, an invalid key. Note that it may be a valid key (low probability) and you will ignore it, but that's a price you should be ready to pay if you go this way.
      • If not found in the BF - it is either a valid key or a first-time invalid key. Check, and if invalid - add to the BF.

    Some notes: