php, redis, predis

predis SCAN slower than KEYS


I need a method to get all keys by prefix to delete them.

I've read that KEYS is not suitable for production, so I ran a few tests to check performance. I'm using predis 1.1.6 (PHP) and I tested both on my local machine and in a testing AWS environment with ElastiCache Redis. I'm doing this on a node with about 300k items.

I'm using prefixes of the form CLIENT/ID_CLIENT/MODULE:HASH, which translates to client/9999/products:452a269b82c199ef27f5a299e3b0f98531216ccf

So I need to search for and delete all keys belonging to a client and module. Since I use prefixes, I set the correct prefix and used the predis keys method:

$this->_redisPrefix('client/9999/products:');
$keys = $this->_redis_client->keys('*');

This is extremely fast, it takes about 50ms.

Since KEYS is not recommended in production, I tried to achieve the same thing with SCAN. predis does not have a scan method, so I needed to do this:

foreach (new Iterator\Keyspace($this->_redis_client, 'client/9999/products:*') as $key) {
    $keys[] = $key;
}

This returns exactly the same results, but it took 20 seconds(!). I thought this was related to my local machine, but I deployed it to our AWS environment and the response times were the same. I did not use pagination because I need all the items to be deleted and I don't know how many there are. It could be 10 or it could be 1000 (or more).

I want to avoid KEYS, but I cannot use SCAN with these kinds of timings.


Solution

  • Using KEYS in production

    First, it's important to understand why KEYS shouldn't be used in production.

    KEYS has a time complexity of O(N), where N is the number of elements in the entire database, NOT the number of keys that match the pattern. Since Redis executes commands on a single thread, only one command can run at a time, so everything else has to wait for that KEYS call to complete.

    see: Why KEYS is advised not to be used in Redis?

    According to the docs:

    While the time complexity for this operation is O(N), the constant times are fairly low. For example, Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds.

    Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.

    This suggests that with fewer than a million records, using KEYS should be okay-ish. But as your database grows, or as you get more concurrent users, issues may arise.

    Alternatives to KEYS

    SCAN

    A common alternative to KEYS is SCAN (which is what you are using). Note that this is still a poor alternative, because it does much the same work as KEYS: a full iteration is O(N), where N is the number of elements in the entire database. The difference is that the result is returned in chunks.

    The advantage is that no single call blocks the server for long, but the overall time complexity is the same as for KEYS. In fact, if all you want is the result and you don't care about blocking the database, SCAN can be slower than KEYS, because it has to perform many queries (each call returns only a small batch; the default COUNT hint is 10), as you have seen.
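    To make the round-trip cost concrete, here is a rough simulation that needs no Redis server. fakeScan() and scanAll() are hypothetical helpers, and the cursor is modelled as a plain array offset, which is a simplification: real SCAN cursors are opaque and COUNT is only a hint. The point is just how the number of calls scales with COUNT.

```php
<?php
/**
 * Simulate one SCAN call: examine up to $count keys starting at $cursor and
 * return [next cursor, matching keys]. A next cursor of 0 ends the iteration.
 */
function fakeScan(array $allKeys, int $cursor, string $prefix, int $count): array
{
    $end = min($cursor + $count, count($allKeys));
    $matches = [];
    for ($i = $cursor; $i < $end; $i++) {
        if (strncmp($allKeys[$i], $prefix, strlen($prefix)) === 0) {
            $matches[] = $allKeys[$i];
        }
    }
    return [$end >= count($allKeys) ? 0 : $end, $matches];
}

/** Run a full iteration and report how many calls (round trips) it took. */
function scanAll(array $allKeys, string $prefix, int $count): array
{
    $cursor = 0;
    $found = 0;
    $calls = 0;
    do {
        [$cursor, $batch] = fakeScan($allKeys, $cursor, $prefix, $count);
        $found += count($batch);
        $calls++;
    } while ($cursor !== 0);
    return ['calls' => $calls, 'found' => $found];
}

// 300,000 keys in total, 1,000 of which carry the prefix we are looking for.
$keys = [];
for ($i = 0; $i < 300000; $i++) {
    $keys[] = $i % 300 === 0 ? "client/9999/products:{$i}" : "other:{$i}";
}

$small = scanAll($keys, 'client/9999/products:', 10);
$large = scanAll($keys, 'client/9999/products:', 1000);
printf("COUNT 10:   %d calls, %d keys found\n", $small['calls'], $small['found']);
printf("COUNT 1000: %d calls, %d keys found\n", $large['calls'], $large['found']);
```

    In this model the same 1,000 keys are found either way, but COUNT 10 needs 30,000 calls while COUNT 1000 needs 300. In predis 1.x the Keyspace constructor takes an optional third argument for the COUNT hint, so passing a larger value there is worth trying; check that against the version you have installed.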

    HSET

    A much better alternative is to store the items in a hash, using HSET.

    To put elements into the hash, use:

    HSET client/9999/products "id_547" "Book"
    HSET client/9999/products "whatever_key_you_want" "Laptop"
    
    $this->_redis_client->hset('client/9999/products', 'id_547', 'Book');
    $this->_redis_client->hset('client/9999/products', 'whatever_key_you_want', 'Laptop');
    

    And when you want to get all the keys just use HKEYS:

    HKEYS client/9999/products
    1) id_547
    2) whatever_key_you_want
    
    $this->_redis_client->hkeys('client/9999/products');
    

    Unlike KEYS, the complexity of HKEYS is O(N) where N is the size of the hash (NOT the size of the entire database).

    If the hash gets very large, you may want to use HSCAN.
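    Moving from flat keys to a hash means splitting each existing key into a hash name and a field. A minimal sketch, assuming the PREFIX:HASH layout from the question; splitFlatKey() is a hypothetical helper, and the client calls are shown only as comments because no connection is created here.

```php
<?php
/**
 * Split a flat key such as "client/9999/products:<hash>" into the name of the
 * Redis hash and the field inside it, using the last ':' as the separator.
 */
function splitFlatKey(string $flatKey): array
{
    $pos = strrpos($flatKey, ':');
    if ($pos === false || $pos === 0 || $pos === strlen($flatKey) - 1) {
        throw new InvalidArgumentException("Expected PREFIX:FIELD, got '{$flatKey}'");
    }
    return [substr($flatKey, 0, $pos), substr($flatKey, $pos + 1)];
}

[$hashName, $field] = splitFlatKey(
    'client/9999/products:452a269b82c199ef27f5a299e3b0f98531216ccf'
);
echo $hashName . "\n";
echo $field . "\n";

// With a connected client (assumed, not created here) the calls would be:
//   $client->hset($hashName, $field, $value);  // store one item
//   $client->del($hashName);                   // drop the whole module at once
```

    With this layout, deleting everything for a client and module is a single DEL on one key, so no key listing is needed at all. One trade-off to keep in mind is that expiry normally applies to the whole hash rather than to individual fields.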

    Performance test

    In a Redis database with around 2,000,000 items, with the following test data added:

    for ($i = 0; $i <= 100; $i++) {
        $client->set("a:{$i}", "value{$i}");
    }
    for ($i = 0; $i <= 100; $i++) {
        $client->hset("b", $i, "value{$i}");
    }
    
    

    Test 1: KEYS

    $start = microtime(true);
    var_dump(count($client->keys('a:*')));
    $end = microtime(TRUE);
    echo ($end - $start) . "s\n";
    

    Test 2: SCAN

    use Predis\Collection\Iterator\Keyspace; // at the top of the file

    $start = microtime(true);
    $count = 0;
    foreach (new Keyspace($client, 'a:*') as $key) {
        $count++;
    }
    $end = microtime(true);
    echo ($end - $start) . "s\n";
    

    Test 3: HKEYS

    $start = microtime(true);
    var_dump(count($client->hkeys('b')));
    $end = microtime(TRUE);
    echo ($end - $start) . "s\n";
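
    The three tests repeat the same timing boilerplate, so it can be folded into a small helper. This is only a sketch: timed() is a hypothetical function, not part of predis, and it uses hrtime() (PHP 7.3+) instead of microtime() because that clock is monotonic. The workload below is a stand-in so the snippet runs without a Redis server.

```php
<?php
/**
 * Run $fn once and return [its result, elapsed time in seconds].
 */
function timed(callable $fn): array
{
    $start = hrtime(true);            // nanoseconds from a monotonic clock
    $result = $fn();
    return [$result, (hrtime(true) - $start) / 1e9];
}

// Stand-in workload; replace the closure with a real client call.
[$count, $seconds] = timed(fn () => count(range(1, 1000)));
printf("%d items in %.6fs\n", $count, $seconds);

// Against a connected client (assumed) Test 3 would read, for example:
//   [$n, $s] = timed(fn () => count($client->hkeys('b')));
```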
    

    Results

    As you can see, HKEYS is much faster, and is unaffected by the size of the database.

    I also recommend using the Redis PECL extension (phpredis) instead of predis:

    With Redis extension I got: