I have a Redis-Hash with millions of elements, constantly adding new ones. In php, I run an endless loop to get, process, and delete one element afteh the other. Hereby, I need to get the key of any existing element (preferebly the first one inserted in the hash, FiFo)
while($redis->hlen()) {
$key = ???
// process $key
}
While I know the RANDOMKEY
and the SRANDMEMBER
command, I did not find any way to get a key of a hash. HGETALL
and HKEYS
are, due to the size of the hash, not an option either. I need sequential processing.
Help appreciated.
There is no trick to access a random item (or the first, or the last) of a given hash object.
Should you need to iterate on hash objects, you have several possibilities:
a first one is to complement the hash with another data structure you can slice (like a list or a zset). If you only add items in the hash (and iterate to delete them) a list is enough. If you can add/remove/update items (and iterate to delete them) then a zset is required (put a timestamp as a score). Both list of zset can be sliced (lrange, zrange, zrangebyscore), so it is easy to iterate on them chunk by chunk, and maintain both data structures in sync.
a second one is to complement the hash with another data structure supporting pop like operations, such as a list or a set (lpop, rpop, spop). Instead of iterating on the hash object, you can pop out all the objects from the secondary structure and maintain the hash object accordingly. Again, both data structures need to be kept in sync.
a third one is to split the hash object in many pieces. This is actually memory efficient, because your keys are stored only once, and Redis can leverage ziplist memory optimization.
So instead of storing your hash as:
myobject -> { key1:xxxx, key2:yyyyy, key3:zzzz }
you can store:
myobject:<hashcode1> -> { key1:xxxx, key3:zzzz }
myobject:<hashcode2> -> { key2:yyyy }
...
To calculate the extra hashcode, you can apply any hash function on your keys which offers a good distribution. On the above example, we assume key1 and key3 have the same hashcode1 value and key2 has the hashcode2 value.
You can find more information about this kind of data structures here:
Redis 10x more memory usage than data
The output cardinality of the hash function should be calculated so that the number of items per hash object is limited to a given value. For instance, if we choose to have 100 items per hash objects, and we need to store 1M items, we will need a cardinality of 10K. To limit the cardinality, simply using a modulo operation on a generic hash function is enough.
The benefit is it will be compact in memory (ziplist used), and you can easily iterate destructively on the hash objects by pipelining hgetall+del on all of them:
hgetall myobject:0
... at most 100 items will be returned, process them ...
del myobject:0
hgetall myobject:1
... at most 100 items will be returned, process them ...
del myobject:1
...
You can therefore iterate chunk by chunk with a granularity which is determined by the output cardinality of the hash function.