Tags: shell, redis, redis-cluster, redis-cli

How to generate a large amount of Redis data of a specific size in a Redis cluster


I need to generate millions of Redis keys with a value size of 1 kB in a Redis cluster, assuming all values are of type string. I know of two options. The first is DEBUG POPULATE, which generates a specific number of keys but does not let you set the value size:

127.0.0.1:6379> DEBUG POPULATE 1000000
OK

The second is to call redis-cli from a shell loop, but I don't know how to generate 1 kB values:

for i in `seq 1000000`; 
do 
    redis-cli SET key$i val$i ; 
done

I am a newbie at this. How can I meet this requirement? I would really appreciate any help.


I tried the solution based on Mark Setchell's answer:

#!/bin/bash

# Generate around 32kB (+ around 33% base64 overhead) of random characters
stuff=$(head -c 32000 /dev/urandom | base64)

# Set 100 keys to 1kB strings, e.g. SET key32 A87H34..PHNQZ
for ((i=0;i<100;i++)) ; do
   echo SET key$i ${stuff:RANDOM:1024}
done | redis-cli -p 6371 -c --pipe

The following errors occur when running the above code:

sh fake_data_test.sh 
All data transferred. Waiting for the last reply...
MOVED 13252 172.20.0.33:6379
MOVED 9189 172.20.0.32:6379
ERR syntax error
ERR syntax error
MOVED 13120 172.20.0.33:6379
MOVED 9057 172.20.0.32:6379
ERR syntax error
ERR syntax error
...
ERR syntax error
Last reply received from server.
errors: 100, replies: 100

Then I wondered whether it was a value formatting issue, so I put the value in double quotes: echo SET key$i "${stuff:RANDOM:1024}"

sh fake_data_test.sh 
All data transferred. Waiting for the last reply...
MOVED 13252 172.20.0.33:6379
ERR unknown command `kpshETtdvDBpL1BYimJl3FkpuJMom/heyj02qJwUGUCQvSZODHXHwNGodfVyIR6sWSv8agjlGMtl`, with args beginning with: 
...
ERR unknown command `UmBAaiwqgB25mSDhsK7qrveXhJV0cJCBRaz`, with args beginning with: 
MOVED 9189 172.20.0.32:6379
ERR unknown command `gRolxGVLUVbnU5I/ykaXPCA+0Nev`, with args beginning with: 
Last reply received from server.
errors: 1397, replies: 1428

However, calling redis-cli directly for each SET (without --pipe) works fine:

for ((i=0;i<100;i++)) ; do
   redis-cli -p 6371 -c SET key$i "${stuff:RANDOM:1024}"
done
# All output OK

I don't know if I'm using --pipe the wrong way.

Note: the OS is CentOS 7; the Redis cluster is created via docker-compose; the image is redis:4.0.11-alpine.


Solution

  • Updated Answer

    If you are doing this just to generate test data, there's another, much faster way: empty Redis and set it up how you want it (per my original answer), then back up the resulting database file. Before each test, simply replace the main database with the backup file and restart Redis.
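
    A minimal sketch of that workflow, assuming a standalone Redis with dump.rdb in its working directory (the paths and config location are assumptions, not from this answer; adjust for your docker-compose setup):

    # One-time setup: populate Redis as below, then persist and back up the dump
    redis-cli SAVE                      # write the dataset to dump.rdb
    cp dump.rdb dump.rdb.backup

    # Before each test: stop Redis, restore the backup, start again
    redis-cli SHUTDOWN NOSAVE           # stop without overwriting dump.rdb
    cp dump.rdb.backup dump.rdb
    redis-server redis.conf             # loads dump.rdb on startup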

    Original Answer

    There are probably better ways, but (before my morning coffee) here's a method...

    First, generate 40kB of random text near the start of your script:

    stuff=$(head -c 40000 /dev/urandom | base64)
    

    Now, inside your loop, go to a random offset of 0..32767 in the text and take the following 1024 bytes:

    val=${stuff:RANDOM:1024}
    

    In case you're wondering, I am trying to avoid the expensive creation of processes inside your big loop, so the line val=${...} uses a bash "internal" (parameter expansion) that doesn't create a new process.
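
    For contrast, a sketch of what that expansion avoids (the tr/cut pipeline is a roughly equivalent slow version I made up for illustration, not part of the answer):

    # Fast: pure bash parameter expansion, runs inside the shell itself
    val=${stuff:RANDOM:1024}
    # Slow: each $(...) forks a subshell plus external processes per iteration
    off=$RANDOM
    val=$(printf '%s' "$stuff" | tr -d '\n' | cut -c$((off+1))-$((off+1024)))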

    Note that if you take a million random samples starting at offsets 0..32767, there will inevitably be repetitions. You could reduce this by taking multiple smaller chunks from different offsets and appending them together, as sketched below. Or perhaps, generate absolutely unique values by prefixing each value with a sequential number and making the strings slightly over 1024 bytes.
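
    A sketch of the multi-chunk idea (the two 512-byte chunks are an illustrative split; each occurrence of $RANDOM expands to a fresh value):

    # Two independently chosen 512-byte chunks -> far fewer repeated values
    val=${stuff:RANDOM:512}${stuff:RANDOM:512}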


    As an aside, I think you'd be better off pipelining some of this, or using Python or some bulk-loading tool, to speed it up.

    This code does 100,000 insertions of 1024-byte strings in around 49 seconds, for example:

    #!/bin/bash
    
    # Generate around 32kB (+ around 33% base64 overhead) of random characters;
    # strip the newlines base64 inserts so each value stays a single shell word
    stuff=$(head -c 32000 /dev/urandom | base64 | tr -d '\n')
    
    # Set 100,000 keys to 1kB strings, e.g. SET key32 A87H34..PHNQZ
    for ((i=0;i<100000;i++)) ; do
       echo SET key$i ${stuff:RANDOM:1024}
    done | redis-cli --pipe
    

    If you want to ensure the values are unique, and don't mind making each value just over 1024 bytes, replace the line in the loop with:

    echo SET key$i "${i}-${stuff:RANDOM:1024}"
    

    If you require exactly 1024 unique bytes, you can use the following at a 10% time penalty:

    # Generate value: 8 digits of sequence number, a dash and 1015 random characters
    printf -v val "%08d-%s" $i ${stuff:RANDOM:1015}
    echo SET key$i $val
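
    For completeness, a sketch of that variant in the full loop (same loop count and pipe invocation as the timed example above):

    for ((i=0;i<100000;i++)) ; do
       # 8-digit sequence number + dash + 1015 random chars = exactly 1024 bytes
       printf -v val "%08d-%s" $i "${stuff:RANDOM:1015}"
       echo SET key$i $val
    done | redis-cli --pipe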