I want to generate a large amount of random numbers. I wrote the following bash command (note that I am using cat here for demonstration purposes; in my real use case, I am piping the numbers into a process):
for i in {1..99999999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat
The numbers are printed at a very low rate. However, if I generate a smaller amount, it is much faster:
for i in {1..9999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat
Note that the only difference is 9999 instead of 99999999.
Why is this? Is the data buffered somewhere? Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?
Why is this?
Expanding {1..99999999} generates 100000000 arguments, and parsing them requires a lot of memory allocation from bash. This significantly stalls the whole system.
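The brace expansion can be avoided with bash's C-style for loop, which counts instead of materializing the whole word list up front (shown here with a small bound for illustration):

```shell
# C-style for loop: iterates without expanding {1..99999999}
# into 100000000 words in memory first
for ((i = 1; i <= 5; i++)); do   # use 99999999 in the real case
  echo "$i"
done
```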
Additionally, large chunks of data are read from /dev/urandom, and about 96% of that data is filtered out by tr -dc '0-9' (only 10 of the 256 possible byte values survive). This wastes most of the random data read and, on older kernels, can deplete the entropy pool, additionally stalling the whole system.
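You can measure that waste directly; a quick sketch (the exact count varies per run, since the input is random):

```shell
# feed 100000 random bytes through the digit filter and count survivors
head -c 100000 /dev/urandom | tr -dc '0-9' | wc -c
```

With 10 of 256 byte values kept, this typically prints a number around 3900, i.e. roughly 4% of the input survives.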
Is the data buffered somewhere?
Each process has its own buffer, so:

- cat /dev/urandom is buffering
- tr -dc '0-9' is buffering
- fold -w 5 is buffering
- head -n 1 is buffering
- | cat has its own buffer

That's 6 buffering places. Even ignoring input buffering from head -n1 and from the right side of the pipeline | cat, that's 4 output buffers.
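If the stage buffers themselves are the bottleneck, GNU coreutils provides stdbuf to change a program's stdio buffering; a sketch, assuming stdbuf is available and the tools use stdio (which the GNU versions do):

```shell
# -o0 turns off output buffering for each stage,
# so bytes flow through the pipeline as soon as they are produced
stdbuf -o0 tr -dc '0-9' </dev/urandom | stdbuf -o0 fold -w 5 | head -n 1
```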
Also, save animals and stop cat abuse: use tr </dev/urandom instead of cat /dev/urandom | tr. Fun fact: tr can't take a filename as an argument.
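The same pipeline without the extra cat, using only a stdin redirection:

```shell
# tr reads /dev/urandom directly from the redirect; no cat process needed
tr -dc '0-9' </dev/urandom | fold -w 5 | head -n 1
```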
Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?
Remove the whole code.
Take only as few bytes from the random source as you need. To generate a 32-bit number you only need 32 bits, no more. To generate a 5-digit number, you only need 17 bits; rounded up to 8-bit bytes, that's only 3 bytes. The tr -dc '0-9' is a cool trick, but it definitely shouldn't be used in any real code.
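A sketch of the 3-byte idea using od (note the % 100000 reduction has a small modulo bias, since 2^24 is not a multiple of 100000):

```shell
# read exactly 3 random bytes as three unsigned byte values
read -r b1 b2 b3 < <(od -An -N3 -tu1 /dev/urandom)
# combine them into a 24-bit integer and reduce to a 5-digit number
printf '%05d\n' $(( (b1 << 16 | b2 << 8 | b3) % 100000 ))
```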
I recently answered a similar question; copying the code from there, you could:
for ((i=0;i<100000000;++i)); do echo "$((0x$(dd if=/dev/urandom of=/dev/stdout bs=4 count=1 status=none | xxd -p)))"; done | cut -c-5
# cut to take first 5 digits
But that would still be unacceptably slow, as it spawns 2 processes for each random number (and I think just taking the first 5 digits of the decimal-printed 32-bit value gives a biased distribution).
I suggest using $RANDOM, available in bash. If that isn't enough, use $SRANDOM (bash 5.1+) if you really want /dev/urandom (and really know why you want it). Otherwise, I suggest writing the random number generation from /dev/urandom in a real programming language, like C, C++, Python, Perl, or Ruby. I believe one could even write it in awk.
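For example, with plain $RANDOM, which yields 15 bits per draw, two draws can be combined into a 30-bit value (again a sketch, with a small modulo bias from % 100000):

```shell
# combine two 15-bit $RANDOM draws, then reduce to a 5-digit number
for i in {1..10}; do
  printf '%05d\n' $(( (RANDOM << 15 | RANDOM) % 100000 ))
done
```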
The following looks nice, but converting binary data to hex just to convert it to decimal later is a workaround for the fact that the shell just can't work with binary data:
count=10;
# take count*4 bytes from input
dd if=/dev/urandom of=/dev/stdout bs=4 count=$count status=none |
# Convert bytes to hex 4 bytes at a time
xxd -p -c 4 |
# Convert hex to decimal using GNU awk
awk --non-decimal-data '{printf "%d\n", "0x"$0}'
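If binary-to-decimal conversion is unavoidable anyway, a single od feeding a single awk streams numbers with no per-number process cost; a sketch (again with a small modulo bias from % 100000):

```shell
# one od process, one awk process: stream unsigned 32-bit words,
# print each as a zero-padded 5-digit number
od -An -tu4 -v /dev/urandom |
awk '{ for (i = 1; i <= NF; i++) printf "%05d\n", $i % 100000 }' |
head -n 10
```

Here head terminates the pipeline via SIGPIPE once enough numbers have been read, so this works as an endless stream too.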