From inside awk, I want to generate, quickly and on demand, a string of X alphanumeric characters that is reasonably random (i.e., random but not cryptographically secure).
In Ruby, I could do this:
ruby -e '
def rand_string(len, min=48, max=123, pattern=/[[:alnum:]]/)
    rtr=""
    while rtr.length<len do
        rtr+=(0..len).map { (min + rand(max-min)).chr }.
            select{|e| e[pattern] }.join
    end # falls out when min length achieved
    rtr[0...len]
end
(0..5).each{|_| puts rand_string(20)}'
Prints:
61ihPbceigvQ2nFv8s7f
JiL0Lucw6IJl87rLQgEm
lKEjaTi9jSVWFF1V6Zyn
T3jKdEuAnMeaNUl85ABF
3ct0OBbHpAp72AtKtLCk
wmNqCK3lWz74vk2Zme01
For a time comparison, that Ruby can produce 1,000,000 unique strings (no duplicates) in roughly 9 seconds.
Taking that, I tried in awk:
awk -v r=$RANDOM '
# the r value will only be a new seed each invocation -- not each f call
function rand_string(i) {
    s=""
    min=48
    max=123
    srand(r)
    while (length(s)<i) {
        c=sprintf("%c", int(min+rand()*(max-min+1)))
        if (c~/[[:alnum:]]/) s=s c
    }
    return s
}
BEGIN{ for (i=1; i<=5; i++) {print rand_string(20)}}'
That does not work -- same seed, same string result. Prints:
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
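The repeated output follows from where the seed is set: srand(r) runs on every call, so each call restarts the generator from the same state. A minimal sketch of the smallest change, assuming nothing else about the function needs to differ, moves the seeding into BEGIN. The out variable and the extra names in the parameter list (awk's idiom for local variables) are only for illustration:

```shell
out=$(awk -v r="$RANDOM" '
function rand_string(n,    s, c, min, max) {
    min = 48; max = 122          # ASCII "0" through "z"
    s = ""
    while (length(s) < n) {
        c = sprintf("%c", int(min + rand() * (max - min + 1)))
        if (c ~ /[[:alnum:]]/) s = s c   # keep only letters and digits
    }
    return s
}
BEGIN {
    srand(r)                     # seed once per invocation, not once per call
    for (i = 1; i <= 5; i++) print rand_string(20)
}')
printf '%s\n' "$out"
```

With the seed set once, successive calls continue the same random sequence instead of replaying its start.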
Now try reading /dev/urandom with od:
awk '
function rand_string(i) {
    arg=i*4
    cmd="od -A n -t u1 -N " arg " /dev/urandom" # this is POSIX
    # -t u1  : print each byte as an unsigned character value
    # -N arg : read a count of i*4 bytes
    s=""
    min=48
    max=123
    while (length(s)<i) {
        while((cmd | getline line)>0) {
            split(line, la)
            for (e in la) {
                if (la[e]<min || la[e]>max) continue
                c=sprintf("%c", la[e])
                if (c~/[[:alnum:]]/) s=s c
            }
        }
        close(cmd)
    }
    return substr(s,1,i)
}
BEGIN {for(i=1;i<=5;i++) print rand_string(20) }'
This works as desired. Prints:
sYY195x6fFQdYMrOn1OS
9mv7KwtgdUu2DgslQByo
LyVvVauEBZU2Ad6kVY9q
WFsJXvw8YWYmySIP87Nz
AMcZY2hKNzBhN1ByX7LW
But now the problem is that the pipe od -A n -t u1 -N " arg " /dev/urandom is really slow -- unusable except for a trivial number of strings.
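Much of that cost is probably the od process started for every string rather than /dev/urandom itself. One possible workaround, sketched here under that assumption and not timed, is to drop -N so a single od process keeps streaming, and to read from that same pipe for every string. The -v flag is added so od never collapses repeated lines into "*"; the out variable and the local names are illustrative:

```shell
out=$(awk '
function rand_string(n,    s, k, m, b, c, line) {
    s = ""
    while (length(s) < n) {
        # Pull the next line of byte values from the one long-lived od process.
        if ((cmd | getline line) <= 0) break
        m = split(line, b, " ")
        for (k = 1; k <= m && length(s) < n; k++) {
            if (b[k] < 48 || b[k] > 122) continue   # outside "0".."z"
            c = sprintf("%c", b[k] + 0)             # + 0 forces a numeric argument
            if (c ~ /[[:alnum:]]/) s = s c
        }
    }
    return s
}
BEGIN {
    cmd = "od -A n -t u1 -v /dev/urandom"   # no -N, so od keeps producing bytes
    for (i = 1; i <= 5; i++) print rand_string(20)
    close(cmd)                               # closing the pipe lets od terminate
}')
printf '%s\n' "$out"
```

Bytes left over on a line after a string is complete are simply discarded, which wastes a little input but keeps the function simple.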
Any idea how I can modify one of those awks so that it:
This question has been asked a few times:
srand
I don't have access to Ruby, but on my (apparently slow!) system the awk script from @dawgs answer takes 24 seconds to run, while this one takes 5 seconds:
$ cat tst.sh
#!/usr/bin/env bash
time awk -v r=$RANDOM '
function rand_string(n, s,i) {
    for ( i=1; i<=n; i++ ) {
        s = s chars[int(1+rand()*numChars)]
    }
    return s
}
BEGIN{
    srand(r) # Use srand ONCE only
    for (i=48; i<=122; i++) {
        char = sprintf("%c", i)
        if ( char ~ /[[:alnum:]]/ ) {
            chars[++numChars] = char
        }
    }
    for (i=1; i<=1000000; i++) {print rand_string(20)}
}' | sort | uniq -c | awk '$1>1'
$ ./tst.sh
real 0m5.078s
user 0m4.077s
sys 0m0.045s
So if you want to produce a lot of strings, create an array of the possible characters first and then index the array using rand() instead of calling sprintf() for every letter of every string.
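To see why the lookup is safe: the codes 48 through 122 contain 62 alphanumerics (10 digits, 26 uppercase and 26 lowercase letters), and since rand() returns a value in [0, 1), int(1+rand()*numChars) should always land in 1..numChars. A small sketch that checks both, with the fixed seed, the lo/hi trackers and the out variable chosen only for illustration:

```shell
out=$(awk '
BEGIN {
    srand(1)                      # fixed seed; only the index range matters here
    for (i = 48; i <= 122; i++) {
        char = sprintf("%c", i)
        if (char ~ /[[:alnum:]]/) chars[++numChars] = char
    }
    lo = numChars; hi = 1
    for (i = 1; i <= 100000; i++) {
        k = int(1 + rand() * numChars)   # candidate index into chars[]
        if (k < lo) lo = k               # track smallest index seen
        if (k > hi) hi = k               # track largest index seen
    }
    print numChars, lo, hi
}')
printf '%s\n' "$out"
```

The first field is the size of the alphabet; the other two are the smallest and largest index drawn, which should stay inside the array bounds.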
Since making a variable like s iteratively larger is slow in terms of memory [re]allocation, you can make the script about 20% faster still by setting OFS="" and then setting $i to each char rather than building up a string:
function rand_string(n, i) {
    for ( i=1; i<=n; i++ ) {
        $i = chars[int(1+rand()*numChars)]
    }
    return $0
}
$ ./tst2.sh
real 0m3.954s
user 0m3.420s
sys 0m0.015s
as long as you don't need $0 for anything else.
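If the script does process input records, one possible variation (a sketch, not timed, so the saving and restoring may eat part of the speed gain) is to stash $0 before building the string and put it back afterwards. Clearing $0 first also matters when the requested length varies, because fields left over from a longer earlier call would otherwise stay in the result. The saved and s locals, the sample input and the out variable are illustrative:

```shell
out=$(printf 'alpha beta\n' | awk '
function rand_string(n,    i, saved, s) {
    saved = $0                # remember the current input record
    $0 = ""                   # drop old fields so a shorter n leaves no leftovers
    for (i = 1; i <= n; i++) {
        $i = chars[int(1 + rand() * numChars)]
    }
    s = $0                    # fields joined with OFS, which is ""
    $0 = saved                # restore the record; awk re-splits its fields
    return s
}
BEGIN {
    OFS = ""
    srand()
    for (i = 48; i <= 122; i++) {
        char = sprintf("%c", i)
        if (char ~ /[[:alnum:]]/) chars[++numChars] = char
    }
}
{ print rand_string(20); print rand_string(8); print $2 }')
printf '%s\n' "$out"
```

Note that OFS="" still applies to any other output the script builds from fields or from comma-separated print arguments.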