This question is based on "How do I fix sed commands becoming extremely slow when load is high?", following the advice of @markp-fuso, @jhnc, and @Jetchisel to avoid a chameleon question, as many of the answers there used hashing and maps for optimization.
I have the following bash script:
INPUT=`cat $INPUT_FILE`
while read line; do
    PROP_NAME="$(echo $line | cut -f1 -d'=')"
    PROP_VALUE="$(echo $line | cut -f2- -d'=' | sed 's/\$/\\\$/g' | sed 's/\&/\\\&/g')"
    INPUT="$(echo "$INPUT" | sed "s\`${PROP_NAME}\b\`${PROP_VALUE}\`g")"
done < "$PROPERTIES_FILE"
echo "$INPUT"
This script takes a properties file with a format that supports recursion and special characters:
$foo=$barname bar
$barname=Tom&Jerry
$hello=world
It uses these properties to substitute into a text file with no set format, so
I went to the $foo and said hello to the $hello and they fined me $5.
becomes
I went to the Tom&Jerry bar and said hello to the world and they fined me $5.
The properties file and the text file are hundreds of lines long, so performance is important, and the naive implementation results in many minutes or even over an hour of processing time depending on system load. Also important to note is the use of \b in the sed expression, which means that every reference is terminated by a punctuation mark or whitespace.
The script cannot recurse infinitely because it makes only one pass through the properties file, which also means the order of properties matters when recursion is used.
From the comments on the question, it seems you could preprocess the property file to expand property keys that appear in values.
As noted in my answer to the previous question, the original code takes O(m.n) time - m properties looked for in text of size n. Preprocessing the properties and making use of Perl's ability to search literal string alternations in constant time can drop this to O(m+n) - one pass over the properties and one pass over the text:
perl -e '
    # load properties from first file
    while ( ($k,$v) = split "=",<<>>,2 ) {
        chomp $v;
        $k2v{$k} = $v;          # hash for value lookup
        unshift @propkeys, $k;  # array of keys, most recently read first
        last if eof;
    }

    # build single regex from all keys
    # \Q escapes regex metacharacters
    $re = join "|", map qr/\Q$_\E/, @propkeys;

    # walk properties in reverse file order, expanding values as we go
    for $k (@propkeys) {
        $k2v{$k} =~ s/($re)\b/ $seen{$1} ? $k2v{$1} : $1 /ge;
        $seen{$k} = 1;
    }

    # load input from second file
    undef $/;
    $_ = <<>>;

    # convert all properties simultaneously
    s/($re)\b/ $k2v{$1} /ge;

    # output the result
    print;
' propfile textfile
The method I use for preprocessing produces property values that match the behaviour of the question's code, where the order in which properties are listed affects the result:
$key1=foo_$key2
$key2=bar
(expands $key1 in textfile to foo_bar)
vs
$key2=bar
$key1=foo_$key2
(expands $key1 in textfile to foo_$key2)
To expand property values until they no longer contain any keys, the code can become:
perl -e '
    while ( ($k,$v) = split "=",<<>>,2 ) {
        chomp $v;
        $k2v{$k} = $v;
        unshift @propkeys, $k;
        last if eof;
    }

    $re = join "|", map qr/\Q$_\E/, @propkeys;

    # loop trying to expand value of each property key in list
    # remove key from list once its value contains no key references
    while (@propkeys = grep $k2v{$_} =~ s/($re)\b/ $k2v{$1} /ge, @propkeys) {
        die "recursion depth exceeded (@propkeys)\n" if ++$ct > 10;
    }

    undef $/;
    $_ = <<>>;
    s/($re)\b/ $k2v{$1} /ge;
    print;
' propfile textfile
Depending on details not provided in the question, it may be possible and useful to cache or memoise the expanded properties for later reuse without having to recompute the values each time.