I have a bash script that takes a simple properties file and substitutes the values into another file. (The properties file is just lines of foo=bar style properties.)
INPUT=`cat "$INPUT_FILE"`
while IFS= read -r line; do
    PROP_NAME=`echo "$line" | cut -f1 -d'='`
    PROP_VALUE=`echo "$line" | cut -f2- -d'=' | sed 's/\$/\\\$/g'`
    time INPUT="$(echo "$INPUT" | sed "s\`${PROP_NAME}\b\`${PROP_VALUE}\`g")"
done < "$PROPERTIES_FILE"
# Do more stuff with INPUT
However, when my machine is under high load (a load average in the upper forties), my sed calls take much longer:
High load:
real 0m0.169s
user 0m0.001s
sys 0m0.006s
Low load:
real 0m0.011s
user 0m0.002s
sys 0m0.004s
Normally losing a tenth of a second or so isn't a big deal, but both the properties file and the input files are hundreds to thousands of lines long, and those tenths of a second add up to over an hour of wasted time.
What can I do to fix this? Do I just need more CPUs?
Sample properties (each name starts with a special character so the input has a way to indicate that it is trying to access a property):
$foo=bar
$hello=world
^hello=goodbye
Sample input
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
Expected result
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
This will produce the output you show from the input you show, using any awk:
$ cat tst.sh
#!/usr/bin/env bash
awk '
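NR == FNR {
    pos = index($0, "=")
    if ( pos < 2 ) next    # skip lines with no "=" or an empty name
    tag = substr($0, 1, pos - 1)
    val = substr($0, pos + 1)
    # Make any regexp metachars in the tag literal
    gsub(/[^^\\[:alnum:]]/, "[&]", tag)    # wrap each char other than ^, \ and alnums in [...]
    gsub(/\\/, "&&", tag)                  # double each backslash
    gsub(/\^/, "\\\\&", tag)               # escape each ^ as \^
    tags2vals[tag] = val
    next
}
{
    for ( tag in tags2vals ) {
        val = tags2vals[tag]
        out = ""
        rest = $0
        # Replace every occurrence of this tag, not just the first,
        # without rescanning the value that was just inserted
        while ( match(rest, tag) ) {
            out = out substr(rest, 1, RSTART-1) val
            rest = substr(rest, RSTART+RLENGTH)
        }
        $0 = out rest
    }
    print
}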
' props input
$ ./tst.sh
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
That was run against the sample input you provided:
$ head props input
==> props <==
$foo=bar
$hello=world
^hello=goodbye
==> input <==
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
But if your real input can contain recursive property definitions (e.g. $foo=$hello) and/or substrings you do not want to match (e.g. this is $foobar here), then you'd need to enhance it to handle those however you want them handled.
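As one possible enhancement, here is a minimal sketch of a different approach, assuming two rules that are not decided above: a tag only matches when the character after it is not a letter, digit or underscore, and substituted values are inserted verbatim rather than expanded again. Instead of regexp matching, it scans each input line once, left to right, comparing tags as plain strings, so no metachar escaping is needed. The wrapper function name expand_props is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): expand_props PROPS_FILE INPUT_FILE
# Assumed rules:
#   1. a tag only matches when the character after it is not a letter, digit
#      or underscore, so $foo is left alone inside $foobar
#   2. values are copied to the output verbatim and never rescanned, so a
#      definition like $foo=$hello expands $foo to the literal text $hello
# Tags are compared as plain strings, so no regexp escaping is needed.
expand_props() {
    awk '
    NR == FNR {
        pos = index($0, "=")
        if ( pos < 2 ) next                   # skip blank or malformed lines
        tag = substr($0, 1, pos - 1)
        c = substr(tag, 1, 1)
        cnt[c]++                              # group tags by first character
        tagsByChar[c, cnt[c]] = tag
        vals[tag] = substr($0, pos + 1)
        next
    }
    {
        out = ""
        len = length($0)
        i = 1
        while ( i <= len ) {
            c = substr($0, i, 1)
            n = (c in cnt) ? cnt[c] : 0
            best = ""
            # Of the tags that start here and end at a word boundary, keep the longest
            for ( k = 1; k <= n; k++ ) {
                tag = tagsByChar[c, k]
                tl = length(tag)
                if ( substr($0, i, tl) == tag &&
                     substr($0, i + tl, 1) !~ /[[:alnum:]_]/ &&
                     tl > length(best) ) {
                    best = tag
                }
            }
            if ( best != "" ) {
                out = out vals[best]          # copy the value; it is never rescanned
                i += length(best)
            } else {
                out = out c                   # ordinary character, copy as-is
                i++
            }
        }
        print out
    }
    ' "$1" "$2"
}
```

Called as expand_props props input, it gives the same output as tst.sh for the sample files, leaves $foobar alone when only $foo is defined, and with $foo=$hello outputs the literal text $hello for $foo. Because each line is scanned only once, the order of the properties no longer matters.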
See "Is it possible to escape regex metacharacters reliably with sed" (it's a sed question, but the issue of escaping regexp metachars applies to awk too) for an explanation of what the gsub()s in the script are doing.
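To see the escaping concretely, here is a small sketch that runs the same three gsub() calls from tst.sh on a few made-up tags and checks whether each escaped tag, used as a dynamic regexp, matches some made-up text. The function names show_tag_escaping, escape and check are hypothetical. With gawk it should show $foo becoming [$]foo, ^hello becoming \^hello and a.b*c becoming a[.]b[*]c, with the escaped forms matching only the literal text (so ^hello no longer anchors to the start of "hello there").

```shell
#!/usr/bin/env bash
# Hypothetical demo: apply the three gsub() calls from tst.sh to sample tags
# and test each escaped tag, used as a dynamic regexp, against sample text.
show_tag_escaping() {
    awk '
    function escape(tag) {
        gsub(/[^^\\[:alnum:]]/, "[&]", tag)   # put any char except ^, \ and alnums in [...]
        gsub(/\\/, "&&", tag)                 # double every backslash
        gsub(/\^/, "\\\\&", tag)              # turn every ^ into \^
        return tag
    }
    function check(tag, text,    re) {
        re = escape(tag)
        printf "%s -> %s  matches \"%s\": %s\n", tag, re, text, (text ~ re ? "yes" : "no")
    }
    BEGIN {
        check("$foo",   "at a $foo")
        check("^hello", "say ^hello")
        check("^hello", "hello there")    # an unescaped ^hello would match here
        check("a.b*c",  "a.b*c")
        check("a.b*c",  "axc")            # an unescaped a.b*c would match here
    }
    '
}

show_tag_escaping
```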