I have a bash script that reads a simple properties file and substitutes its values into another file. (The properties file is just lines of foo=bar style properties.)
INPUT="$(cat "$INPUT_FILE")"
while IFS= read -r line; do
  PROP_NAME="$(echo "$line" | cut -f1 -d'=')"
  PROP_VALUE="$(echo "$line" | cut -f2- -d'=' | sed 's/\$/\\\$/g')"
  time INPUT="$(echo "$INPUT" | sed "s\`${PROP_NAME}\b\`${PROP_VALUE}\`g")"
done < "$PROPERTIES_FILE"
# Do more stuff with INPUT
However, when my machine is under high load (a load average in the upper forties), each sed call takes much longer.
High load:
real 0m0.169s
user 0m0.001s
sys 0m0.006s
Low load:
real 0m0.011s
user 0m0.002s
sys 0m0.004s
Losing 0.1 seconds normally isn't a big deal, but both the properties file and the input files are hundreds to thousands of lines long, and those 0.1 seconds add up to over an hour of wasted time.
What can I do to fix this? Do I just need more CPUs?
Sample properties (each name starts with a special character, which is how the input indicates that it is referencing a property):
$foo=bar
$hello=world
^hello=goodbye
Sample input:
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
Expected result:
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
One approach, using bash and sed, is something like:
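#!/usr/bin/env bash
# Build one sed script containing an s///g command per property.
while IFS='=' read -r prop_name prop_value; do
  # A leading ^ is a regex anchor, so escape it to match a literal ^.
  # ($ is only special at the end of a regex, so $-names need no escaping.)
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"
  fi
  input_value+=("s/${prop_name}\\b/${prop_value}/g")
done < properties.txt
# Join the commands with ';' and run sed once over the whole input file.
sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
sed "$sed_input" sample_input.txt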
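To try this end to end, here is a minimal sketch that writes the sample properties and input from the question into a temporary directory and runs the same loop and single sed call on them. The mktemp/trap setup and the workdir and result variables are additions for illustration only, and it assumes GNU sed, since \b is a GNU extension.

```shell
#!/usr/bin/env bash
# Sketch: recreate the question's sample files in a temp directory and
# apply the single-sed approach. Assumes GNU sed (\b is a GNU extension).
set -euo pipefail

workdir="$(mktemp -d)"              # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' '$foo=bar' '$hello=world' '^hello=goodbye' > "$workdir/properties.txt"
printf '%s\n' \
  'This is a story about $hello. It starts at a $foo and ends in a park.' \
  'Bob said to Sally "^hello, see you soon"' > "$workdir/sample_input.txt"

input_value=()
while IFS='=' read -r prop_name prop_value; do
  # Escape a leading ^ so sed matches it literally instead of as an anchor.
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"
  fi
  input_value+=("s/${prop_name}\\b/${prop_value}/g")
done < "$workdir/properties.txt"

# One sed process for all properties and all input lines.
sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
result="$(sed "$sed_input" "$workdir/sample_input.txt")"
printf '%s\n' "$result"
```

If it behaves as intended, it prints the two lines of the expected result from the question.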
One way to check the value of sed_input is:
declare -p sed_input
or:
printf '%s\n' "$sed_input"
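For the sample properties from the question, a sketch like the following builds sed_input from a here-document (standing in for properties.txt, an assumption made so the snippet is self-contained) and prints it both ways. The printf line should show s/$foo\b/bar/g;s/$hello\b/world/g;s/\^hello\b/goodbye/g, that is, one substitution per property joined by semicolons.

```shell
#!/usr/bin/env bash
# Sketch: build sed_input from the sample properties and inspect it.
# The quoted 'EOF' keeps $foo and $hello literal in the here-document.
input_value=()
while IFS='=' read -r prop_name prop_value; do
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"
  fi
  input_value+=("s/${prop_name}\\b/${prop_value}/g")
done <<'EOF'
$foo=bar
$hello=world
^hello=goodbye
EOF

sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
declare -p sed_input
printf '%s\n' "$sed_input"
```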
Calling external utilities such as cut and sed from inside a shell loop should be avoided. See "Why is using a shell loop to process text considered bad practice?"
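As one way to drop cut from the question's loop, bash's own parameter expansion can split a name=value line without starting any process. A minimal sketch, using a hypothetical property whose value itself contains '=':

```shell
#!/usr/bin/env bash
# Sketch: split "name=value" with parameter expansion instead of cut.
line='$greeting=a=b'          # hypothetical property; the value contains '='
prop_name="${line%%=*}"       # everything before the first '=' (like cut -f1)
prop_value="${line#*=}"       # everything after the first '=' (like cut -f2-)
printf 'name=%s value=%s\n' "$prop_name" "$prop_value"
```

This prints name=$greeting value=a=b, and no subprocess is created for the split.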
The sed above is invoked only once, no matter how many properties there are, even if the file that needs to be edited has 500+ lines.
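One caveat: the generated s/name\b/value/g commands assume that no value contains /, & or \, which sed treats specially in the replacement. A sketch of one way to lift that assumption while still keeping the number of processes independent of the number of properties: collect the values, escape all of them in a single extra sed call, then build the commands. The here-document data is hypothetical, it assumes bash 4+ (for mapfile) and GNU sed (for \b), and it still assumes the names contain none of those characters.

```shell
#!/usr/bin/env bash
# Sketch: escape sed-special characters in all values with ONE sed call.
set -euo pipefail

names=()
values=()
while IFS='=' read -r prop_name prop_value; do
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"
  fi
  names+=("$prop_name")
  values+=("$prop_value")
done <<'EOF'
$path=/usr/local & co
$hello=world
EOF

# Prefix every backslash, slash and ampersand with a backslash.
mapfile -t escaped < <(printf '%s\n' "${values[@]}" | sed 's/[\/&]/\\&/g')

commands=()
for i in "${!names[@]}"; do
  commands+=("s/${names[i]}\\b/${escaped[i]}/g")
done

sed_input="$(IFS=';'; printf '%s' "${commands[*]}")"
result="$(printf '%s\n' 'Say $hello from $path.' | sed "$sed_input")"
printf '%s\n' "$result"
```

With this data it should print Say world from /usr/local & co. The escaping pass adds one process in total, rather than one per property.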
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
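Along the lines of that FAQ, one detail matters for properties files: if the last line has no trailing newline, read returns a non-zero status and a plain while read loop silently drops that line. A minimal sketch with a guard for that case; the sample data and the count and last_value variables are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read name=value pairs field by field, keeping a final line that
# has no trailing newline.
count=0
last_value=
while IFS='=' read -r prop_name prop_value || [[ -n "$prop_name" ]]; do
  # Skip blank lines.
  [[ -z "$prop_name" ]] && continue
  printf '%s -> %s\n' "$prop_name" "$prop_value"
  count=$((count + 1))
  last_value=$prop_value
done < <(printf '%s\n%s' '$foo=bar' '$hello=world')   # no final newline
printf 'read %d properties\n' "$count"
```

Without the || [[ -n "$prop_name" ]] guard, the $hello=world line would be skipped and only one property would be read.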