linux bash sed

How do I fix sed commands becoming extremely slow when load is high?


I have a bash script that reads a simple properties file and substitutes its values into another file. (The properties file is just lines of the form 'foo=bar'.)

INPUT=`cat $INPUT_FILE`
while read line; do
   PROP_NAME=`echo $line | cut -f1 -d'='`
   PROP_VALUE=`echo $line | cut -f2- -d'=' | sed 's/\$/\\\$/g'`
   time INPUT="$(echo "$INPUT" | sed "s\`${PROP_NAME}\b\`${PROP_VALUE}\`g")"
done <<<$(cat "$PROPERTIES_FILE")
# Do more stuff with INPUT

However, when my machine is under high load (upper forties), each sed call loses a lot of time:

real  0m0.169s
user  0m0.001s
sys  0m0.006s

Low load:

real  0m0.011s
user  0m0.002s
sys  0m0.004s

Normally losing 0.1 seconds isn't a huge deal, but both the properties file and the input files are hundreds or thousands of lines long, and those 0.1 seconds add up to over an hour of wasted time.

What can I do to fix this? Do I just need more CPUs?

Sample properties (each name starts with a special character, which marks the places where the input refers to a property)

$foo=bar
$hello=world
^hello=goodbye

Sample input

This is a story about $hello. It starts at a $foo and ends in a park.

Bob said to Sally "^hello, see you soon"

Expected result

This is a story about world. It starts at a bar and ends in a park.

Bob said to Sally "goodbye, see you soon"

Solution

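  • One approach using bash and sed: build a single sed script from the properties file and run sed once over the input, instead of starting new processes for every property. You could try something like: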

    #!/usr/bin/env bash
    
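    # Collect one s/// command per property.
    input_value=()
    while IFS='=' read -r prop_name prop_value; do
      # Escape a leading ^ so sed treats it literally, not as an anchor.
      if [[ "$prop_name" == "^"* ]]; then
         prop_name="\\${prop_name}"
      fi
      input_value+=("s/${prop_name}\\b/${prop_value}/g")
    done < properties.txt
    
    # Join the commands with ';' into a single sed script.
    sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
    
    # Run sed once over the whole input.
    sed "$sed_input" sample_input.txt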
    

    One way to inspect the generated script in sed_input is:

    declare -p sed_input
    

    Or

    printf '%s\n' "$sed_input"
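
The gain in the approach above comes from the number of processes: the original loop starts several short-lived processes (echo, cut, sed) for every property, and on a heavily loaded machine each one has to wait for CPU time, whereas here the loop uses only bash builtins and sed runs once. As a rough sketch of the same idea, the helper below writes the commands to a temporary file and passes it with sed -f, which may be more comfortable when thousands of properties would make a single command-line argument very long. The name substitute_properties is hypothetical, and the sketch assumes GNU sed (for \b) and property names and values that contain no '/', '&', or backslashes, since those would need escaping first.

```shell
#!/usr/bin/env bash
# Sketch only: substitute_properties is a hypothetical helper name.
# Assumes GNU sed (for \b) and names/values without '/', '&', or backslashes.

substitute_properties() {
  local props_file=$1 input_file=$2
  local script name value status
  script=$(mktemp) || return 1

  # Emit one s/// command per line; only bash builtins run inside the loop.
  # The "|| [[ -n $name ]]" part keeps a final line that lacks a newline.
  while IFS='=' read -r name value || [[ -n $name ]]; do
    [[ -z $name ]] && continue               # skip blank lines
    [[ $name == '^'* ]] && name="\\$name"    # make a leading ^ literal
    printf 's/%s\\b/%s/g\n' "$name" "$value"
  done < "$props_file" > "$script"

  # A single sed process applies every substitution to the whole input.
  sed -f "$script" "$input_file"
  status=$?
  rm -f "$script"
  return "$status"
}

# Demo with the sample data from the question.
demo_dir=$(mktemp -d)
cat > "$demo_dir/properties.txt" <<'EOF'
$foo=bar
$hello=world
^hello=goodbye
EOF
cat > "$demo_dir/sample_input.txt" <<'EOF'
This is a story about $hello. It starts at a $foo and ends in a park.

Bob said to Sally "^hello, see you soon"
EOF
substitute_properties "$demo_dir/properties.txt" "$demo_dir/sample_input.txt"
rm -rf "$demo_dir"
```

To capture the result for further processing, as the original script does with INPUT, the call can be wrapped in a command substitution, for example INPUT=$(substitute_properties "$PROPERTIES_FILE" "$INPUT_FILE").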