Is sed with multiple expressions the same as one expression with semi-colons?


I have a very large file with multiple sed commands to run on it, and I want to avoid out-of-memory errors and save time. Are these all equivalent?

  1. sed -e 'expr1' -e 'expr2' -e 'expr3' file
  2. sed 'expr1;expr2;expr3' file
  3. sed expr1 file | sed expr2 | sed expr3

My guess is that with piping in (3), the stream is processed separately each time, so it would take three times as long as (2), where it is processed only once. But I am not sure how sed processes (1) internally.


Solution

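  • Firstly, sed -e 'expr1' -e 'expr2' file is equivalent to sed 'expr1;expr2' file. Each -e expression is treated as if it ended with a newline, and for most commands (such as s///) a newline and a semicolon are interchangeable separators. The exception is commands that take the rest of the line as their argument, such as a, i, c, r and w. Those must be ended by a newline, so use separate -e expressions or a multi-line script for them. Also equivalent are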

    sed 'expr1
    expr2' file
    

    and storing

    expr1
    expr2
    

    (or expr1;expr2) in a file, e.g., sedscr, and calling it with sed -f sedscr file, or, finally, storing

    #!/usr/bin/sed -f
        
    expr1
    expr2
    

    in a file sedscr, making it executable with chmod +x sedscr, and calling it with ./sedscr file.

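    For each input line, sed goes through the complete script and applies all commands to it, then moves on to the next input line. The input is therefore read only once, and sed holds only the current line in memory (unless the script itself accumulates lines, e.g. with N or the hold space).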

    Piping sed calls, on the other hand, has each sed go through the whole stream (and starts a separate sed process for each call). This might not make a big difference if you do an operation on every line, but imagine a chain of substitutions that depend on each other, for example in a file like

    xx
    xx
    pattern
    xx
    xx
    PATTERN
    xx
    xx
    

    and you want to end up, case-insensitively, with uppercase PATTERN in parentheses wherever it occurs. If you use pipes as in

    sed 's/pattern/PATTERN/' infile | sed 's/PATTERN/(&)/'
    

    you go through the file twice, making three changes in total:

    Initial  1st pass 2nd pass
    xx       xx       xx
    xx       xx       xx
    pattern  PATTERN  (PATTERN)
    xx       xx       xx
    xx       xx       xx
    PATTERN  PATTERN  (PATTERN)
    xx       xx       xx
    xx       xx       xx
    

    but with

    sed 's/pattern/PATTERN/;s/PATTERN/(&)/' infile
    

    you get the same result in just one pass. So, by all means, try to put everything into a single sed command.

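    GNU sed can even do it with a single substitution, using the I flag for case-insensitive matching and \U to uppercase the replacement: sed 's/pattern/\U(&)/I' infile. Without the I flag, the PATTERN line would not be matched.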
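The sample file and the three approaches discussed above can be compared directly with a short script. This is a sketch, assuming bash and GNU sed (the last command relies on the GNU-only I flag and \U escape); the temporary file stands in for infile. If the outputs agree, it prints the "2nd pass" column of the table.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash and GNU sed (the last command uses the GNU-only
# I flag and \U escape). The temporary file stands in for "infile".
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' xx xx pattern xx xx PATTERN xx xx > "$tmp/infile"

# Two sed processes in a pipeline: each one reads the whole stream.
piped=$(sed 's/pattern/PATTERN/' "$tmp/infile" | sed 's/PATTERN/(&)/')

# One sed process: both commands run on each line before the next is read.
single=$(sed 's/pattern/PATTERN/;s/PATTERN/(&)/' "$tmp/infile")

# GNU sed: I matches case-insensitively, \U uppercases the rest of the replacement.
gnu=$(sed 's/pattern/\U(&)/I' "$tmp/infile")

if [ "$piped" = "$single" ] && [ "$gnu" = "$single" ]; then
  printf '%s\n' "$single"
else
  echo "outputs differ" >&2
  exit 1
fi
```

All three variants yield xx on the unchanged lines and (PATTERN) on lines 3 and 6; only the number of passes over the data differs.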
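To check that the -e, semicolon, multi-line and -f forms are interchangeable, you can run them on the same input and compare the output. This sketch assumes bash and GNU sed; the input text and the commands s/foo/FOO/ and s/bar/BAR/ are hypothetical stand-ins for expr1 and expr2. It also shows where ; and -e differ: GNU sed's one-line a command takes the rest of the line as its text, so a ; after it is not a separator, whereas each -e expression is ended by a newline.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash and GNU sed. s/foo/FOO/ and s/bar/BAR/ are
# hypothetical stand-ins for expr1 and expr2.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'foo bar' 'baz foo' 'bar' > "$tmp/input.txt"

# Script file for the -f form: one command per line.
printf '%s\n' 's/foo/FOO/' 's/bar/BAR/' > "$tmp/sedscr"

out_e=$(sed -e 's/foo/FOO/' -e 's/bar/BAR/' "$tmp/input.txt")
out_semi=$(sed 's/foo/FOO/;s/bar/BAR/' "$tmp/input.txt")
out_newline=$(sed 's/foo/FOO/
s/bar/BAR/' "$tmp/input.txt")
out_file=$(sed -f "$tmp/sedscr" "$tmp/input.txt")

for out in "$out_semi" "$out_newline" "$out_file"; do
  [ "$out" = "$out_e" ] || { echo "outputs differ" >&2; exit 1; }
done
printf '%s\n' "$out_e"

# Caveat (GNU one-line "a text" syntax): everything after "a" up to the
# newline is the text to append, so the ";" below is NOT a separator.
caveat_semi=$(printf '1\n2\n' | sed '1a Hello ; 2d')
# With -e, each expression ends at a newline, so "2d" is a separate command.
caveat_e=$(printf '1\n2\n' | sed -e '1a Hello' -e '2d')
printf '%s\n---\n%s\n' "$caveat_semi" "$caveat_e"
```

In the caveat, the semicolon version appends the literal text "Hello ; 2d" and keeps line 2, while the -e version appends "Hello" and deletes line 2.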