I have a very large file with multiple sed commands to run on it, and I want to avoid out-of-memory errors and save time. Are these all equivalent?
(1) sed -e 'expr1' -e 'expr2' -e 'expr3' file
(2) sed 'expr1;expr2;expr3' file
(3) sed expr1 file | sed expr2 | sed expr3
My guess is that with piping in (3), the stream is processed separately each time, so it would take 3x as long as (2), where it is only processed once. But I am not sure how sed internally processes (1).
Firstly,
sed -e 'expr1' -e 'expr2' file
is exactly the same as
sed 'expr1;expr2' file
Also equivalent are
sed 'expr1
expr2' file
and storing
expr1
expr2
(or expr1;expr2) in a file, e.g., sedscr, and calling it with
sed -f sedscr file
or, finally, storing
#!/usr/bin/sed -f
expr1
expr2
in a file sedscr, making it executable, and calling it with
./sedscr file
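One way to check that these forms are interchangeable is to run them on the same input and compare the results. The sketch below is only an illustration: the two substitutions, the sample input, and the temporary directory are made up for the test, and it assumes mktemp is available.

```shell
#!/bin/sh
# Illustrative check; the expressions and the sample input are placeholders.
tmpdir=$(mktemp -d)
printf 'foo bar\nbar foo\n' > "$tmpdir/infile"

# Script file with one command per line, for use with -f.
printf 's/foo/X/\ns/bar/Y/\n' > "$tmpdir/sedscr"

# The same two commands, passed in four different ways.
out_e=$(sed -e 's/foo/X/' -e 's/bar/Y/' "$tmpdir/infile")
out_semi=$(sed 's/foo/X/;s/bar/Y/' "$tmpdir/infile")
out_nl=$(sed 's/foo/X/
s/bar/Y/' "$tmpdir/infile")
out_f=$(sed -f "$tmpdir/sedscr" "$tmpdir/infile")

rm -rf "$tmpdir"

if [ "$out_e" = "$out_semi" ] && [ "$out_e" = "$out_nl" ] && [ "$out_e" = "$out_f" ]; then
    echo "all four forms produce the same output"
fi
```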
For each input line, sed goes through the complete script and applies all commands to it, then goes to the next input line.
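A small experiment makes this per-line cycle visible. In the sketch below (the two-line input is made up), the script prints the current line twice; if sed ran the first command over the whole input before starting the second, the output would be a, b, a, b instead.

```shell
#!/bin/sh
# -n turns off automatic printing, so only the two explicit p commands print.
out_cycle=$(printf 'a\nb\n' | sed -n 'p;p')
printf '%s\n' "$out_cycle"
# prints:
# a
# a
# b
# b
```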
Piping sed calls, on the other hand, makes each sed go through the whole stream again (and starts a separate process for each call). This might not make a big difference if you do an operation on every line, but imagine a chain of substitutions that depend on each other, as for a file
xx
xx
pattern
xx
xx
PATTERN
xx
xx
and you want to end up, in a case-insensitive manner, with uppercase PATTERN in parentheses wherever it occurs. If you use pipes as in
sed 's/pattern/PATTERN/' infile | sed 's/PATTERN/(&)/'
you go through the file twice for three operations in total:
Initial    1st pass   2nd pass
xx         xx         xx
xx         xx         xx
pattern    PATTERN    (PATTERN)
xx         xx         xx
xx         xx         xx
PATTERN    PATTERN    (PATTERN)
xx         xx         xx
xx         xx         xx
but with
sed 's/pattern/PATTERN/;s/PATTERN/(&)/' infile
you get the same result in just one pass. So, by all means, try and cram everything into a single command.
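To confirm that the piped and the single-command versions agree, both can be run on the sample file and compared. This is a minimal sketch; the temporary file stands in for infile and assumes mktemp is available.

```shell
#!/bin/sh
# Recreate the sample input from above in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' xx xx pattern xx xx PATTERN xx xx > "$tmpfile"

# Two sed processes connected by a pipe versus one sed with both commands.
out_pipe=$(sed 's/pattern/PATTERN/' "$tmpfile" | sed 's/PATTERN/(&)/')
out_single=$(sed 's/pattern/PATTERN/;s/PATTERN/(&)/' "$tmpfile")

rm -f "$tmpfile"

if [ "$out_pipe" = "$out_single" ]; then
    echo "same output"
fi
```

For a real comparison on a large file, wrapping each variant in time shows how much the extra processes cost on your system.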
GNU sed can do it in a single substitution, using the I flag for case-insensitive matching and \U to uppercase the replacement:
sed 's/pattern/\U(&)/I' infile