Suppose we start with this string:
echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie.">list.tmp
and want to end up with this result:
1-apple:fruit
2-banana:fruit
3-cucumber:veggie
4-date:fruit
5-eggplant:veggie
Why does this work:
sed -e 's/\./\n/g' -i list.tmp
sed -e 's/:/-/' list.tmp
But not this:
sed -e 's/\./\n/g' -e 's/:/-/' list.tmp
The second command yields this, apparently ignoring the new newlines when looking for the first occurrence of ':' on each line.
1-apple:fruit
2:banana:fruit
3:cucumber:veggie
4:date:fruit
5:eggplant:veggie
With an extended version of the input:
echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie.">list.tmp
I want to end up with this result:
one-apple:fruit
two-banana:fruit
three-cucumber:veggie
four-date:fruit
five-eggplant:veggie
Transferring key comment into an answer.
You forgot the g modifier on the second command in the double -e formulation. When the first -e completes, all the lines are still in the pattern space (the main working area in sed); they do not become 5 separately read lines. You read one line; you're still processing it. Mind you, a plain s/:/-/g would change every colon in the pattern space, so you'll need to use a modified pattern:
s/\([0-9]\):/\1-/g
Combining these, in GNU sed (as stipulated in the question title), you get:
sed -e 's/\./\n/g' -e 's/\([0-9]\):/\1-/g' list.tmp
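To see what the second expression is actually working on, it can help to dump the pattern space with sed's l command. The following is a minimal sketch, assuming GNU sed (for \n in the replacement text); the function names and the shortened sample input are illustrative only.

```shell
# Assumes GNU sed, which accepts \n in the replacement text.

# -n suppresses automatic printing; 'l' prints the pattern space in an
# unambiguous form, showing embedded newlines as \n and the end as $.
show_pattern_space() {
    printf '%s\n' '1:a:x.2:b:y.' | sed -n -e 's/\./\n/g' -e 'l'
}

# A plain global replacement changes every colon, not just the first
# one on each logical line, which is why the pattern has to be modified.
naive_global() {
    printf '%s\n' '1:a:x.2:b:y.' | sed -e 's/\./\n/g' -e 's/:/-/g'
}

show_pattern_space   # prints: 1:a:x\n2:b:y\n$
naive_global         # prints 1-a-x and 2-b-y, then a blank line
```

The single line of l output shows that the record is still one pattern space with embedded newlines, not several freshly read lines.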
Note that POSIX sed and other versions of sed have different rules about the newline substitution in the first -e expression.
awk
If changing tools from sed to awk is an option, you can do it more simply in awk, as shown by Ed Morton in a comment. Since that solution doesn't need to change to address the revised data, it clearly has advantages; the disadvantage is that it is not using sed. In 'the real world', you use the best tool for the job, and in this example, that's awk.
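The comment itself is not reproduced here, so the following is only a hypothetical sketch of how an awk approach might look, not the command from that comment. It assumes the data is a single line, as in the question, and treats the period as the record separator; the function name is made up for illustration.

```shell
# Hypothetical sketch, not the command from the comment.
# RS='.' makes each period-terminated chunk a separate record.
# NF skips the whitespace-only record left after the final period.
# sub() replaces only the first colon in each record.
first_colon_to_dash() {
    awk -v RS='.' 'NF { sub(/:/, "-"); print }' "$@"
}

printf '%s\n' 'one:apple:fruit.two:banana:fruit.' | first_colon_to_dash
```

Because each chunk really is a separate record here, the same script handles both the numeric and the 'extended' keys without change.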
With the 'extended' input, where there aren't convenient single digit numbers but you want to change the first colon on each line to a dash, you have to work harder:
sed -e 's/\./\n/g' \
-e 's/^\([^:]*\):/\1-/' \
-e 's/\(\n[^:]*\):/\1-/g' \
list.tmp
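The first -e is unchanged.
The second -e looks for the start of the pattern space followed by a sequence of non-colons followed by a colon, and replaces it with the non-colon sequence and a dash. The g modifier is irrelevant here.
The third -e looks for a newline followed by a sequence of non-colons followed by a colon, and replaces it with the newline, the non-colon sequence and a dash. The g modifier is very relevant here.
You can flatten that all onto one line, but it is easier to see the similarities between the last two -e options if they're laid out on separate lines.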
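One way to convince yourself that the g modifier matters on the last -e expression is to run the command with and without it. This is a small sketch assuming GNU sed; the shortened input and the function names are illustrative only.

```shell
# Assumes GNU sed (for \n in the replacement text of the first -e).
input='one:apple:fruit.two:banana:fruit.three:cucumber:veggie.'

# Last expression WITHOUT g: only the first newline-prefixed key is fixed.
without_g() {
    printf '%s\n' "$input" |
    sed -e 's/\./\n/g' \
        -e 's/^\([^:]*\):/\1-/' \
        -e 's/\(\n[^:]*\):/\1-/'
}

# Last expression WITH g: every newline-prefixed key is fixed.
with_g() {
    printf '%s\n' "$input" |
    sed -e 's/\./\n/g' \
        -e 's/^\([^:]*\):/\1-/' \
        -e 's/\(\n[^:]*\):/\1-/g'
}

without_g   # third line stays as three:cucumber:veggie
with_g      # third line becomes three-cucumber:veggie
```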
You can also experiment with ERE (extended regular expressions) with the -E option, and group the two separate replacements into one:
{
echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie."
echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie."
} |
sed -E -e 's/\./\
/g' -e 's/((^|\n)[^:]+):/\1-/g'
That yields:
1-apple:fruit
2-banana:fruit
3-cucumber:veggie
4-date:fruit
5-eggplant:veggie
one-apple:fruit
two-banana:fruit
three-cucumber:veggie
four-date:fruit
five-eggplant:veggie
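Each input line ends with a period, which also becomes a newline, so sed prints a blank line after each group of five. If you don't want the extra blank line, remove the final newline: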
{
echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie."
echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie."
} |
sed -E -e 's/\./\
/g' -e 's/((^|\n)[^:]+):/\1-/g' -e 's/\n$//'
The backslash-newline notation works correctly in both GNU sed and POSIX (including BSD and macOS) sed; you can replace it with \n in GNU sed.
The \n in the replacement part of the s/// command doesn't work in BSD (macOS) sed. POSIX sed requires that you use a backslash to escape a literal newline in the replacement text:
A line can be split by substituting a <newline> into it. The application shall escape the <newline> in the replacement by preceding it by a <backslash>.
GNU sed is more flexible.
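As an illustration of that rule, the combined command for the original numeric data could be written with an escaped literal newline instead of \n. This is a sketch only; the function name is made up, and the continuation line must start in column one so that no stray spaces end up in the replacement.

```shell
# Portable sketch: the replacement contains a backslash followed by a
# literal newline, which is the form POSIX specifies for splitting lines.
portable_split() {
    sed -e 's/\./\
/g' -e 's/\([0-9]\):/\1-/g' "$@"
}

printf '%s\n' '1:apple:fruit.2:banana:fruit.' | portable_split
```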
Also (according to potong's answer), there is a GNU-specific modifier m that you can use to do the multi-line matching in one operation.
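The command from that answer is not reproduced here. As a rough sketch of the idea, assuming GNU sed, the multi-line flag (written M below) lets ^ match after each embedded newline, so one substitution with g can handle every logical line. The function name is hypothetical.

```shell
# Assumes GNU sed: -E for ERE, \n in the replacement, and the
# GNU-specific M flag so that ^ also matches after each embedded newline.
multiline_first_colon() {
    sed -E -e 's/\./\n/g' -e 's/^([^:]*):/\1-/Mg' "$@"
}

printf '%s\n' 'one:apple:fruit.two:banana:fruit.three:cucumber:veggie.' |
multiline_first_colon
```

With the multi-line flag there is no need for separate start-of-buffer and after-newline expressions.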