
GNU sed and newlines with multiple scripts


Suppose we start with this string:

echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie.">list.tmp

and want to end up with this result:

1-apple:fruit
2-banana:fruit
3-cucumber:veggie
4-date:fruit
5-eggplant:veggie



Why does this work:

sed -e 's/\./\n/g' -i list.tmp
sed -e 's/:/-/' list.tmp

But not this:

sed -e 's/\./\n/g' -e 's/:/-/' list.tmp



The second command yields this, apparently ignoring the newly inserted newlines when looking for the first occurrence of ':' on each line:

1-apple:fruit
2:banana:fruit
3:cucumber:veggie
4:date:fruit
5:eggplant:veggie

With an extended version of the input:

echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie.">list.tmp

I want to end up with this result:

one-apple:fruit
two-banana:fruit
three-cucumber:veggie
four-date:fruit
five-eggplant:veggie

Solution

  • Transferring key comment into an answer.

    Original data

    You forgot the g modifier on the second command in the double -e formulation. When the first -e completes, all the lines are still in the pattern space (sed's main working area); they do not become 5 separately read lines. You read one line, and you're still processing it. Simply adding g to s/:/-/ would replace every colon, though, so you need a modified pattern that only matches a colon preceded by a digit:

    s/\([0-9]\):/\1-/g
    

    Combining these, in GNU sed (as stipulated in the question title), you get:

    sed -e 's/\./\n/g' -e 's/\([0-9]\):/\1-/g' list.tmp
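    To see that the first expression leaves everything in a single pattern space, a minimal sketch like the following can help. It assumes GNU sed, whose "l 0" command prints the pattern space unambiguously (embedded newlines shown as \n, end marked with $, no line wrapping). It then contrasts the question's s/:/-/ with the digit-anchored g version:

```shell
# Recreate the numeric sample input from the question.
echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie." > list.tmp

# GNU sed: -n suppresses normal output; 'l 0' dumps the pattern space.
# It prints ONE line: 1:apple:fruit\n2:banana:fruit\n...\n5:eggplant:veggie\n$
sed -n -e 's/\./\n/g' -e 'l 0' list.tmp

# Without g: only the first colon in the whole pattern space changes.
sed -e 's/\./\n/g' -e 's/:/-/' list.tmp

# With g and a digit-anchored pattern: every "digit:" pair changes.
sed -e 's/\./\n/g' -e 's/\([0-9]\):/\1-/g' list.tmp
```

    Because the whole input is one pattern space, "first occurrence" in s/:/-/ means first in that single buffer, which is why only line 1 changed.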
    

    Note that POSIX sed and other versions of sed have different rules about using \n to insert a newline in the replacement text, as the first -e expression does.

    Consider using awk

    If changing tools from sed to awk is an option, you can do it more simply in awk, as shown by Ed Morton in a comment. That solution doesn't need to change to handle the revised data, which is a clear advantage; its only disadvantage is that it doesn't use sed. In 'the real world', you use the best tool for the job, and in this example that's awk.
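    Ed Morton's comment isn't reproduced here, so the following is only a sketch of one awk approach, not necessarily his command. It assumes a POSIX awk that treats a single-character RS literally, and it handles both the numeric and the extended input without changes:

```shell
# Extended sample input from the question.
echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie." > list.tmp

# RS='.' makes each dot-terminated chunk its own record.
# NF skips the last record, which holds only the trailing newline
# (the default FS treats newlines as separators, so NF is 0 there).
# sub() without g-style semantics replaces only the first colon.
awk -v RS='.' 'NF { sub(/:/, "-"); print }' list.tmp
```

    Setting RS to the dot lets awk do the splitting that the first sed expression does, so "first colon on each line" becomes a plain sub() on each record.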

    Extended data

    With the 'extended' input, where there aren't convenient single-digit numbers but you want to change the first colon on each line to a dash, you have to work harder:

    sed -e 's/\./\n/g' \
        -e 's/^\([^:]*\):/\1-/' \
        -e 's/\(\n[^:]*\):/\1-/g' \
        list.tmp
    

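    The second expression handles the first line (anchored at ^), and the third handles every line that follows an embedded newline. You can flatten that all onto one line, but it is easier to see the similarities between the last two -e options if they're laid out on separate lines.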

    You can also use ERE (extended regular expressions) via the -E option and combine the two separate replacements into one:

    {
    echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie."
    echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie."
    } |
    sed -E -e 's/\./\
    /g' -e 's/((^|\n)[^:]+):/\1-/g'
    

    That yields:

    1-apple:fruit
    2-banana:fruit
    3-cucumber:veggie
    4-date:fruit
    5-eggplant:veggie
    
    one-apple:fruit
    two-banana:fruit
    three-cucumber:veggie
    four-date:fruit
    five-eggplant:veggie
    

    The extra blank line comes from the newline that replaced the final '.'. If you don't want it, remove that trailing newline:

    {
    echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie."
    echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie."
    } |
    sed -E -e 's/\./\
    /g' -e 's/((^|\n)[^:]+):/\1-/g' -e 's/\n$//'
    

    The backslash-newline notation works correctly in both GNU sed and POSIX sed (including BSD and macOS sed); in GNU sed you can replace it with \n. The \n in the replacement part of the s/// command doesn't work in BSD (macOS) sed. POSIX requires that you escape a literal newline in the replacement text with a backslash:

    A line can be split by substituting a <newline> into it. The application shall escape the <newline> in the replacement by preceding it by a <backslash>.

    GNU sed is more flexible.
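    As a sketch of the portable form, here is the single-digit command from earlier rewritten with a backslash-escaped literal newline instead of \n in the replacement. This is the form the POSIX text describes, so it should behave the same in GNU and BSD sed; the example assumes the numeric sample input:

```shell
echo "1:apple:fruit.2:banana:fruit.3:cucumber:veggie.4:date:fruit.5:eggplant:veggie." > list.tmp

# The backslash at the end of the first line escapes a literal newline,
# which becomes the replacement text for each '.'.
sed -e 's/\./\
/g' -e 's/\([0-9]\):/\1-/g' list.tmp
```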

    Also (according to potong's answer), there is a GNU-specific M (or m) modifier for the s command that makes ^ and $ also match at embedded newlines, so you can do the multi-line matching in one operation.
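    potong's exact command isn't quoted here, so this is only a sketch of the idea, assuming GNU sed: with the M flag, one anchored substitution replaces the first colon on every line of the pattern space, for either input:

```shell
echo "one:apple:fruit.two:banana:fruit.three:cucumber:veggie.four:date:fruit.five:eggplant:veggie." > list.tmp

# GNU sed only: with M, ^ also matches just after each embedded newline,
# so s///Mg rewrites the first colon of every line in one expression.
# The last expression drops the newline left by the trailing '.'.
sed -e 's/\./\n/g' -e 's/^\([^:]*\):/\1-/Mg' -e 's/\n$//' list.tmp
```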