
Remove a sequence of line feeds and spaces from a file with sed


I have a file that contains an unwanted sequence of line feeds and spaces that I want to remove. The actual file has about 1 million lines; this is just a reproducible example.

I can grep the offending lines like this:

grep -ciP "\n\n {6,}" problem.rpt

And it correctly returns

3

So I tried to remove the string with sed:

sed "s/\n\n {6,}//g" problem.rpt > prob2.rpt

but instead of deleting the sequence "\n\n {6,}", sed left both line feeds and the 6+ spaces in place and added a CR before each LF, so the output now contains "\r\n\r\n {6,}".

I'm using GNU sed and grep from a Windows 8.1 cmd prompt.

What am I doing wrong, and what's the right way to approach this job?
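
A minimal sketch of what goes wrong and of one possible fix. It assumes GNU sed 4.2.2 or later (for `-z`), a POSIX shell such as Git Bash or Cygwin rather than cmd.exe, and a file with LF line endings. By default sed reads one line at a time and strips its trailing newline before running the script, so the pattern space never contains `\n`. In addition, `{6,}` is literal text in sed's basic regular expressions; an interval needs `\{6,\}` or the `-E` option. The files `sample.txt` and `sample2.txt` and their contents are hypothetical.

```shell
#!/bin/sh
# Sketch: assumes GNU sed 4.2.2+ (for -z), a POSIX shell and LF line endings.
# sample.txt and sample2.txt are hypothetical files.
printf 'x\n\n      y\n' > sample.txt

# Default mode: each line is read with its newline stripped, so \n never
# matches, and {6,} is literal in a basic regex. The output equals the input.
sed 's/\n\n {6,}//g' sample.txt

# -z makes NUL the line separator; a text file contains no NULs, so the whole
# file is one pattern space and \n can match. -E makes {6,} a repeat count.
sed -z -E 's/\n\n {6,}//g' sample.txt > sample2.txt
cat sample2.txt   # prints: xy
```

From cmd.exe the same command would use double quotes: `sed -z -E "s/\n\n {6,}//g" problem.rpt > prob2.rpt`. The extra CRs look like a Windows text-mode effect. GNU sed documents a `-b` (`--binary`) option that disables special CR/LF handling on such systems, but whether it fixes this particular output is an assumption.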


Solution

  • From a list of sed one-liners, I found a command that solved my problem:

    sed -e :a -e "$!N; s/\n //;ta" -e "P;D" problem.rpt > prob2.rpt
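
    A small run of the same one-liner, assuming GNU sed in a POSIX shell, where single quotes take the place of the double quotes cmd.exe needs. The input file `joined.txt` is a hypothetical example.

    ```shell
    #!/bin/sh
    # Sketch assuming GNU sed in a POSIX shell; joined.txt is hypothetical.
    printf 'abc\n def\n ghi\njkl\n' > joined.txt

    # Lines starting with a space are glued onto the previous line.
    sed -e :a -e '$!N; s/\n //;ta' -e 'P;D' joined.txt
    # prints:
    # abcdefghi
    # jkl
    ```

    Each join removes the newline and exactly one space, so a line that starts with six spaces keeps the other five.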
    

    Then, while trying to decipher the command, I found this explanation (copied verbatim):

    sed ':a;  $!N;  s/\n/string/;  ta'
         ---  ----  -------------  --
          |     |        |          |--> go back (`t`) to `a`
          |     |        |-------------> substitute newlines with `string`
          |     |----------------------> If this is not the last line (`$!`), append the 
          |                              next line to the pattern space.
          |----------------------------> Create the label `a`.
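
    A quick check of the loop in the diagram, assuming GNU sed (which accepts `;` right after a label; some other sed implementations do not). A comma stands in for `string`.

    ```shell
    #!/bin/sh
    # Sketch assuming GNU sed; "," replaces the placeholder "string".
    # :a defines the label, $!N appends the next line (except on the last),
    # s/\n/,/ replaces the newline, and ta loops while that substitution works.
    printf 'a\nb\nc\n' | sed ':a;  $!N;  s/\n/,/;  ta'
    # prints: a,b,c
    ```

    The loop ends once the last line has been read: `$!N` is skipped, `s` finds no newline to replace, and `t` does not branch because there has been no successful substitution since the previous branch.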
    

    I still don't know what the P;D part does. I'd appreciate it if someone who knows would edit this answer to explain it.
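
    For reference, the GNU sed manual describes the two commands like this: `P` prints the pattern space up to the first newline, and `D` deletes the pattern space up to the first newline and restarts the script with what remains, without reading a new input line (if there is no newline, it behaves like `d`). Combined with `$!N`, they keep a two-line window. The sketch below assumes GNU sed in a POSIX shell; the inputs are made up.

    ```shell
    #!/bin/sh
    # Sketch assuming GNU sed in a POSIX shell; the inputs are made up.

    # $!N appends the next line, P prints only the first of the two lines,
    # D drops that line and restarts with the second one still loaded.
    # Every line is therefore printed exactly once.
    printf '1\n2\n3\n' | sed '$!N;P;D'

    # Because the next line stays loaded, each line can be compared with the
    # following one; here P is skipped when the two lines are identical,
    # which drops consecutive duplicates (prints "a" then "b").
    printf 'a\na\nb\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
    ```

    In the one-liner above, this means that once the `t` loop stops, `P` prints the finished joined line and `D` keeps the line that `N` had already read, so that line becomes the start of the next join.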