I have a file that contains an undesired sequence of line feeds and spaces that I want to remove. The actual file has about 1 million rows; this is just to provide a reproducible example.
I can grep the offending lines like this:
grep -ciP "\n\n {6,}" problem.rpt
And it correctly returns
## 3
So I tried to replace the string with sed:
sed "s/\n\n {6,}//g" problem.rpt > prob2.rpt
but instead of deleting the sequence "\n\n {6,}", I now have "\r\n\r\n {6,}" (it introduced a CR before each LF, without removing the LFs or the 6+ spaces).
I'm working with GNU sed and grep in a Windows 8.1 cmd.
What am I doing wrong, and what's the right way to approach this job?
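A likely explanation, sketched under assumptions: sed normally reads one line at a time and strips the trailing newline before running the script, so a pattern containing \n cannot match unless a command such as N has pulled another line into the pattern space. In addition, without -E (or -r) GNU sed treats {6,} as literal characters rather than a repetition count. The sample text below is hypothetical, and -z needs GNU sed 4.2.2 or later, which may not be the version installed here. The single quotes are for a POSIX shell; in cmd, double quotes would be used instead. The extra CRs probably come from the Windows build writing in text mode (GNU sed documents a -b/--binary option for this), but that is an assumption about this particular build.

```shell
# Hypothetical sample: 'a', an empty line, a line with six leading spaces, 'c'.
sample=$(printf 'a\n\n      b\nc')

# Line at a time: the pattern space never contains \n, so nothing matches
# and the text passes through unchanged.
unchanged=$(printf '%s\n' "$sample" | sed -E 's/\n\n {6,}//g')

# With -z, records are NUL-separated, so the whole text is one record
# and \n can be matched directly.
joined=$(printf '%s\n' "$sample" | sed -z -E 's/\n\n {6,}//g')

printf '%s\n' "$unchanged"
printf '%s\n' "$joined"
```

Note that -z holds the whole input in memory as one record, which may or may not be acceptable for a file of about 1 million rows.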
From a list of sed one-liners I found one command that solved my problem:
sed -e :a -e "$!N; s/\n //;ta" -e "P;D" problem.rpt > prob2.rpt
Then, trying to decipher the command, this is what I found here (copied verbatim):
sed ':a; $!N; s/\n/string/; ta'
     --- ---- ------------- --
      |   |         |       |--> go back (`t`) to `a`
      |   |         |----------> substitute newlines with `string`
      |   |--------------------> If this is not the last line (`$!`), append the
      |                          next line to the pattern space.
      |------------------------> Create the label `a`.
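To see the loop in isolation, here is a minimal sketch on hypothetical three-line input, with + standing in for `string`. One detail worth adding to the diagram: `t` only branches if a substitution has succeeded since the last input line was read or the last `t` branch was taken, so the loop stops once there is no newline left to replace. The single quotes assume a POSIX shell; cmd would need double quotes.

```shell
# Hypothetical input: three short lines.
# Each pass appends the next line (N) and replaces the newline with '+',
# then t jumps back to :a because the substitution succeeded.
looped=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e '$!N; s/\n/+/; ta')

printf '%s\n' "$looped"   # prints: one+two+three
```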
I still don't know what the P;D part does; I'd appreciate it if someone with the knowledge edits this answer to add it.
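For what it's worth, the GNU sed manual describes P as printing the pattern space up to the first newline, and D as deleting the pattern space up to the first newline and restarting the script with what remains, without reading a new input line. In this command that seems to mean: once no more joins are possible, emit the finished first line but keep the line after it in the pattern space so it can still be joined with its own continuation. A minimal sketch on hypothetical input (single quotes assume a POSIX shell; use double quotes in cmd):

```shell
# Hypothetical input: lines starting with a space continue the previous line.
input=$(printf 'alpha\n beta\ngamma\n delta')

# With P;D: print and drop only the first line of the pattern space, then
# rerun the script on the remainder, so "gamma" is still available to be
# joined with " delta".
with_pd=$(printf '%s\n' "$input" | sed -e ':a' -e '$!N; s/\n //; ta' -e 'P;D')

# Without P;D: the whole pattern space ("alphabeta\ngamma") is printed at the
# end of the cycle, and " delta" then arrives alone with nothing to join to.
without_pd=$(printf '%s\n' "$input" | sed -e ':a' -e '$!N; s/\n //; ta')

printf '%s\n' "$with_pd"      # two lines: alphabeta, gammadelta
printf '%s\n' "$without_pd"   # three lines: alphabeta, gamma, " delta"
```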