I'm looking for a linux command-line solution to the problem:
"Replace each single linefeed with a space, but don't modify any groups of consecutive linefeeds i.e. do not modify any linefeed which has another linefeed next to it." As an example:
one two
three four
five six

seven eight
nine ten
should become:
one two three four five six

seven eight nine ten
I am already aware that every valid text file should end with a linefeed, but if your proposed solution deletes that final-character-linefeed, that would not be a problem (it would be easy for me to append it back on afterwards).
I think that this is "too complex a task" for tr, but I assume something should be possible in sed or awk (if not, then I'll need to "rustle up" something in python or c). Unfortunately, my sed-fu is weak (as is my awk-fu) - are there any sed/awk black-belts around that could please help me?
I have already found How can I replace each newline (\n) with a space using sed? but of course the suggested answers to that question wipe out my "multiple consecutive linefeeds" (which I want to preserve).
I am also aware that "Sed is line-based therefore it is hard for it to grasp newlines" - perhaps sed is not the best tool for this job.
I have also found Replace only single instance of a character using sed but of course the character being replaced in that question is not a (problematic) linefeed.
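For reference, one possible sketch with GNU sed (it relies on two GNU extensions: -z, which makes sed slurp the whole input as a single NUL-delimited record, and \n inside a bracket expression) is to loop a substitution that only touches a linefeed flanked by non-linefeed characters on both sides. It produces the desired output for the example above, though looping one substitution at a time will be slow on large files:
$ sed -z ':a; s/\([^\n]\)\n\([^\n]\)/\1 \2/; ta' file
one two three four five six

seven eight nine ten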
(Why do I want this? The nano editor has a justify function which adds and removes single linefeeds so that any line "fills" the chosen line length but does not overrun it. nano does have a "built-in" unjustify function, but this is really just an "undo", not a "real" unjustify. What I am trying to find is the closest thing to a "genuine" unjustify command.)
Update: all the current solutions work perfectly, and thank you to all those who provided them. I've accepted Ed Morton's for the reasons that he gives - his solution processes only 1 line of input at a time, and it's portable to a non-gnu version of its tool. The solution to my nano problem is:
cat << 'EOF' > $HOME/.local/bin/dejustify
#!/bin/sh
awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' < "${1:-/dev/stdin}"
EOF
chmod u+x $HOME/.local/bin/dejustify
(I found the < "${1:-/dev/stdin}" here.)
I can now use it in a pipeline (printf "one\ntwo\nthree\nfour\n" | dejustify) or just dejustify <filename>.
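For example, the pipeline form should produce this (assuming $HOME/.local/bin is on your PATH so that the script created above is found):
$ printf "one\ntwo\nthree\nfour\n" | dejustify
one two three four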
Inside nano, I can <Ctrl>+<t> then enter |dejustify to dejustify my text. Success! 🙏
Using any awk:
$ awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1' file
one two three four five six

seven eight nine ten
Breaking it down:
-v RS=
  treat the input as [possibly multi-line] records separated by 1 or more empty lines.
-v ORS='\n\n'
  put 2 newlines at the end of each output record.
-F'\n'
  set the field separator to a newline so that ONLY newlines get replaced in the next step; otherwise all chains of contiguous white space within each record would be replaced (see the example at the end of this answer).
{$1=$1}
  update the value of a field, $1, thereby causing awk to rebuild the current record, replacing all strings that match the FS (a newline) with an OFS (a blank char).
1
  a true condition causing awk to execute its default action of printing the current record.

The above will print a blank line at the end of the output; if that's a problem, you can always do this instead:
$ awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' file
one two three four five six

seven eight nine ten
which prints a blank line before each record except the first, instead of printing a blank line after every record:

NR>1{print ""}
  if this is the second or subsequent record, then print a blank line before it.
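A quick illustration of why -F'\n' matters, using a made-up input where "a" and "b" are separated by 2 spaces: with -F'\n' the run of spaces survives, while without it the record is re-split on all white space and the run is squeezed to a single blank:
$ printf 'a  b\nc\n\nd\n' | awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1'
a  b c

d
$ printf 'a  b\nc\n\nd\n' | awk -v RS= 'NR>1{print ""} {$1=$1} 1'
a b c

d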