linux sed tr

Unwrap paragraphs while deleting spaces


I've hit a frustrating dead end trying to figure out how to change the formatting of a file using sed, tr, etc. I'm sure there's a right way to do what I want; I just don't know what it is.

Here's my problem. I have a data file that looks like this:

   8587  812.700  152.791  12.7410   0.0372  99.9999   9.9999  12.2520   0.0436  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999
                           99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999
                           99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  13.1942   0.0589  99.9999   9.9999  99.9999   9.9999
                           99.9999   9.9999  12.9601   0.1323  99.9999   9.9999   1.0337   0.3166

I want to turn each block like this into a single line. There are about 10,000 of these blocks in each file. I think I want to remove every newline character that is followed by 26 spaces, which would do the job and leave the newline in place before the next block.

So, are there any handy Linux tools available to do this?

Thanks


Solution

  • This removes a newline if it's followed by 26 spaces (as written, it also prints an empty line at the top and no newline at the very end):

    awk '{printf "%s",(/^                          /?$0:RS $0)}' file
    

    To also remove the 26 spaces:

    awk '{printf "%s",(/^                          /?$0:RS $0)}' file | awk '{gsub(/                          /,"")}1'
       8587  812.700  152.791  12.7410   0.0372  99.9999   9.9999  12.2520   0.0436  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999 99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  99.9999   9.9999 99.9999   9.9999  99.9999   9.9999  99.9999   9.9999  13.1942   0.0589  99.9999   9.9999  99.9999   9.9999 99.9999   9.9999  12.9601   0.1323  99.9999   9.9999   1.0337   0.3166
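
The two chained awk processes above can probably be folded into one pass. The following is a minimal sketch, not part of the original answer: the sample file, its values, and the `pad` and `joined` variable names are made up for illustration, and the 26 spaces are passed in as a string so the pattern does not depend on `{26}` interval support in your awk.

```shell
# Hypothetical sample: one record wrapped onto two continuation lines
# (each indented by 26 spaces), followed by a second, unwrapped record.
pad=$(printf '%26s' '')          # a string of exactly 26 spaces
sample=$(mktemp)
printf '%s\n' \
  "   8587  812.700" \
  "${pad} 99.9999   9.9999" \
  "${pad} 13.1942   0.0589" \
  "   8588  813.100" > "$sample"

# sub() strips the 26 leading spaces and returns 1 when it did so, which
# marks the line as a continuation and prints it without a newline in front.
# Every other line gets a newline in front, except the first line of the
# file. The END block supplies the final newline.
joined=$(awk -v pad="$pad" '
  BEGIN { re = "^" pad }
  { printf "%s", (sub(re, "") || NR == 1 ? $0 : RS $0) }
  END { print "" }
' "$sample")

printf '%s\n' "$joined"
```

With this sample, the wrapped record comes out on one line with a single space where each break used to be, and the second record stays on its own line. Anchoring the pattern with `^` means runs of spaces inside the data are left alone.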
    

    Another example:
    Remove the newline and the leading spaces if the next line starts with 6 spaces.

    cat file
    data here
          more data
          not here
    but this is new line
    so i this
    

    Here it is all in one awk command, without the blank line at the top and with a proper newline at the end:

    awk '{split($0,a,"     ")} NR==1 {a[2]=$0} {printf "%s",(/^      /||NR==1?a[2]:RS $0)}END{print ""}' file
    data here more data not here
    but this is new line
    so i this
    

    Rewritten more compactly (gsub removes 5 of the 6 spaces, so one space is left as a separator):

    awk '{printf "%s",(gsub(/ {5}/,"")||NR==1?$0:RS $0)} END {print ""}' file
    data here more data not here
    but this is new line
    so i this
    

    If the interval expression {5} (the number of spaces) does not work, try adding --re-interval to your awk command (older versions of gawk need it), or just type out the number of spaces you need.
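
Since the question also mentions sed, here is a sketch of how the same join might be done with sed's `N`, `P` and `D` commands. This is an alternative technique rather than the approach in the answer above; the sample data and the `pad`, `sample` and `result` names are invented for the example.

```shell
# Hypothetical sample: one record wrapped onto two continuation lines
# (each indented by 26 spaces), followed by a second, unwrapped record.
pad=$(printf '%26s' '')          # a string of exactly 26 spaces
sample=$(mktemp)
printf '%s\n' \
  "   8587  812.700" \
  "${pad} 99.9999   9.9999" \
  "${pad} 13.1942   0.0589" \
  "   8588  813.100" > "$sample"

# :a              label to loop back to
# $!N             unless on the last line, append the next line to the buffer
# s/\n \{26\}//   drop a newline that is followed by 26 spaces
# ta              if that substitution happened, loop and try the next line
# P               otherwise print the finished record (up to the first newline)
# D               delete what was printed and restart with the remainder
result=$(sed -e ':a' -e '$!N' -e 's/\n \{26\}//' -e 'ta' -e 'P' -e 'D' "$sample")

printf '%s\n' "$result"
```

For this sample the output should match the single-pass awk approach: one line per record, with no blank line at the top. The `\{26\}` form is a basic regular expression interval, so no extra flag is needed for sed.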