bash, unix, awk

Paste 4th column of separate files side by side in one file


I want to loop through multiple subdirectories and parse a specific file (DOE.nas) that exists in every one of them. The goal is to print the 4th column of each file side by side in a single output file. Below is an example with just 2 subfolders.

(DOE.nas) in dir_1
PSHELL       217  136738     0.7  136738            
PSHELL      1786   13571     1.4   13571         
PSHELL      1605  136513    0.65  136513           
PSHELL      1623   13571     1.4   13571
(DOE.nas) in dir_2
PSHELL      1628  136733     1.3  136733       
PSHELL      2015  136514    0.75  136514        
PSHELL      2304  136513     1.5  136513         
PSHELL      2509   13571     1.2   13571
Desired output (out.txt)
 0.7   1.3
 1.4  0.75
0.65   1.5
 1.4   1.2

Here is my code and the output it produces:

for FILE in */*.nas; do
    awk -v OFS='\t' '{print $4}' "$FILE" > tmp
    paste tmp >> out.txt
done
My out.txt comes out stacked in a single column instead:
 0.7
 1.4
0.65
 1.4
 1.3
0.75
 1.5
 1.2

Solution
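
Why the original loop stacks: `paste tmp` is given only one file, so it copies that file through unchanged, and `>>` appends each copy below the previous one. A minimal sketch of a loop-based fix follows, assuming the sample data from the question; the scratch directory and the `out.new` name are made up for the demo. Each pass pastes the new column next to what has been collected so far.

```shell
#!/bin/sh
# Sketch only: keep the for/awk/paste structure, but give paste two inputs.
set -eu

# Hypothetical scratch area holding the two sample files from the question.
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir_1 dir_2
cat > dir_1/DOE.nas <<'EOF'
PSHELL       217  136738     0.7  136738
PSHELL      1786   13571     1.4   13571
PSHELL      1605  136513    0.65  136513
PSHELL      1623   13571     1.4   13571
EOF
cat > dir_2/DOE.nas <<'EOF'
PSHELL      1628  136733     1.3  136733
PSHELL      2015  136514    0.75  136514
PSHELL      2304  136513     1.5  136513
PSHELL      2509   13571     1.2   13571
EOF

rm -f out.txt
for FILE in */DOE.nas; do
    awk '{print $4}' "$FILE" > tmp
    if [ -s out.txt ]; then
        # Join the columns collected so far with the new one (tab-separated).
        paste out.txt tmp > out.new && mv out.new out.txt
    else
        # First file: its column becomes the initial out.txt.
        mv tmp out.txt
    fi
done
rm -f tmp
cat out.txt
```

The glob expands in lexical order, so with ten or more directories `dir_10` is processed before `dir_2`.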

  • Addressing OP's comment about having to process more than 2 files ...

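    Assumptions: every DOE.nas file has the same number of lines (the END loop below runs up to the line count of the last file read)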

    Setup:

    $ head dir_*/DOE.nas
    ==> dir_1/DOE.nas <==
    PSHELL       217  136738     0.7  136738
    PSHELL      1786   13571     1.4   13571
    PSHELL      1605  136513    0.65  136513
    PSHELL      1623   13571     1.4   13571
    
    ==> dir_2/DOE.nas <==
    PSHELL      1628  136733     1.3  136733
    PSHELL      2015  136514    0.75  136514
    PSHELL      2304  136513     1.5  136513
    PSHELL      2509   13571     1.2   13571
    
    ==> dir_3/DOE.nas <==
    PSHELL      1628  136733     17.3  136733
    PSHELL      2015  136514    20.75  136514
    PSHELL      2304  136513     31.5  136513
    PSHELL      2509   13571     16.2   13571
    
    ==> dir_4/DOE.nas <==
    PSHELL      1628  136733     7.54 136733
    PSHELL      2015  136514    12.3  136514
    PSHELL      2304  136513    24.55 136513
    PSHELL      2509   13571     -1.2   13571
    

    One awk idea:

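    awk '
    BEGIN { OFS="\t" }
          # append the 4th field to the row for this line number;
          # FNR==NR is only true in the first file, so no leading OFS there
          { lines[FNR]= lines[FNR] (FNR==NR ? "" : OFS) $4 }
          # in END, FNR still holds the line count of the last file read
    END   { for (i=1; i<=FNR; i++)
                print lines[i]
          }
    ' dir_*/DOE.nas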
    

    NOTE: this replaces all of OP's current code (for / awk / paste)

    This generates:

    0.7     1.3     17.3    7.54
    1.4     0.75    20.75   12.3
    0.65    1.5     31.5    24.55
    1.4     1.2     16.2    -1.2
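
The END loop above runs up to the FNR of the last file, so it relies on every file having the same number of lines. If that might not hold, one possible variation is to store each value per (row, file) and pad missing cells with empty strings. This is a sketch, not part of the original answer: the names `cell`, `nfiles` and `maxrows` and the shortened sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch only: side-by-side 4th columns when files differ in length.
set -eu

# Hypothetical scratch data: dir_2 is shorter than the others.
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir_1 dir_2 dir_3
printf 'PSHELL 217 136738 0.7 136738\nPSHELL 1786 13571 1.4 13571\nPSHELL 1605 136513 0.65 136513\n' > dir_1/DOE.nas
printf 'PSHELL 1628 136733 1.3 136733\n' > dir_2/DOE.nas
printf 'PSHELL 1628 136733 17.3 136733\nPSHELL 2015 136514 20.75 136514\n' > dir_3/DOE.nas

awk '
BEGIN  { OFS = "\t" }
FNR==1 { nfiles++ }                          # first line of a new input file
       { cell[FNR, nfiles] = $4              # remember value by (row, file)
         if (FNR > maxrows) maxrows = FNR }  # longest file seen so far
END    { for (r = 1; r <= maxrows; r++) {
             line = ""
             for (f = 1; f <= nfiles; f++)   # missing cells expand to ""
                 line = line (f == 1 ? "" : OFS) cell[r, f]
             print line
         }
       }
' dir_*/DOE.nas > out.txt
cat out.txt
```

Short files leave empty cells, so every row keeps the same number of tab-separated fields. A completely empty file never triggers `FNR==1` and therefore gets no column at all.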