awkcompareuniq

Compare two files with same ids and substract fields


I have 2 files with the format of

File A

01  20200111  28.56  22.07  40.14  49.79  22.81  49.31  33.75  31.24  39.41  36.18
02  20200118  32.41  14.89  38.82  60.54  11.54  49.10  34.34  25.53  36.96  34.30
03  20200125  21.95  18.48  28.45  42.94  22.45  42.40  36.43  34.53  43.51  33.28
...
01  20210109  29.32  24.60  34.41  46.56  29.38  44.06  34.47  33.75  41.12  36.04
02  20210116  29.38  28.39  33.07  42.63  29.46  39.41  32.45  33.60  43.81  34.93
03  20210123  27.51  19.55  32.98  45.88  26.05  46.88  37.58  31.98  44.52  35.90


File B

01  19912020  24.20  16.70  31.28  45.98  19.26  42.57  33.43  29.35  39.96  32.22
02  19912020  24.29  15.46  29.02  44.47  19.24  40.57  32.98  29.80  40.26  31.36
03  19912020  20.69  14.07  29.90  45.89  21.34  44.71  35.90  31.61  42.00  33.08

I would like to take the difference of Columns 3-12 of each id (Column 1) in File A from the same unique id in File B. File A has multiple instances of the same id. So each instance would be compared to the same id in File B.

My desired output is:

01   20200111    4.36    5.37    8.86    3.81    3.55    6.74    0.32    1.89   -0.55    3.96
02   20200118    8.12   -0.57     9.8   16.07    -7.7    8.53    1.36   -4.27    -3.3    2.94
03   20200125    1.26    4.41   -1.45   -2.95    1.11   -2.31    0.53    2.92    1.51     0.2
...
01   20210109    5.12     7.9    3.13    0.58   10.12    1.49    1.04     4.4    1.16    3.82
02   20210116    5.09   12.93    4.05   -1.84   10.22   -1.16   -0.53     3.8    3.55    3.57
03   20210123    6.82    5.48    3.08   -0.01    4.71    2.17    1.68    0.37    2.52    2.82

The above output (cols 3-12) needs to be in the format of %8.2f

Any help is appreciated.


Solution

  • One idea using GNU awk (for multi-dimensional array support):

    awk '
    FNR==NR { for (i=3;i<=NF;i++)                    # 1st file: loop through 3rd-12th fields
                  b[$1][i]=$i                        # store field values in 2-dimensional array
              next                                   # skip to next input line
            }
    $1 in b { printf "%s  %s", $1, $2                # 2nd file: if $1 is index in first dimension of array b[] then printf first 2 fields and then ...
              for (i=3;i<=NF;i++)                    # loop through 3rd-12th fields
                  printf "%8.2f", $i-b[$1][i]        # print the difference
              print ""                               # terminate the current line of output
            }
    ' File_B File_A                                  # process File_B first, then File_A
    

    This generates:

    01  20200111    4.36    5.37    8.86    3.81    3.55    6.74    0.32    1.89   -0.55    3.96
    02  20200118    8.12   -0.57    9.80   16.07   -7.70    8.53    1.36   -4.27   -3.30    2.94
    03  20200125    1.26    4.41   -1.45   -2.95    1.11   -2.31    0.53    2.92    1.51    0.20
    01  20210109    5.12    7.90    3.13    0.58   10.12    1.49    1.04    4.40    1.16    3.82
    02  20210116    5.09   12.93    4.05   -1.84   10.22   -1.16   -0.53    3.80    3.55    3.57
    03  20210123    6.82    5.48    3.08   -0.01    4.71    2.17    1.68    0.37    2.52    2.82
    

    NOTES: