
Skip row value(s) with awk


I have the following input file:

 -0.805813  0.874753 -0.776101 -0.749147 -0.636834  0.379035 -0.004061 -0.004061
 -0.426119 -0.024801 -0.041989 -0.783686  0.361837  0.055206  0.368603  0.147965
 -0.632526 -0.100358  0.847947 -0.690233 -0.996141  0.445275  1.086014 -1.097968
  0.411383  0.411383 -0.734988  0.344954  2.577123 -0.372104 -0.923401  0.302907
  0.302907 -1.424862  1.165900 -0.776100 -0.776100 -0.495400  0.182533  0.002356
  0.002356  0.002356

I used awk to calculate the sum of all these values in order (sum = -3.0000):

awk '{ for (i=1; i<=NF; i++) sum += $i } END { printf("%3.4f", sum) }' input.txt

Is it possible to use awk to skip a number of values counting backwards from the end of the file, and to calculate the sum of the remaining values? For instance:

 -0.805813  0.874753 -0.776101 -0.749147 -0.636834  0.379035 -0.004061 -0.004061
 -0.426119 -0.024801 -0.041989 -0.783686  0.361837  0.055206  0.368603  0.147965
 -0.632526 -0.100358  0.847947 -0.690233 -0.996141  0.445275  1.086014 -1.097968
  0.411383  0.411383 -0.734988  0.344954  2.577123 -0.372104 -0.923401  0.302907
  0.302907 -1.424862  **1.165900 -0.776100 -0.776100 -0.495400  0.182533  0.002356
  0.002356  0.002356**

where I want to skip the values between the stars (sum = -2.3079). The number of values to skip may vary.
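As a quick check of the expected result (a sketch; the eight values are copied from the starred span above), summing only the skipped values and subtracting that from the full total reproduces -2.3079:

```shell
#!/bin/sh
# Sum only the eight starred values from the example.
skipped=$(printf '%s\n' 1.165900 -0.776100 -0.776100 -0.495400 \
                        0.182533 0.002356 0.002356 0.002356 |
          awk '{ s += $1 } END { printf "%.6f", s }')
echo "skipped values sum to $skipped"

# Full total (-3.0000) minus the skipped part gives the target sum.
awk -v s="$skipped" 'BEGIN { printf "%.4f\n", -3.0 - s }'
```

The first line printed is "skipped values sum to -0.692099" and the second is -2.3079.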

Thanks!

I have already achieved this by piping sed into awk:

sed '$d' input.txt | awk '{ for (i=1; i<=NF; i++) sum += $i } END { for (i=NF-5; i<=NF; i++) sum -= $i; print sum }'

However, a pure awk one-liner would be preferable.
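For reference, the sed pipeline works by deleting the last line (two values) and then, in END, subtracting the last six fields of the new last line, so it skips exactly eight values and depends on how they are laid out across lines. A minimal pure-awk sketch that makes the count a parameter, assuming the whole input fits in memory, stores every value and sums all but the last n in the END block. The helper name sum_all_but_last is hypothetical, and this store-everything approach is a simple alternative whose memory use grows with the input.

```shell
#!/bin/sh
# Usage: sum_all_but_last N [FILE...]   (hypothetical helper)
# Stores every number, then sums all except the last N in END.
sum_all_but_last() {
    n=$1
    shift
    awk -v n="$n" '
        { for (i = 1; i <= NF; i++) val[cnt++] = $i }      # keep values in input order
        END {
            for (i = 0; i < cnt - n; i++) sum += val[i]     # stop before the final n
            printf "%3.4f\n", sum
        }
    ' "$@"
}
```

With the sample data, sum_all_but_last 8 input.txt prints -2.3079, and sum_all_but_last 0 input.txt prints -3.0000.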


Solution

  • Stripping down @markp-fuso's idea:

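    awk -v RS=' ' '
        NF {                          # skip empty records (runs of spaces)
            ndx = cnt++ % lastN       # buffer slot for this value
            sum += circlist[ndx]      # value stored lastN numbers earlier (0 until the buffer fills)
            circlist[ndx] = +$0       # remember the current value
        }
        END { printf "%3.4f", sum }
    ' lastN=8 input.txt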
    

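    His array initialization and comparisons are not needed because awk guarantees the value of uninitialized variables and array elements: they are the empty string in string context and 0 in numeric context, so during the first lastN records sum += circlist[ndx] simply adds 0.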

    Splitting the input on a single space (RS=' ') instead of on newlines, and then processing only records that contain a field (the default FS splits on any remaining whitespace, including newlines, so the empty records produced by runs of spaces have NF == 0 and are skipped), is more compact than his for loop over the fields. It does, however, require at least one actual space character between each pair of numbers.

    Your example lines begin with a leading space. If they did not, my code would fail silently by discarding the first number on every line after the first: the last number of one line and the first number of the next would end up in the same record, separated only by a newline, so that first number would become $2, while +$0 converts only the leading number, i.e. the value of $1. If your awk supports a regular expression as RS (which a future standard may allow, and many popular versions already support), this can be fixed by using RS='[[:space:]]+'. (Or by using the original for loop to iterate over the fields.)
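The point about uninitialized variables can be checked directly. The sketch below first reads a never-assigned array element, then wraps the same circular-buffer program in a hypothetical shell function skip_last_sum (passing lastN with -v rather than as a command-line assignment) and traces it on five small numbers: with lastN=3 only the first two values are added, because each value is added lastN records after it is stored.

```shell
#!/bin/sh
# Never-assigned element: "" as a string, 0 as a number.
awk 'BEGIN { sum += never_set[3]; print sum + 0, "[" never_set[3] "]" }'
# prints: 0 []

# Hypothetical helper: sum all but the last N space-separated numbers on stdin.
skip_last_sum() {
    awk -v RS=' ' -v lastN="$1" '
        NF {
            ndx = cnt++ % lastN       # slot holding the value seen lastN records ago
            sum += circlist[ndx]      # uninitialized (0) for the first lastN records
            circlist[ndx] = +$0       # overwrite the slot with the current value
        }
        END { print sum + 0 }
    '
}

printf ' 1 2 3 4 5\n' | skip_last_sum 3
# prints: 3  (1 + 2; the last three values 3, 4, 5 are skipped)
```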
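To see the records that RS=' ' actually produces, this sketch prints NF and the numeric value (+$0) of every record; show_records is a hypothetical helper name. The first input has a leading space and a double space, which yield empty records with NF=0 (the reason the program guards its action with NF). The second input has no leading space on its second line, which shows the silent data loss described above.

```shell
#!/bin/sh
# Hypothetical helper: show how RS=' ' splits standard input into records.
show_records() {
    awk -v RS=' ' '{ printf "NF=%d value=%g\n", NF, +$0 }'
}

# Leading space and a double space: empty records appear with NF=0.
printf ' 1.5  -2\n 3\n' | show_records

# No leading space on the second line: "1.5\n-2\n" is a single record
# with NF=2, and +$0 is only 1.5, so -2 would be lost.
printf '1.5\n-2\n' | show_records
```

The first call prints NF=0 value=0, NF=1 value=1.5, NF=0 value=0, NF=1 value=-2, NF=1 value=3 (one per line); the second prints only NF=2 value=1.5.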
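The last fallback mentioned, keeping the default newline RS and iterating over the fields with a for loop, could look like the following sketch. It keeps the same single-pass circular buffer but does not depend on leading spaces or on regex RS support. sum_skip_last is a hypothetical name, and N must be at least 1 because it is used as the modulus.

```shell
#!/bin/sh
# Usage: sum_skip_last N [FILE...]   (hypothetical helper, N >= 1)
# Circular buffer of the last N values; default RS, so one record per line.
sum_skip_last() {
    n=$1
    shift
    awk -v lastN="$n" '
        {
            for (i = 1; i <= NF; i++) {       # visit every number on the line
                ndx = cnt++ % lastN
                sum += circlist[ndx]          # value seen lastN numbers earlier, or 0
                circlist[ndx] = $i
            }
        }
        END { printf "%3.4f\n", sum }
    ' "$@"
}
```

For the sample data, sum_skip_last 8 input.txt should print -2.3079, with or without leading spaces on the lines.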