bashawkgawkmawk

AWK matching values in a column and performing calculation


I'm new at AWK and I'm trying to figure out an answer for my problem. I have a flat file with the following values:

403 | SanMateo   | f | 2015-04-09 18:50:24.38
403 | SanMateo   | t | 2015-04-09 18:45:24.36
403 | SanMateo   | t | 2015-04-09 18:40:24.383
403 | SanMateo   | f | 2015-04-09 18:35:24.357
403 | SanMateo   | t | 2015-04-09 18:30:24.355
404 | RedwoodCity| f | 2015-04-09 18:35:50.308
404 | RedwoodCity| t | 2015-04-09 18:30:50.242
404 | RedwoodCity| f | 2015-04-09 18:25:50.245
404 | RedwoodCity| t | 2015-04-09 18:20:50.242
404 | RedwoodCity| f | 2015-04-09 18:15:50.242

I want to use awk to compare $1 of the current line to $1 of the next line, and $3 ~/f/. if the statement is true then subtract $4 of the next line from $4 of the current line and write the difference in a new column of the current line and if false then do nothing. what I have so far is this:

awk 'BEGIN {FS="|";} {if (NR $1 ~ NR++ $1 && $3 ~ /f/) subtract = NR $4 - NR++ $4; {print subtract}}' allHealthRecords_Sorted

and obviously that's not working. Can someone please help?


Solution

  • save this as time_diff.awk

    BEGIN {FS = "[[:blank:]]*\\|[[:blank:]]*"}
    
    # convert "YYYY-mm-dd HH:MM:SS.fff" to a number
    function to_time(timestamp,       fraction) {
        fraction = timestamp
        sub(/\..*$/, "", timestamp)
        gsub(/[-:]/, " ", timestamp)
        sub(/.*\./, "0.", fraction)
        return mktime(timestamp) + fraction
    }
    
    # gawk has no builtin abs() function
    function abs(val) { 
        return( val < 0 ? -1*val : val) 
    }
    
    # add the time diff if the condition is met
    NR > 1 {
        diff = 0
        if ($1+0 == key && flag == "f") 
            diff = abs( to_time($4) - to_time(time) )
        print line (diff > 0 ? " | " diff : "")
    } 
    
    {
        # remember the previous line's values
        key = $1+0; flag = $3; time = $4; line = $0
    }
    
    END {print}
    

    Then

    $ gawk -f time_diff.awk file
            403 | SanMateo| f                | 2015-04-09 18:50:24.38 | 300.02
            403 | SanMateo| t                | 2015-04-09 18:45:24.36
            403 | SanMateo| t                | 2015-04-09 18:40:24.383
            403 | SanMateo| f                | 2015-04-09 18:35:24.357 | 300.002
            403 | SanMateo| t                | 2015-04-09 18:30:24.355
            404 | RedwoodCity| f                | 2015-04-09 18:35:50.308 | 300.066
            404 | RedwoodCity| t                | 2015-04-09 18:30:50.242
            404 | RedwoodCity| f                | 2015-04-09 18:25:50.245 | 300.003
            404 | RedwoodCity| t                | 2015-04-09 18:20:50.242
            404 | RedwoodCity| f                | 2015-04-09 18:15:50.242