awk timestamp strftime

Create timestamp with fractional seconds


awk can generate a timestamp with the strftime function, e.g.

$ awk 'BEGIN {print strftime("%Y/%m/%d %H:%M:%S")}'
2019/03/26 08:50:42

But I need a timestamp with fractional seconds, ideally down to nanoseconds. GNU date can do this with the %N format specifier:

$ date "+%Y/%m/%d %H:%M:%S.%N"
2019/03/26 08:52:32.753019800

However, invoking date from within awk is slow compared to calling strftime. I am processing many large files with awk and need to generate many timestamps along the way, so performance matters. Is there a way for awk to efficiently generate a timestamp that includes fractional seconds (ideally nanoseconds, but milliseconds would be acceptable)?

Here is an example of what I am trying to do:

awk -v logFile="$logFile" -v outputFile="$outputFile" '
BEGIN {
    # FILENAME is still empty inside BEGIN, so report the first file argument instead
    print "[" strftime("%Y%m%d %H%M%S") "] Starting to process " ARGV[1] "." >> logFile
}
{
    data[$1] += $2
}
END {
    print "[" strftime("%Y%m%d %H%M%S") "] Processed " NR " records." >> logFile
    for (id in data) {
        print id ": " data[id] >> outputFile
    }
}
' oneOfManyLargeFiles

Solution

  • If you really need subsecond timing, then any call to an external command such as date, or any read of an external system file such as /proc/uptime or /proc/rct, defeats the purpose of subsecond accuracy. Both approaches require too many resources to retrieve the requested information (i.e. the time).

    Since the OP already uses GNU awk, a dynamic extension is an option. Dynamic extensions add new functionality to awk through functions written in C or C++ that gawk loads at run time. How to write such functions is documented in detail in the GNU awk manual.

    Luckily, GNU awk 4.2.1 comes with a set of default dynamic libraries which can be loaded at will. One of these libraries is a time library with two simple functions:

    the_time = gettimeofday()
        Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating-point value. If the time is unavailable on this platform, return -1 and set ERRNO. The returned time should have sub-second precision, but the actual precision may vary based on the platform. If the standard C gettimeofday() system call is available on this platform, then it simply returns the value. Otherwise, if on MS-Windows, it tries to use GetSystemTimeAsFileTime().

    result = sleep(seconds)
        Attempt to sleep for seconds seconds. If seconds is negative, or the attempt to sleep fails, return -1 and set ERRNO. Otherwise, return zero after sleeping for the indicated amount of time. Note that seconds may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use nanosleep() or select() to implement the delay.

    source: GNU awk manual

    It is now possible to call this function in a rather straightforward way:

    awk '@load "time"; BEGIN{printf "%.6f\n", gettimeofday()}'
    1553637193.575861
    

    To demonstrate that this method is faster than the more classic implementations, I timed all three using gettimeofday():

    awk '@load "time"
         # Return the first field of /proc/uptime (seconds since boot).
         function get_uptime(   a, line) {
            if ((getline line < "/proc/uptime") > 0)
               split(line, a, " ")
            close("/proc/uptime")
            return a[1]
         }
         function curtime(    cmd, line, time) {
            cmd = "date \047+%Y/%m/%d %H:%M:%S.%N\047"
            if ( (cmd | getline line) > 0 ) {
               time = line
            }
            else {
               print "Error: " cmd " failed" | "cat>&2"
            }
            close(cmd)
            return time
          }
          BEGIN{
            t1=gettimeofday(); curtime(); t2=gettimeofday();
            print "curtime()",t2-t1
            t1=gettimeofday(); get_uptime(); t2=gettimeofday();
            print "get_uptime()",t2-t1
            t1=gettimeofday(); gettimeofday(); t2=gettimeofday();
            print "gettimeofday()",t2-t1
          }'
    

    which outputs:

    curtime() 0.00519109
    get_uptime() 7.98702e-05
    gettimeofday() 9.53674e-07
    

    As expected, curtime() is the slowest because it has to launch an external binary. More surprising is how quickly awk reads an external /proc/ file: in this run get_uptime() was about 65 times faster than curtime(), although still roughly 80 times slower than gettimeofday().
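
To get the formatted timestamp the question asks for, the value returned by gettimeofday() can be split into whole seconds, which go to strftime(), and a fractional part, which is appended by hand. The following is a minimal sketch under a few assumptions: gawk is installed together with its bundled time extension, and the names frac_timestamp and fmt_ts are hypothetical helpers, not part of gawk. It stops at microseconds, because a double-precision value of the current epoch time cannot resolve much finer than that, so nanosecond digits would not be meaningful.

```shell
# Hypothetical helper (not part of gawk): print an epoch time with microseconds.
# Usage: frac_timestamp [epoch_seconds [utc_flag]]
# With no arguments it formats the current local time.
frac_timestamp() {
    gawk -v t="${1:-}" -v utc="${2:-0}" '
    # Assumes the time extension bundled with gawk is installed.
    @load "time"

    # Format epoch time t (may be fractional) as YYYY/mm/dd HH:MM:SS.uuuuuu.
    function fmt_ts(t, utc,    secs, usec) {
        secs = int(t)
        usec = int((t - secs) * 1000000 + 0.5)            # round to microseconds
        if (usec >= 1000000) { secs++; usec -= 1000000 }  # carry after rounding up
        return strftime("%Y/%m/%d %H:%M:%S", secs, utc + 0) sprintf(".%06d", usec)
    }

    BEGIN {
        if (t == "") t = gettimeofday()   # no time given: use the current time
        print fmt_ts(t, utc)
    }'
}
```

Inside a larger awk program only the @load line and fmt_ts() are needed, so a log line from the question could be written as print "[" fmt_ts(gettimeofday()) "] Processed " NR " records." >> logFile without starting any external process.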