awk can generate a timestamp with the strftime function, e.g.
$ awk 'BEGIN {print strftime("%Y/%m/%d %H:%M:%S")}'
2019/03/26 08:50:42
But I need a timestamp with fractional seconds, ideally down to nanoseconds. GNU date can do this with the %N format specifier:
$ date "+%Y/%m/%d %H:%M:%S.%N"
2019/03/26 08:52:32.753019800
But invoking date from within awk is relatively inefficient compared to calling strftime, and I need high performance: I am processing many large files with awk and need to generate many timestamps while doing so. Is there a way for awk to efficiently generate a timestamp that includes fractional seconds (ideally nanoseconds, but milliseconds would be acceptable)?
Here is an example of what I am trying to do:
awk -v logFile="$logFile" -v outputFile="$outputFile" '
BEGIN {
    # FILENAME is not yet set in BEGIN, so use the first file operand instead
    print "[" strftime("%Y%m%d %H%M%S") "] Starting to process " ARGV[1] "." >> logFile
}
{
    data[$1] += $2
}
END {
    print "[" strftime("%Y%m%d %H%M%S") "] Processed " NR " records." >> logFile
    for (id in data) {
        print id ": " data[id] >> outputFile
    }
}
' oneOfManyLargeFiles
If you really need subsecond timing, then any call to an external command such as date, or any read of an external system file such as /proc/uptime or /proc/rtc, defeats the purpose of the subsecond accuracy. Both approaches require too many resources to retrieve the requested information (i.e. the time).
Since the OP already makes use of GNU awk, a dynamic extension is an option. Dynamic extensions add new functionality to awk through functions written in C or C++ that gawk loads dynamically. How to write these functions is documented extensively in the GNU awk manual.
Luckily, GNU awk 4.2.1 comes with a set of default dynamic libraries that can be loaded at will. One of these is a time library with two simple functions:
the_time = gettimeofday()
Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating-point value. If the time is unavailable on this platform, return -1 and set ERRNO. The returned time should have sub-second precision, but the actual precision may vary based on the platform. If the standard C gettimeofday() system call is available on this platform, then it simply returns the value. Otherwise, if on MS-Windows, it tries to use GetSystemTimeAsFileTime().
result = sleep(seconds)
Attempt to sleep for seconds seconds. If seconds is negative, or the attempt to sleep fails, return -1 and set ERRNO. Otherwise, return zero after sleeping for the indicated amount of time. Note that seconds may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use nanosleep() or select() to implement the delay.
source: GNU awk manual
It is now possible to call this function in a rather straightforward way:
awk '@load "time"; BEGIN{printf "%.6f\n", gettimeofday()}'
1553637193.575861
To demonstrate that this method is faster than the more classic implementations, I timed all three implementations using gettimeofday():
awk '@load "time"
# Read the uptime in seconds from /proc/uptime.
function get_uptime(    a, line) {
    if ((getline line < "/proc/uptime") > 0)
        split(line, a, " ")
    close("/proc/uptime")
    return a[1]
}
# Obtain a timestamp by invoking the external date command.
function curtime(    cmd, line, time) {
    cmd = "date \047+%Y/%m/%d %H:%M:%S.%N\047"
    if ((cmd | getline line) > 0) {
        time = line
    }
    else {
        print "Error: " cmd " failed" | "cat>&2"
    }
    close(cmd)
    return time
}
BEGIN {
    # Time a single call of each approach.
    t1 = gettimeofday(); curtime(); t2 = gettimeofday()
    print "curtime()", t2 - t1
    t1 = gettimeofday(); get_uptime(); t2 = gettimeofday()
    print "get_uptime()", t2 - t1
    t1 = gettimeofday(); gettimeofday(); t2 = gettimeofday()
    print "gettimeofday()", t2 - t1
}'
which outputs:
curtime() 0.00519109
get_uptime() 7.98702e-05
gettimeofday() 9.53674e-07
While it is evident that curtime() is the slowest, as it loads an external binary, it is rather startling to see how blazingly fast awk is at processing an extra external /proc/ file, even though that is still roughly 80 times slower than gettimeofday() in this run.