bash  date  awk  date-formatting

Formatting date inside bash awk command


I am trying to convert the dates in some log files to a particular format so that I can compare them afterwards. Here is my command:

fgrep "<expression>" <logFile> | sort | awk -F "[][]" 'messageDate=$(date -d "$2" "+%Y.%j.%H.%M.%S") { print messageDate }'

This prints the full lines of the log file instead of the formatted dates.

If instead I run this:

fgrep "<expression>" <logFiles> | sort | awk -F "[][]" 'messageDate=$(date -d "$2" "+%Y.%j.%H.%M.%S") { print $2 }'

I get the dates but not in the format I want.

2014-09-04T08:22:16Z
2017-10-08T16:05:06Z
2022-11-30T14:50:16Z

The log files have messages like this:

[2022-11-30T14:50:16Z] <Info/Warning/Error>: <log message>

Does anyone understand why awk splits the lines correctly, yet messageDate=$(date -d "$2" "+%Y.%j.%H.%M.%S") somehow ends up holding the full log line again?


Solution

  • Regarding "bash awk command": bash and awk are two completely different tools. You can call awk from bash, or bash from awk, just as you can call a C program or a perl script from bash and vice versa, but each tool has its own separate language, scope, etc.

    When you wrote:

    awk -F "[][]" '
        messageDate=$(date -d "$2" "+%Y.%j.%H.%M.%S") {
            print messageDate
        }
    '
    

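    you're trying to use bash syntax inside an awk script, which is no more likely to work than using bash syntax inside a C program. Awk parses that line as ordinary awk code: date and d are unset variables, so date -d evaluates to 0, the two string literals are concatenated onto it, and $( ... ) is a field reference whose index converts to 0. messageDate is therefore assigned $0, the whole line, and because that assignment sits in the pattern position and yields a non-empty string, the action runs and prints the whole line.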
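
    A quick way to check how awk itself reads that expression is to print each piece it evaluates. This is only a sketch: the helper name explain_index and the sample log line are made up for illustration.

    ```shell
    # explain_index is a hypothetical helper name used only for this sketch.
    explain_index() {
        awk -F '[][]' '
            {
                # date and d are unset awk variables, so (date - d) is 0;
                # the two string literals are then concatenated onto that 0.
                idx = date -d "$2" "+%Y.%j.%H.%M.%S"
                print "index expression: " idx
                print "numeric value:    " (idx + 0)
                # Field index 0 selects the entire record.
                print "selected field:   " $(idx)
            }
        '
    }

    # The log line below is made up for illustration.
    printf '%s\n' '[2022-11-30T14:50:16Z] Info: sample message' | explain_index
    ```

    This should report the index expression as 0$2+%Y.%j.%H.%M.%S, its numeric value as 0, and the full sample line as the selected field.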

    To call a Unix utility such as date from awk (via an intermediate shell such as bash), store its output in an awk variable, and print that variable only if the call succeeded, as you appear to be attempting, you could do this with any awk:

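    awk -F "[][]" '
        {
            messageDate = ""
            # \047 is the octal escape for a single quote, so the shell runs:
            #   date -d "<field 2>" "+%Y.%j.%H.%M.%S"   (with single quotes)
            cmd = "date -d \047" $2 "\047 \047+%Y.%j.%H.%M.%S\047"
            if ( (cmd | getline line) > 0 ) {
                messageDate = line
            }
            close(cmd)    # release the pipe so a repeated timestamp spawns date again
        }
        messageDate != "" { print messageDate }
    '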
    

    See http://awk.freeshell.org/AllAboutGetline (or, if that's down, https://web.archive.org/web/20221109201352/http://awk.freeshell.org/AllAboutGetline) for more information on how I'm using getline there.
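
    As a rough end-to-end check of the getline approach, the sketch below wraps the call in an awk function and feeds it the three timestamps from the question. The names fmt_dates and fmt_date, the message text after each timestamp, and the TZ=UTC setting are assumptions made here for a reproducible run; without TZ=UTC, date -d converts the Z (UTC) timestamps to local time. It assumes GNU date, since -d is not portable.

    ```shell
    # fmt_dates is a hypothetical wrapper name used only for this sketch.
    fmt_dates() {
        # TZ=UTC is an assumption for reproducible output; drop it to get local time.
        TZ=UTC awk -F '[][]' '
            # Run GNU date on one timestamp; return "" if date produced no output.
            function fmt_date(ts,    cmd, line, out) {
                cmd = "date -d \047" ts "\047 \047+%Y.%j.%H.%M.%S\047"
                out = ""
                if ( (cmd | getline line) > 0 ) {
                    out = line
                }
                close(cmd)   # otherwise a repeated timestamp would hit EOF on the old pipe
                return out
            }
            {
                messageDate = fmt_date($2)
                if ( messageDate != "" ) print messageDate
            }
        '
    }

    printf '%s\n' \
        '[2014-09-04T08:22:16Z] Info: first' \
        '[2017-10-08T16:05:06Z] Warning: second' \
        '[2022-11-30T14:50:16Z] Error: third' |
    fmt_dates
    ```

    With TZ=UTC this should print 2014.247.08.22.16, 2017.281.16.05.06 and 2022.334.14.50.16, one per line.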

    If you're using GNU awk, though, it has its own builtin time functions, so you could do:

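    awk -F "[][]" '
        {
            messageDate = ""
            # Turn "2022-11-30T14:50:16Z" into "2022 11 30 14 50 16 " for mktime;
            # the second argument (1) tells mktime to treat that time as UTC.
            secs = mktime( gensub(/[-T:Z]/, " ", "g", $2), 1 )
            if ( secs >= 0 ) {
                # strftime formats in the local time zone by default.
                messageDate = strftime("%Y.%j.%H.%M.%S", secs)
            }
        }
        messageDate != "" { print messageDate }
    ' file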
    

    which would be orders of magnitude faster than having awk spawn a shell to call date once per input line.
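
    To see what each step of the GNU awk version produces, the sketch below prints the intermediate values for a single timestamp. The helper name show_gawk_steps is made up for illustration, and it requires gawk. As an assumption for reproducibility it sets TZ=UTC and uses the one-argument form of mktime rather than the UTC flag shown above, so that mktime and strftime both work in UTC.

    ```shell
    # show_gawk_steps is a hypothetical helper name used only in this sketch.
    show_gawk_steps() {
        # TZ=UTC is an assumption so that mktime and strftime agree on the zone.
        TZ=UTC gawk -v ts="$1" 'BEGIN {
            spec = gensub(/[-T:Z]/, " ", "g", ts)   # "YYYY MM DD HH MM SS " as mktime expects
            secs = mktime(spec)                     # seconds since the epoch, or -1 on failure
            print "spec: " spec
            print "secs: " secs
            if ( secs >= 0 ) print "out:  " strftime("%Y.%j.%H.%M.%S", secs)
        }'
    }

    show_gawk_steps '2022-11-30T14:50:16Z'
    ```

    The last line printed should be "out:  2022.334.14.50.16". Everything happens inside the one gawk process, which is why this route avoids the per-line cost of spawning a shell and date.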