apache logging goaccess

Trivial goaccess log parsing not working


I'm trying to set up goaccess to analyse some highly customised Apache log output. I didn't fancy my chances of writing a .goaccessrc file straight off, so I decided to simplify the log (in a text editor) and start slowly. However, I can't even get this trivial example to work. I've also tried some examples from SO that are marked as 'Answered', but I still get the rather terse 'Nothing valid to process' message.

Here's a line from my simplified log file:

2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146

and here's my .goaccessrc:

date_format %Y-%m-%d %H:%M:%S
log_format %d "%r" %s %h

I'm sure the .goaccessrc file is in the right place and being read, because if I remove it, I get the Log Format Configuration window when running goaccess. I'm sure it's something trivial, but I just can't see it. Here's the full output of my recent terminal session:

[root@dev ~] # cat .goaccessrc
date_format %Y-%m-%d %H:%M:%S
log_format %d "%r" %s %h
[root@dev ~] # cat /var/log/apache2/simple.log
2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146
[root@dev ~] # goaccess -f /var/log/apache2/simple.log

GoAccess - version 0.7.1 - Apr 18 2014 21:28:20

An error has occurred
Error occured at: goaccess.c - render_screens - 456
Message: Nothing valid to process.

Solution

  • OK, see here for the full answer. It basically boils down to this: all parsing seems to be driven by log_format, and the token separator is the space character. So in the example above, the first %d placeholder in log_format matches up to the end of 2014-05-14 and then stops, even though date_format also describes a time. The next token ("%r") then fails because it finds the start of the time portion (06:26:18) instead of the opening " character.

    The solution to the above is:

    date_format %Y-%m-%d
    log_format %d %^ "%r" %s %h
    

    This matches the date only (not the time), uses %^ to skip the time token, then matches the quoted request line, and finally the status code and host address.

    Note: it seems that, at least in this version (0.7.1), the time portion can't be matched unless the date and time form a single token with no whitespace between them.
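
The failure described above can be reproduced outside goaccess with a rough Python model. This is only a sketch of the behaviour as observed, not goaccess's actual parser: the helper name parse_first_token is made up for illustration, and the assumption is that only the first space-delimited token of the line is offered to the date format.

```python
from datetime import datetime


def parse_first_token(line, date_format):
    """Hypothetical helper: try to parse only the first space-delimited
    token of a log line with the given date format.

    Assumption: this mimics the observed behaviour that %d stops at the
    first space. It is not how goaccess is implemented internally.
    """
    token = line.split(" ", 1)[0]  # '2014-05-14' for the sample line
    try:
        return datetime.strptime(token, date_format)
    except ValueError:
        # The token does not satisfy the whole format string.
        return None


line = '2014-05-14 06:26:18 "GET / HTTP/1.1" 200 37.157.246.146'

# Original date_format: it expects a time, but the token holds only a date.
print(parse_first_token(line, "%Y-%m-%d %H:%M:%S"))  # None

# Corrected date_format: the date-only token parses cleanly.
print(parse_first_token(line, "%Y-%m-%d"))  # 2014-05-14 00:00:00
```

Under that assumption, the original date_format can never succeed against this log, because the token handed to it never contains the time. That is consistent with the fix of shortening date_format and skipping the time with %^.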