amazon-web-services logging amazon-s3 amazon-elb goaccess

Parse AWS Load Balancer log file by site URL instead of vhost (using GoAccess)


I'm looking to parse AWS Load Balancer log files stored in S3, to calculate metrics by the site URL www.example.com instead of the virtual host app/something.com. Is this possible? I'm using GoAccess.

https 2019-11-24T23:55:01.603141Z app/something.com 34.222.222.22:47121 190.61.18.156:80 0.008 0.252 0.000 200 200 191 725 "GET https://www.example.com:443/something.php HTTP/1.1" "Wget/1.18 (linux-gnu)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-1:6474865788:targetgroup/mytargetgroup/be12345678 "Root=1-5ddb4567-149b7e874546754ed496" "www.example.com" "arn:aws:acm:eu-west-1:6474865788:certificate/pwdsw3455-4028-5cb7-854c-gdtr555" 0 2019-11-24T23:55:01.342000Z "waf,forward" "-" "-" "190.61.18.156:80" "200"


Solution

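  • This will work for the line you posted: it prepends field 22 (the quoted domain name, "www.example.com" in your sample) to field 3 (app/something.com) before passing the log to GoAccess. The field number depends on the count of single spaces, so it only holds while the user agent contains exactly one space, as in your sample. You may want to use a different delimiter if any of your fields can contain additional spaces.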

    awk -F'[ ]' '$3=$22$3' access.log | goaccess - -a
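
If the user agent can contain a varying number of spaces, one possible alternative is to split on double quotes instead, since the quoted fields then stay at fixed positions. The sketch below is only an illustration under some assumptions: the function name `rewrite_elb_host` is hypothetical, the domain name is assumed to be the fourth quoted field (as in the sample line), and no quoted field is assumed to contain a literal double quote. Unlike the one-liner above, it replaces field 3 with the domain name rather than prepending to it.

```shell
# Hypothetical helper: reads load balancer log lines on stdin,
# writes the rewritten lines to stdout.
rewrite_elb_host() {
    awk -F'"' -v OFS='"' '
    {
        # $1 is everything before the first quoted field (the request).
        # Split it on spaces to reach the load balancer ID in position 3.
        n = split($1, head, " ")
        # $8 is assumed to be the content of the quoted domain name field.
        head[3] = $8
        # Rebuild the unquoted prefix with single spaces.
        line = head[1]
        for (i = 2; i <= n; i++) line = line " " head[i]
        # Restore the space that precedes the opening quote of the request.
        $1 = line " "
        print
    }'
}
```

Usage would mirror the original pipeline: `rewrite_elb_host < access.log | goaccess - -a`.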