I'm looking to parse AWS Load Balancer log files stored in S3, to calculate metrics by the site URL www.example.com instead of the virtual host app/something.com. Is this possible? I'm using GoAccess.
https 2019-11-24T23:55:01.603141Z app/something.com 34.222.222.22:47121 190.61.18.156:80 0.008 0.252 0.000 200 200 191 725 "GET https://www.example.com:443/something.php HTTP/1.1" "Wget/1.18 (linux-gnu)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-1:6474865788:targetgroup/mytargetgroup/be12345678 "Root=1-5ddb4567-149b7e874546754ed496" "www.example.com" "arn:aws:acm:eu-west-1:6474865788:certificate/pwdsw3455-4028-5cb7-854c-gdtr555" 0 2019-11-24T23:55:01.342000Z "waf,forward" "-" "-" "190.61.18.156:80" "200"
This will work for the line you posted, though you may want to use different delimiter if any of your fields can contain additional spaces.
awk -F'[ ]' '$3=$22$3' access.log | goaccess - -a