I'm trying to find any blank user agents and traces of spoofed user agents in my Apache access logs.
Here's a typical line from my access log (with IP and domain redacted):
x.x.x.x - - [10/Nov/2012:16:48:38 -0500] "GET /YLHicons/reverbnation50.png HTTP/1.1" 304 - "http://www.example.com/newaddtwitter.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/534.7 ZemanaAID/FFFF0077"
For blank user agents I'm trying to do this:
awk -F\" '($6 ~ /^-?$/)' /www/logs/www.example.com-access.log | awk '{print $1}' | sort | uniq
To get information about UAs I'm running this, which gives me the number of hits for each unique UA:
awk -F\" '{print $6}' /www/logs/www.example.com-access.log | sort | uniq -c | sort -fr
What can I do differently to make these commands stronger and more thought out, while giving me the best information I can to combat bots and other scums of the Internet?
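I wouldn't use \" as a field separator. Apache's combined log format (CLF plus the referer and user agent) is regular enough that if you split on whitespace, field 12 is the start of your user agent. If $12 is exactly "" (two double quotes, written in awk as $12 == "\"\""), the user agent is blank; Apache logs "-" instead when the header is missing entirely. This assumes the request line and referer contain no stray spaces, which would shift the field numbers.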
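As a minimal sketch of that check, assuming the combined log format from the question, the following lists each client IP that sent an empty or "-" user agent. The function name blank_ua_ips and the sample log lines are made up for illustration; point the function at your real log instead.

```shell
#!/bin/sh
# List the unique client IPs whose user agent is empty ("") or a lone dash ("-").
# Assumes whitespace-separated fields, so $12 is the first word of the quoted
# user agent. With no file arguments the function reads standard input.
blank_ua_ips() {
    awk '$12 == "\"\"" || $12 == "\"-\"" { print $1 }' "$@" | sort -u
}

# Hypothetical sample data, using documentation-only IP addresses.
printf '%s\n' \
    '192.0.2.1 - - [10/Nov/2012:16:48:38 -0500] "GET / HTTP/1.1" 200 512 "-" ""' \
    '192.0.2.2 - - [10/Nov/2012:16:48:39 -0500] "GET / HTTP/1.1" 200 512 "-" "-"' \
    '192.0.2.3 - - [10/Nov/2012:16:48:40 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"' \
    '192.0.2.1 - - [10/Nov/2012:16:48:41 -0500] "GET /a HTTP/1.1" 200 512 "-" ""' |
    blank_ua_ips
```

On the sample data this prints 192.0.2.1 and 192.0.2.2, one per line. Against a real log you would call it as blank_ua_ips /www/logs/www.example.com-access.log.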
Remember that awk can accept standard input. So you can have "live" monitoring of your Apache log with:
$ tail -F /path/to/access.log | /path/to/awkscript
Just remember that when invoked this way, an awk script will never reach its END block, because tail -F keeps the pipe open until you kill it. But you can process lines as Apache adds them to the log.
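To illustrate the difference, here is a rough sketch of the per-user-agent tally from the question, rewritten for whitespace-separated fields. It prints its summary from END, so it only produces output when the input actually ends (a static log file), not when fed from tail -F. The function name ua_counts and the sample lines are hypothetical, and rejoining the fields collapses runs of spaces inside a user agent into single spaces.

```shell
#!/bin/sh
# Tally hits per user agent, most frequent first.
# The counts are printed from END, so awk must see end-of-input to report.
ua_counts() {
    awk '{
        ua = $12
        for (i = 13; i <= NF; i++) ua = ua " " $i   # rejoin the space-split user agent
        hits[ua]++
    }
    END { for (ua in hits) print hits[ua], ua }' "$@" | sort -rn
}

# Hypothetical sample data, using documentation-only IP addresses.
printf '%s\n' \
    '192.0.2.1 - - [10/Nov/2012:16:48:38 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"' \
    '192.0.2.2 - - [10/Nov/2012:16:48:39 -0500] "GET / HTTP/1.1" 200 512 "-" "curl/7.0"' \
    '192.0.2.3 - - [10/Nov/2012:16:48:40 -0500] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"' |
    ua_counts
```

The user agents keep their surrounding quotes in the output, which makes an empty one ("") easy to spot in the list.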
Something like this might help. Add to it as you see fit.
#!/usr/bin/awk -f
BEGIN {
    mailcmd = "Mail -s \"Security report\" webmaster@example.com";
}
# Detect empty user-agent: the field is a bare pair of quotes, or "-"
$12 == "\"\"" || $12 == "\"-\"" {
    report = report "Empty user agent from " $1 "\n";
}
# Detect image hijacking ($11 still carries its opening quote)
$7 ~ /\.(png|jpg)$/ && $11 !~ /^"http:\/\/www\.example\.com\// {
    report = report "Possible hijacked image from " $1 " (" $11 " -> " $7 ")\n";
}
# Detect too many requests per second from one host
thissecond != $4 {
    delete count;       # New second: start counting afresh
    thissecond = $4;
}
{
    # Only the current host's counter changes, so only it needs checking
    if (++count[$1] > 100) {
        report = report "Too many requests from " $1 "\n";
        delete count[$1];   # Reset the counter to avoid too many reports
    }
}
# Send report, if there is one
length(report) {
    print report | mailcmd;     # Pipe output through a command.
    close(mailcmd);             # Closing the pipe sends the mail.
    report = "";                # Blank the report, ready for next.
}
Note that counting requests within a particular second is only marginally helpful; if you have a lot of traffic from China, or university/corporate networks behind firewalls, then many requests might appear to come from a single IP address. And the Mail command isn't a great way to handle notifications; I include it here only for demonstration purposes. YMMV, salt to taste.