I need to match ›page not found‹ log entries like this one:
185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
This failregex basically works
^<HOST> -\s*- \[.*\] "GET .*" 404 \d+ "-" ".*"$
and finds 8900 out of 30k entries. I'm testing with
fail2ban-regex /var/log/apache2/scienceblog.at.access.log '^<HOST> -\s*- \[.*\] "GET .*" 404 \d+ "-" ".*"$'
And so does
^<HOST> -\s*- \[.*.*\] "GET .*" 404 \d+ "-" ".*"$
But when I try to be more specific about what is between the square brackets, as in any of
^<HOST> -\s*- \[.*\d.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.*\s.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.* .*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[\d.*\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.*0200\] "GET .*" 404 \d+ "-" ".*"$
^<HOST> -\s*- \[.* .*\] "GET .*" 404 \d+ "-" ".*"$
or anything else (let alone a regex that evaluates the whole date string), the filter doesn't find a single log entry, and I can't figure out why. I've already read what I could find on fail2ban-regex here and elsewhere, but to no avail.
The failregex is matched against the log line with the date removed, so for your example
185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
fail2ban has extracted the date on its own
13/May/2022:10:03:58 +0200
and removed it from the log entry, and so is actually matching your regex against
185.220.100.252 - - [] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
So the regexes that worked for you work because
\[.*\]
and \[.*.*\]
both match []
but the other ones only match if there's actually something between the brackets.
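To see this outside of fail2ban, here is a minimal Python sketch that runs several of the bracket variants against the date-stripped line. It assumes the timestamp is cut out exactly as described above, leaving `[]`, and it replaces fail2ban's `<HOST>` placeholder with a simplified IPv4 group (`HOST` is a stand-in for illustration, not fail2ban's real expansion).

```python
import re

# Simplified stand-in for fail2ban's <HOST> placeholder (assumption: IPv4 only).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# The log line as the failregex sees it per the explanation above:
# the timestamp has been removed, leaving empty brackets.
stripped = (
    '185.220.100.252 - - [] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"'
)

# Bracket parts taken from the regexes in the question, plus the literal \[\].
bracket_variants = [
    r"\[.*\]",
    r"\[.*.*\]",
    r"\[.*\d.*\]",
    r"\[.*\s.*\]",
    r"\[\d.*\]",
    r"\[.*0200\]",
    r"\[\]",
]


def matches(bracket):
    """Build the full pattern around one bracket variant and test it."""
    pattern = "^" + HOST + r" -\s*- " + bracket + r' "GET .*" 404 \d+ "-" ".*"$'
    return re.search(pattern, stripped) is not None


results = {b: matches(b) for b in bracket_variants}
for bracket, ok in results.items():
    print(f"{bracket:12} {'match' if ok else 'no match'}")
```

Only the variants that can match an empty pair of brackets succeed; every variant that requires at least one character between `[` and `]` fails, which mirrors what fail2ban-regex reports.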
IMHO this is not at all intuitive, since the output for "missed lines" still includes the date:
Lines: 1 lines, 0 ignored, 0 matched, 1 missed
[processed in 0.01 sec]
|- Missed line(s):
| 185.220.100.252 - - [13/May/2022:10:03:58 +0200] "GET /EXPLOIT.php HTTP/1.1" 404 14780 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
But you can verify this is the case since this will give a successful match:
'^<HOST> -\s*- \[\] "GET .*" 404 \d+ "-" ".*"$'