bash fail2ban

Bash: splitting a list of strings, each containing space-separated words, into separate variables for each word


I'm trying to parse the Apache error log to grep the lines that correspond to the "offending" IPs found in the fail2ban log.

I'm using a script in bash.

First I extract the offending IPs:

offenders=$(grep -F "[apache-errors] Found" /var/log/fail2ban.log | awk '{print $8}' | sort | uniq)

Then, for each IP, I get the entries from fail2ban.log; there may be several, because the IP may have made requests at different times:

for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
 
    declare _count_entries=$(echo "${entries[@]}" | wc -l)
    echo "[INFO] Found $_count_entries error entries for IP $ip"

    for entry in "${entries[@]}"; do
        echo "$entry"
    done
done

This is what I get so far (IPs have been anonymized):

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55

Now, for each line, I want to extract the IP, date and time portions. I tried the following, but it does not work: it prints the (ip, date, time) values only for the first entry:

for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
    
    for entry in "${entries[@]}"; do

        echo "$entry"

        _ip=($(echo "$entry" | cut -d ' ' -f1))
        _date=($(echo "$entry" | cut -d ' ' -f2))
        _time=($(echo "$entry" | cut -d ' ' -f3))
        echo "ip=$_ip , date=$_date , time=$_time"

    done
done

Output: for each IP, only the (ip, date, time) portions of the first entry are echoed:

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49

The desired output would be:

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
ip=10.30.0.186 , date=2022-01-02 , time=07:38:55

So how can I do that in bash?

The final goal is to use the ip, date and time portions to build a regex like this, because I want to grep the lines from the error logs that correspond exactly to the findings in the fail2ban log:

grep -P "^(\[$_date $_time)(.+\[client )($_ip).+$" /var/log/apache2/error.log

Solution
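
The loop in the question runs only once per IP because entries=$(...) stores one multi-line string, not an array, so "${entries[@]}" expands to a single word. Inside that single iteration, _ip=($(...)) does build an array of all the first fields, but $_ip is shorthand for ${_ip[0]}, so only the first entry is printed. A minimal sketch of a fix that keeps the original structure is to read the lines with while read -r, which also splits each line into variables. The function name print_entries and the log-path argument are assumptions made here so the snippet can be tried on a sample file:

```shell
#!/bin/bash
# Sketch: iterate over multi-line command output one line at a time.
# print_entries is a hypothetical helper; the log path is an argument.
print_entries() {
    local log="$1" offenders ip entries count _ip _date _time
    offenders=$(grep -F "[apache-errors] Found" "$log" | awk '{print $8}' | sort -u)
    for ip in $offenders; do
        # Compare field 8 exactly, so 10.10.0.2 does not also match 10.10.0.29
        entries=$(grep -F "[apache-errors] Found" "$log" |
            awk -v ip="$ip" '$8 == ip {print $8" "$10" "$11}' | sort -u)
        # grep -c '' counts the lines of the multi-line string
        count=$(printf '%s\n' "$entries" | grep -c '')
        echo "[INFO] Found $count error entries for IP $ip"
        echo "$entries"
        # read splits each line on whitespace into three plain variables
        printf '%s\n' "$entries" | while read -r _ip _date _time; do
            echo "ip=$_ip , date=$_date , time=$_time"
        done
    done
}
```

Called as print_entries /var/log/fail2ban.log, this should produce the listing given as the desired output in the question.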

  • You could go with something like this:

    #!/bin/bash
      
    print_errors() {
      local ip=$1
      [ -n "$ip" ] || return
      shift
      echo "[INFO] Found $# error entries for IP $ip"
      printf '%s\n' "$@"
    }
    
    prev_ip=
    errors=()
    while read -r ip date time
    do
        if [ "$prev_ip" != "$ip" ]
        then
            print_errors "$prev_ip" "${errors[@]}"
            prev_ip=$ip
            errors=()
        fi
        errors+=("ip=$ip , date=$date , time=$time")
    done < <(
        grep -F "[apache-errors] Found" /var/log/fail2ban.log |
        awk '{print $8" "$10" "$11}' |
        sort
    )
    
    print_errors "$prev_ip" "${errors[@]}"
    

    But bash is not really meant for this kind of text processing; it's better to write the same logic in awk (the sorting is done outside of awk here):

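    # Sort on field 8 (the IP) only; ties fall back to the whole line,
    # which starts with the timestamp, so entries stay chronological.
    grep -F "[apache-errors] Found" /var/log/fail2ban.log | sort -k 8,8 |
    awk '
        function print_errors(ip, arr, n,    i) {
            if (ip == "") return
            print "[INFO] Found " n " error entries for IP " ip
            # Index loop: "for (i in arr)" has no guaranteed order
            for (i = 1; i <= n; i++) print arr[i]
        }
        BEGIN { ip = "" }
        {
            if ($8 != ip) {
                print_errors(ip, arr, n)
                delete arr
                n = 0
                ip = $8
            }
            arr[++n] = "ip="$8" , date="$10" , time="$11
        }
        END { print_errors(ip, arr, n) }
    '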
    

    Or even better, write the whole thing in a language that has multidimensional associative arrays and text processing facilities:

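    Example with Ruby (it reads from standard input or from the files given as arguments, so feed it only the lines matched by the grep above):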

    #!/usr/bin/env ruby
      
    ARGF.each_line.with_object(Hash.new{|h,k| h[k] = []}) do |line,hash|
      ip,date,time = line.split.values_at(7,9,10)
      hash[ip] << "ip=#{ip} , date=#{date} , time=#{time}"
    end.each do |ip,arr|
      puts "[INFO] Found #{arr.count} error entries for IP #{ip}"
      puts arr.join("\n")
    end
    

    Example output of the three programs above:

    [INFO] Found 1 error entries for IP 10.10.0.29
    ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
    [INFO] Found 2 error entries for IP 10.20.0.242
    ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
    ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
    [INFO] Found 3 error entries for IP 10.30.0.186
    ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
    ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
    ip=10.30.0.186 , date=2022-01-02 , time=07:38:55
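
For the final goal stated in the question, the same read loop can feed grep directly. The following is a rough sketch, not a recipe tested against every Apache setup: find_error_lines is a hypothetical name, both log paths are passed as arguments, and the error log is assumed to have lines that start with [<date> <time> and contain [client <ip>, which is what the regex in the question implies. It uses grep -E rather than grep -P and escapes the dots in the address:

```shell
#!/bin/bash
# Sketch: print the error-log lines that match each fail2ban finding.
# find_error_lines is a hypothetical helper; the log layout is an assumption
# taken from the regex in the question.
find_error_lines() {
    local f2b_log="$1" error_log="$2" ip date time ip_re
    grep -F "[apache-errors] Found" "$f2b_log" |
    awk '{print $8" "$10" "$11}' |
    sort -u |
    while read -r ip date time; do
        # Escape the dots so they match literally instead of "any character"
        ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
        # []:] requires "]" or ":" (a port) right after the address, so
        # 10.20.0.24 is not matched when looking for 10.20.0.242 or vice versa
        grep -E "^\[${date} ${time}.*\[client ${ip_re}[]:]" "$error_log"
    done
}
```

A call would look like find_error_lines /var/log/fail2ban.log /var/log/apache2/error.log. If the error log uses Apache's default timestamp layout instead, the date part of the pattern would need to be adapted.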