bash text logfiles

Split access.log file by dates using command line tools


I have an Apache access.log file that is around 35 GB in size. Grepping through it is no longer an option unless I am prepared to wait a long time.

I want to split it into many smaller files, using the date as the splitting criterion.

The date is in the format [15/Oct/2011:12:02:02 +0000]. Any idea how I could do this using only bash scripting, standard text manipulation programs (grep, awk, sed, and the like), piping, and redirection?

The input file is named access.log. I'd like the output files to be named like access.apache.15_Oct_2011.log (that would do the trick, although it does not sort nicely).


Solution

  • One way using awk:

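    awk 'BEGIN {
        # Map month abbreviations to two-digit numbers: m["Jan"] = "01", ...
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
        for (a = 1; a <= 12; a++)
            m[months[a]] = sprintf("%02d", a)
    }
    {
        # $4 is the timestamp field, e.g. [15/Oct/2011:12:02:02
        split($4, array, "[:/]")
        year = array[3]
        month = m[array[2]]

        # Parenthesize the target: unparenthesized concatenation after ">"
        # is not portable across awk implementations.
        print > (FILENAME "-" year "_" month ".txt")
    }' incendiary.ws-2010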
    

    This splits the log by month (not by day) and produces files like:

    incendiary.ws-2010-2010_04.txt
    incendiary.ws-2010-2010_05.txt
    incendiary.ws-2010-2010_06.txt
    incendiary.ws-2010-2010_07.txt
    

    Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.

    Original inspiration: "How to split existing apache logfile by month?"
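
The awk answer splits by month, while the question asks for one file per day named like access.apache.15_Oct_2011.log. Below is a minimal sketch of how the same idea might be adapted. The function name split_log_by_day is hypothetical, and the sketch assumes the timestamp is the fourth whitespace-separated field, as in the date format quoted in the question. It also closes each output file when the date changes, on the assumption that a 35 GB log may span more days than awk can keep open files for.

```shell
# Hypothetical helper (not part of the original answer): split an Apache
# access log into one file per day, named access.apache.DD_Mon_YYYY.log.
split_log_by_day() {
    awk '
    {
        # $4 looks like "[15/Oct/2011:12:02:02"; drop the "[" and split
        # the rest on "/" and ":" to get day, month and year.
        split(substr($4, 2), t, "[/:]")
        out = "access.apache." t[1] "_" t[2] "_" t[3] ".log"

        # Keep only one output file open at a time so that awk does not
        # run out of file descriptors on logs spanning many days.
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }

        # Append, so a date that shows up again later is not truncated.
        print >> out
    }' "$1"
}
```

It would be called as `split_log_by_day access.log`. Because the sketch appends with `>>`, output files left over from an earlier run should be removed before running it again. Building the name from the year, the numeric month from the lookup table in the answer above, and the day would give names that sort chronologically.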