I just started learning Python and would like to read an Apache log file and put parts of each line into different lists.
line from the file
172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"
according to Apache website the format is
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\
I'm able to open the file and just read it as it is but I don't know how to make it read in that format so I can put each part in a list.
This is a job for regular expressions.
For example:
line = '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"'
regex = '^([(\d\.)]+) [^ ]* [^ ]* \[([^ ]* [^ ]*)\] "([^"]*)" (\d+) [^ ]* "([^"]*)" "([^"]*)"'
import re
print re.match(regex, line).groups()
The output would be a tuple with 6 pieces of information from the line (specifically, the groups within parentheses in that pattern):
('172.16.0.3', '25/Sep/2002:14:04:19 +0200', 'GET / HTTP/1.1', '401', '', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827')