
Convert nginx.log to CSV with Python


I have part of an nginx.log file:

192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 212 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 134 0.006 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 212 0.008 200 6bc328f046dcd1df823aa920397fb346
192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_success%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 201 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 116 0.007 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 201 0.008 200 c10141117983e888db68f2e1ff223575
192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_http_ssl%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 204 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 117 0.007 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 204 0.008 200 60724ca6531bc640649bac50bbc04a7e

I need to convert this nginx.log to a CSV file with Python. How should I do this, or what regex should I use?


Solution

  • You can use the code below as a base for what you want. You basically need to do some custom line splitting to get the elements you want. The useragent is the reason to split on the quote char first, as it is the only element (AFAIK) that can contain an unpredictable number of spaces.

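    I've added a simple helper function that prints the numbering of the elements, and the code shows a couple of different ways to do the splitting. The variable names might need changing, as I'm not 100% sure what exactly you're logging in NGINX. Note that splitline does not return the referer (element 3 of the quote split); add it to the returned list if you need it.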

    def splitline(line: str) -> list:
        # this split is used several times, so do it once here
        # note that the useragent string might contain spaces, so we first need to split on quote chars
        split_quote = line.split('"')
        ip = split_quote[0].split()[0]
        date_time = split_quote[0].split('[')[1].split(']')[0]
        # element 1 is the whole request line (method, path and protocol), not just the method
        request = split_quote[1]
        http1, http2 = split_quote[2].split()
        useragent = split_quote[5]
        (bytesize, resp_time1, prom, empty, ip_port,
         http3, resp_time2, http4, hex_string) = split_quote[6].split()
        return [
            ip, date_time, request, http1, http2, useragent,
            bytesize, resp_time1, prom, empty, ip_port,
            http3, resp_time2, http4, hex_string,
        ]
    
    def print_elements(line):
        split_quote = line.split('"')
        for x, squote in enumerate(split_quote):
            print(f"{x:>2}    {squote}")
            for y, sspace in enumerate(squote.split()):
                print(f"{x:>2} {y:>2} {sspace}")
    
    
    with open("logfile.log") as infile:
        data = infile.read().splitlines()
    
    print_elements(data[0])
    
    
    for line in data:
        print(splitline(line))
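
The loop above only prints each parsed row. To actually get a CSV file, the rows can be passed to the standard library's csv module. The following is a minimal sketch, not a definitive implementation: the column names in HEADER are my guesses, write_csv is a hypothetical helper, split_line is a compact copy of the quote-first splitting so the example runs on its own, and the sample line is a shortened, made-up line in the same layout as the question's log.

```python
import csv
import io

# Column names are guesses; rename them to match your nginx log_format.
HEADER = [
    "ip", "date_time", "request", "status", "body_bytes", "useragent",
    "request_length", "request_time", "upstream_name", "alt_upstream",
    "upstream_addr", "upstream_bytes", "upstream_time", "upstream_status",
    "request_id",
]


def split_line(line: str) -> list:
    """Same quote-first splitting as splitline() above, in compact form."""
    parts = line.split('"')
    head = parts[0]
    ip = head.split()[0]
    date_time = head.split("[")[1].split("]")[0]
    return [ip, date_time, parts[1], *parts[2].split(), parts[5], *parts[6].split()]


def write_csv(lines, outfile) -> None:
    """Write a header row and one CSV row per log line to an open text file."""
    writer = csv.writer(outfile)
    writer.writerow(HEADER)
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(split_line(line))


# Shortened, made-up log line in the same layout as the sample.
sample = (
    '10.0.0.1 - - [26/Apr/2021:21:20:37 +0000] "GET /x?a=1 HTTP/2.0" 200 212 '
    '"https://example.com/" "Mozilla/5.0 (X11; Linux) Gecko" 134 0.006 '
    '[up-80] [] 10.0.0.2:3000 212 0.008 200 abc123'
)
buffer = io.StringIO()
write_csv([sample], buffer)
print(buffer.getvalue())
```

For a real log, open the input and an output file (the name logfile.csv is just an example) and pass both to write_csv. Open the output with newline="" so the csv module controls the line endings itself.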
    

    output

     0    192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] 
     0  0 192.168.226.64
     0  1 -
     0  2 -
     0  3 [26/Apr/2021:21:20:37
     0  4 +0000]
     1    GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0
     1  0 GET
     1  1 /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30
     1  2 HTTP/2.0
     2     200 212
     2  0 200
     2  1 212
     3    https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s
     3  0 https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s
     4
     5    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36
     5  0 Mozilla/5.0
     5  1 (Macintosh;
     5  2 Intel
     5  3 Mac
     5  4 OS
     5  5 X
     5  6 10_15_7)
     5  7 AppleWebKit/537.36
     5  8 (KHTML,
     5  9 like
     5 10 Gecko)
     5 11 Chrome/90.0.4430.85
     5 12 Safari/537.36
     6     134 0.006 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 212 0.008 200 6bc328f046dcd1df823aa920397fb346
     6  0 134
     6  1 0.006
     6  2 [monitoring-monitoring-prometheus-grafana-80]
     6  3 []
     6  4 192.168.226.102:3000
     6  5 212
     6  6 0.008
     6  7 200
     6  8 6bc328f046dcd1df823aa920397fb346
    ['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '212', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '134', '0.006', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '212', '0.008', '200', '6bc328f046dcd1df823aa920397fb346']
    ['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_success%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '201', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '116', '0.007', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '201', '0.008', '200', 'c10141117983e888db68f2e1ff223575']
    ['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_http_ssl%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '204', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '117', '0.007', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '204', '0.008', '200', '60724ca6531bc640649bac50bbc04a7e']
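
The question also asks about a regex. As an alternative to the splitting shown above, here is a rough sketch using named groups from the re module. It assumes every line follows the layout of the sample lines. The group names are my guesses, LOG_PATTERN and parse_line are hypothetical names, and the sample line is a shortened, made-up one. Unlike splitline, this version keeps the referer.

```python
import re

# Assumed layout, inferred only from the sample lines; group names are guesses.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date_time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<useragent>[^"]*)" '
    r'(?P<rest>.*)'
)


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The unquoted tail has no embedded spaces in the sample, so split it on whitespace.
    fields["rest"] = fields["rest"].split()
    return fields


# Shortened, made-up log line in the same layout as the sample.
sample = (
    '10.0.0.1 - - [26/Apr/2021:21:20:37 +0000] "GET /x?a=1 HTTP/2.0" 200 212 '
    '"https://example.com/" "Mozilla/5.0 (X11; Linux) Gecko" 134 0.006 '
    '[up-80] [] 10.0.0.2:3000 212 0.008 200 abc123'
)
result = parse_line(sample)
print(result["ip"], result["status"], result["useragent"])
```

Returning None for lines that do not match lets the caller skip or log malformed lines instead of crashing on an unpacking error.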