azure-log-analyticsiis-logskql

Parse IIS logs in Azure Log Analytics' Query


Azure Log Analytics (aka OMS) uses Kusto QL. We ship our IIS logs from docker containers to Log Analytics and I intend to use the following query to parse the entries:

ContainerLog
| extend fields  = split(LogEntry, ' ')
| extend appname = tostring(fields[16])
| extend path = tostring(fields[4])

Here are 2 different records which I parsed by hand:

date       time     s-ip           cs-method  cs-uri-stem                     cs-uri-query     s-port   cs-username                c-ip           cs(User-Agent)                                                                  cs(Referer)                    sc-status  sc-substatus  sc-win32-status  time-taken   x-forwarded-for                  container-app
2019-11-29 17:37:49 ddd.dd.dd.ddd  GET        /ping.aspx                      -                80       -                          dd.dd.ddd.d    Go-http-client/1.1                                                              -                              200        0             0                12           dd.dd.ddd.d                      OurCustomValue  
2019-11-29 17:33:36 ddd.dd.dd.ddd  GET        /js/js_v4/jquery-functions.js   v=26.35.0.0      80       7vgnwjAzOsKcUpseaPykcQ--   dd.dd.ddd.d    Mozilla/5.0+(Windows+NT+10.0 +Win64 +x64 +rv:70.0)+Gecko/20100101+Firefox/70.0  https://site.domain.com/       200        0             0                14           dd.ddd.dd.dd:55001,+dd.dd.ddd.d  OurCustomValue  

Problem is that the "User Agent" field could contain space which confuses the parser and split it into other fields. So all the indices for the fields listed after User-Agent are gonna be off. For instance, the last field for the 2nd record won't be at index 16 (starting from 0), but rather 19.

Is there a better way to parse these logs, say, by defining the number or types of fields?


Solution

  • you could try the following direction.

    note: if you don't actually need a distinction between both the user_agent and referer fields, then you could have them both in a single field, which is what the example below does (and if you do need them separately, you can apply specific logic for parsing that field later on, after the first invocation of parse)

    datatable(s:string)
    [
        '2019-11-29 17:37:49 ddd.dd.dd.ddd GET /ping.aspx - 80 - dd.dd.ddd.d Go-http-client/1.1 - 200 0 0 12 dd.dd.ddd.d OurCustomValue', 
        '2019-11-29 17:33:36 ddd.dd.dd.ddd GET /js/js_v4/jquery-functions.js v=26.35.0.0 80 7vgnwjAzOsKcUpseaPykcQ-- dd.dd.ddd.d Mozilla/5.0+(Windows+NT+10.0 +Win64 +x64 +rv:70.0)+Gecko/20100101+Firefox/70.0  https://site.domain.com/ 200 0 0 14 dd.ddd.dd.dd:55001,+dd.dd.ddd.d OurCustomValue',
    ]
    | parse s with d:datetime " " t:timespan " " s_ip " " cs_method " " cs_uri_stem " " cs_uri_query " " s_port:int " " cs_username " " c_ip " " cs_user_agent " " sc_status:int " " sc_substatus:int " " sc_win32_status:int " " time_taken:int " " x_forwarded_for " " container_app
    | project appname = container_app, path = cs_uri_stem, browser_family = parse_user_agent(cs_user_agent, "browser").Browser.Family
    

    this will output:

    | appname        | path                          | browser_family |
    |----------------|-------------------------------|----------------|
    | OurCustomValue | /ping.aspx                    | Go-http-client |
    | OurCustomValue | /js/js_v4/jquery-functions.js | Firefox        |