azure, azure-log-analytics

How can I run a search job periodically in Azure Log Analytics?


I'm trying to visualize the browser statistics of our app hosted in Azure.

For that I'm using the nginx logs and running an Azure Log Analytics query like this:

ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend logEntry=parse_json(LogEntry)
| extend userAgent=parse_user_agent(logEntry.nginx.http_user_agent, "browser")
| extend browser=parse_json(userAgent)
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)

Then I'm getting this diagram, which is exactly what I want:

[Pie chart showing the distribution of browser families]

But the query is very slow. If I set the time range to more than just the last few hours, it takes several minutes or doesn't complete at all.

My solution is to use a search job like this:

ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend d=parse_json(LogEntry)
| extend user_agent=parse_user_agent(d.nginx.http_user_agent, "browser")
| extend browser=parse_json(user_agent)

It creates a new table BrowserStats_SRCH on which I can run this query:

BrowserStats_SRCH
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)

This is much faster now and only takes a few seconds.

But my problem is: how can I keep this up to date? Preferably, this search job would run automatically once a day and refresh the BrowserStats_SRCH table, so that queries on that table always run against the most recent logs. Is this possible? Right now I can't even trigger the search job manually again, because then I get the error "A destination table with this name already exists".

In the end I would like to have a deep link to the pie chart with the browser stats, without needing any further clicks. Any help would be appreciated.


Solution

  • But my problem is: how can I keep this up to date? Preferably, this search job would run automatically once a day and refresh the BrowserStats_SRCH table, so that queries on that table always run against the most recent logs. Is this possible?

    You can leverage the API to create a search job, then use a timer-triggered Azure Function or a Logic App to call that API on a schedule.

    PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-00000000000/resourcegroups/testRG/providers/Microsoft.OperationalInsights/workspaces/testWS/tables/Syslog_suspected_SRCH?api-version=2021-12-01-preview
    

    with a request body containing the query:

    {
        "properties": {
            "searchResults": {
                "query": "Syslog | where * has 'suspected.exe'",
                "limit": 1000,
                "startSearchTime": "2020-01-01T00:00:00Z",
                "endSearchTime": "2020-01-31T00:00:00Z"
            }
        }
    }
    

    Or you can use the Azure CLI:

    az monitor log-analytics workspace table search-job create --subscription ContosoSID --resource-group ContosoRG  --workspace-name ContosoWorkspace --name HeartbeatByIp_SRCH --search-query 'Heartbeat | where ComputerIP has "00.000.00.000"' --limit 1500 --start-search-time "2022-01-01T00:00:00.000Z" --end-search-time "2022-01-08T00:00:00.000Z" --no-wait
    
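    As a sketch of how such a schedule could look, the body of a timer-triggered Azure Function (or any other daily job) can call that endpoint directly. The following Python snippet is only an illustration, assuming the azure-identity and requests packages and placeholder resource names; it recreates the search job over the last 24 hours (the old result table has to be removed first, see below):

    from datetime import datetime, timedelta, timezone

    import requests
    from azure.identity import DefaultAzureCredential

    # Placeholder values (assumptions) - replace with your own resources.
    SUBSCRIPTION = "<subscription-id>"
    RESOURCE_GROUP = "<resource-group>"
    WORKSPACE = "<workspace-name>"
    TABLE = "BrowserStats_SRCH"
    API_VERSION = "2021-12-01-preview"

    TABLE_URL = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourcegroups/{RESOURCE_GROUP}/providers/Microsoft.OperationalInsights"
        f"/workspaces/{WORKSPACE}/tables/{TABLE}?api-version={API_VERSION}"
    )


    def arm_headers():
        # Token for Azure Resource Manager; locally this can come from an Azure CLI
        # login, inside an Azure Function from a managed identity.
        token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
        return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


    def create_search_job():
        # Recreate the search job for the previous 24 hours of nginx logs
        # (assumption: adjust the time window and limit to your log volume).
        now = datetime.now(timezone.utc)
        body = {
            "properties": {
                "searchResults": {
                    "query": "ContainerLog "
                             "| where LogEntrySource == 'stdout' and LogEntry has 'nginx' "
                             "| extend d=parse_json(LogEntry) "
                             "| extend user_agent=parse_user_agent(d.nginx.http_user_agent, 'browser') "
                             "| extend browser=parse_json(user_agent)",
                    "limit": 100000,
                    "startSearchTime": (now - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "endSearchTime": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
                }
            }
        }
        requests.put(TABLE_URL, headers=arm_headers(), json=body).raise_for_status()
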

  • Right now I can't even trigger the search job manually again, because then I get the error "A destination table with this name already exists".

    Before you start the job as described above, remove the old result table with an API call:

    DELETE https://management.azure.com/subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}/tables/{tableName}?api-version=2021-12-01-preview
    
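    Continuing the Python sketch above, the delete step could look like this (the delete may complete asynchronously, so recreating the table immediately afterwards can still fail until the old table is actually gone):

    def delete_search_table():
        # Remove the previous result table; a 404 only means it does not exist yet.
        resp = requests.delete(TABLE_URL, headers=arm_headers())
        if resp.status_code != 404:
            resp.raise_for_status()
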

    Optionally, you could check the status of the job via this API before you delete it, to make sure it is not InProgress or Deleting.
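
    Continuing the same sketch, such a check could read the table resource and inspect its provisioning state (assumption: the field name and values follow the Log Analytics Tables REST API):

    def table_state():
        # Returns e.g. "Succeeded", "InProgress" or "Deleting", or None if the
        # table does not exist (assumed provisioningState semantics).
        resp = requests.get(TABLE_URL, headers=arm_headers())
        if resp.status_code == 404:
            return None
        resp.raise_for_status()
        return resp.json().get("properties", {}).get("provisioningState")

    Putting it together, the daily refresh would be: check the state, delete the old BrowserStats_SRCH table, wait until it is gone, and then create the search job again.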