splunk, splunk-api

How can I download all the results from a Splunk search to export the data?


When trying to export data from Splunk via the UI, I am only able to download results from searches that complete in under 60 seconds before the UI times out. How can I download all the results for a particular search?

Note: This is a self-answered wiki post

Solution

  • There is a hard limit for downloading Splunk results via the UI, so you'll need to use either one of the Splunk SDKs or the REST API. There are two solutions using the API:

    1. Use the /results endpoint and download up to 50,000 results per request from a completed search.
    2. Use the /export endpoint to stream the results as they are found.

    Note: These basic examples show the process for downloading the results using Python 3 in the best-case scenario. In reality you will have to check the responses from Splunk for timeouts and retry any download steps that fail.
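
    A minimal sketch of what that retry handling could look like, using a hypothetical request_with_retries helper (it is not used by the examples below, which call requests directly to stay short):

    import time
    import requests

    def request_with_retries(method, url, retries=3, backoff=5, **kwargs):
        # Retry timeouts and 5xx responses a few times with a growing back-off before giving up
        for attempt in range(1, retries + 1):
            try:
                response = requests.request(method, url, timeout=60, **kwargs)
                if response.status_code < 500:
                    return response
            except requests.exceptions.RequestException:
                if attempt == retries:
                    raise
            time.sleep(backoff * attempt)
        return response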

    Example 1: /results

    This approach lets you run searches as large as your Splunk instance can handle, but you can only export up to 50,000 events in a single request, so you have to page through the results with an offset (a driver loop tying the steps together is sketched after Step 3).

    Step 1: Create a search

    import requests

    # splunk_token holds a Splunk authentication token with access to the search endpoints
    splunk_token = '<your-splunk-token>'

    def run_search(search_query, earliest_time='-15m', latest_time='now', allow_partial_results=False, max_count=50000):
        # Create the search job and return its search id (sid)
        url = "https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/jobs"

        headers = {
            'Authorization': f'Bearer {splunk_token}'
        }
        data = {
            'search': f'search {search_query}',
            'earliest_time': earliest_time,
            'latest_time': latest_time,
            'allow_partial_results': allow_partial_results,
            'max_count': max_count,
            'output_mode': 'json'
        }
        response = requests.request("POST", url, headers=headers, data=data)

        search_id = response.json()['sid']
        return search_id
    

    Step 2: Wait for search to finish

    def get_status(search_id):
        # Check whether the search job has finished
        url = f"https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/jobs/{search_id}"
        params = {
            'output_mode': 'json'
        }
        headers = {
            'Authorization': f'Bearer {splunk_token}'
        }
        response = requests.request("GET", url, headers=headers, params=params)
        return response.json()['entry'][0]['content']['isDone']
    

    Step 3: Download results from the search, a maximum of 50,000 at a time

    def get_results(search_id, output_mode='csv', offset=0, max_results_batch=50000):
        # Fetch one page of results; increase offset by max_results_batch to page
        # through more than 50,000 events
        url = f"https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/v2/jobs/{search_id}/results/"
        params = {
            'output_mode': output_mode,
            'count': max_results_batch,
            'offset': offset
        }
        headers = {
            'Authorization': f'Bearer {splunk_token}'
        }
        response = requests.request("GET", url, headers=headers, params=params)
        return response
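
    Putting the three steps together, a driver loop might look like the following sketch. It assumes run_search, get_status, and get_results from the steps above and that splunk_token is set; the poll interval and output file name are arbitrary choices, and isDone comes back as a boolean in JSON output.

    import time

    def export_search(search_query, output_file='results.csv', batch_size=50000):
        # Step 1: create the search job
        search_id = run_search(search_query)

        # Step 2: poll until the job has finished
        while not get_status(search_id):
            time.sleep(2)

        # Step 3: page through the results, batch_size events at a time
        offset = 0
        with open(output_file, 'w') as file:
            while True:
                response = get_results(search_id, offset=offset, max_results_batch=batch_size)
                batch = response.text
                if not batch.strip():
                    break  # no more results to fetch
                if offset > 0:
                    # each CSV batch repeats the header row, so drop it after the first batch
                    batch = batch.split('\n', 1)[-1]
                file.write(batch)
                offset += batch_size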
    

    Example 2: /export

    This approach allows you to download the results as they become available, before the search is complete, using an HTTP stream. To download large amounts of logs this way, chunk the export into smaller time frames, for example 15-minute windows (e.g. earliest_time='2023-08-30 16:30:00', latest_time='2023-08-30 16:45:00'), and stream these partial results to a file.

    def run_export(search_query, earliest_time='-15m', latest_time='now', output_mode='csv'):
        r = requests.post('https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/v2/jobs/export',
            headers={
                'Authorization': f'Bearer {splunk_token}'
            },
            data={
                'search': f'search {search_query}',
                'earliest_time': earliest_time,
                'latest_time': latest_time,
            },
            params={
                'output_mode': output_mode
            },
            stream=True
        )

        if r.encoding is None:
            r.encoding = 'utf-8'

        # Stream each line to the file as it arrives; iter_lines strips the
        # line terminators, so add them back when writing
        with open('exported-search.csv', 'a') as file:
            for line in r.iter_lines(decode_unicode=True):
                file.write(line + '\n')
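
    To chunk a long export into 15-minute windows as described above, you can wrap run_export in a loop over the time range. This sketch assumes run_export as defined above (which appends to exported-search.csv); the window size, timestamp format, and the example index=main query are placeholders:

    from datetime import datetime, timedelta

    def export_in_chunks(search_query, start, end, window_minutes=15):
        # Walk the time range in fixed-size windows, streaming each window to the file
        window = timedelta(minutes=window_minutes)
        current = start
        while current < end:
            chunk_end = min(current + window, end)
            run_export(search_query,
                       earliest_time=current.strftime('%Y-%m-%d %H:%M:%S'),
                       latest_time=chunk_end.strftime('%Y-%m-%d %H:%M:%S'))
            current = chunk_end

    # e.g. export two hours of logs in 15-minute chunks
    export_in_chunks('index=main', datetime(2023, 8, 30, 16, 0), datetime(2023, 8, 30, 18, 0))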