When trying to export data from Splunk via the UI, I can only download results from searches that complete in less than 60 seconds before the UI times out. How can I download all the results for a particular search?
Note: This is a self-answered wiki post.
There is a hard limit for downloading Splunk results via the UI, so you'll need to use either one of the Splunk SDKs or the REST API. There are two solutions using the API:

1. Using the /results endpoint and downloading 50k results per request for a completed search.
2. Using the /export endpoint to stream the results as they're found.

Note: These basic examples show the process of downloading the results using Python 3 in the best-case scenario. In reality you will have to error-check the responses from Splunk for timeouts and retry if there are failures on the download steps (a minimal retry sketch is shown below).
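For example, a wrapper along these lines could be used around each request. This is only a sketch assuming the requests library; the retry count, delay, and timeout values are illustrative:

import time
import requests

def request_with_retries(method, url, retries=3, delay=5, **kwargs):
    # Retry on timeouts and 5xx responses; anything else is returned to the caller
    for attempt in range(retries):
        try:
            response = requests.request(method, url, timeout=120, **kwargs)
            if response.status_code < 500:
                return response
        except requests.exceptions.Timeout:
            pass  # transient timeout, retry after a short pause
        time.sleep(delay)
    raise RuntimeError(f'Request to {url} failed after {retries} attempts')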
Approach 1: Using the /results endpoint

This approach lets you run searches as large as your Splunk instance can handle, but a single request to the /results endpoint returns at most 50,000 events, so larger result sets have to be paged through with the offset parameter (see the paging sketch after Step 3).
Step 1: Create a search
import requests

# splunk_token is assumed to hold your Splunk authentication (Bearer) token
splunk_token = '<your-splunk-token>'

def run_search(search_query, earliest_time='-15m', latest_time='now', allow_partial_results=False, max_count=50000):
    url = "https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/jobs"
    headers = {
        'Authorization': f'Bearer {splunk_token}'
    }
    data = {
        'search': f'search {search_query}',
        'earliest_time': earliest_time,
        'latest_time': latest_time,
        'allow_partial_results': allow_partial_results,
        'max_count': max_count,
        'output_mode': 'json'  # ask for a JSON response so the sid can be read below
    }
    response = requests.post(url, headers=headers, data=data)
    search_id = response.json()['sid']
    return search_id
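For example, to kick off a search (the query and time range here are purely illustrative):

search_id = run_search('index=main error', earliest_time='-60m', latest_time='now')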
Step 2: Wait for search to finish
def get_status(search_id):
    url = f"https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/jobs/{search_id}"
    params = {
        'output_mode': 'json'
    }
    headers = {
        'Authorization': f'Bearer {splunk_token}'
    }
    response = requests.get(url, headers=headers, params=params)
    # The job endpoint returns a list of entries; the first entry is this search job
    return response.json()['entry'][0]['content']['isDone']
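In practice you would poll this in a loop until the job reports that it is done. A minimal sketch (the polling interval is an arbitrary choice):

import time

def wait_for_search(search_id, poll_interval=10):
    # Poll the job status endpoint until Splunk marks the search as finished
    while not get_status(search_id):
        time.sleep(poll_interval)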
Step 3: Download results from the search, up to the maximum of 50,000 at a time
def get_results(search_id, output_mode='csv', offset=0, max_results_batch=50000):
    url = f"https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/v2/jobs/{search_id}/results/"
    params = {
        'output_mode': output_mode,
        'count': max_results_batch,
        'offset': offset
    }
    headers = {
        'Authorization': f'Bearer {splunk_token}'
    }
    response = requests.get(url, headers=headers, params=params)
    return response
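Putting the three steps together, a paging loop over a completed search might look like the sketch below. The query, file name, and stopping condition are assumptions for illustration, and each CSV batch after the first contains its own header row that you may want to strip:

search_id = run_search('index=main error', earliest_time='-24h', latest_time='now')
wait_for_search(search_id)

offset = 0
batch_size = 50000
with open('search-results.csv', 'w') as file:
    while True:
        batch = get_results(search_id, output_mode='csv', offset=offset, max_results_batch=batch_size)
        if not batch.text.strip():
            break  # an empty body means there are no more results to page through
        file.write(batch.text)
        offset += batch_size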
Approach 2: Using the /export endpoint

This approach allows you to download results as they become available, before the search is complete, using HTTP streaming. To download large amounts of logs this way you will want to chunk the export into smaller time frames, for example 15 minutes at a time (e.g. earliest_time='2023-08-30 16:30:00', latest_time='2023-08-30 16:45:00'), and stream each chunk of partial results to a file (see the looping sketch after the function below).
def run_export(search_query, earliest_time='-15m', latest_time='now', output_mode='csv'):
    r = requests.post('https://splunk-customer-endpoint.splunkcloud.com:8089/services/search/v2/jobs/export',
                      headers={
                          'Authorization': f'Bearer {splunk_token}'
                      },
                      data={
                          'search': f'search {search_query}',
                          'earliest_time': earliest_time,
                          'latest_time': latest_time,
                      },
                      params={
                          'output_mode': output_mode
                      },
                      stream=True)
    if r.encoding is None:
        r.encoding = 'utf-8'
    with open('exported-search.csv', 'a') as file:
        # decode_unicode=True yields text lines; iter_lines strips newlines, so add them back
        for line in r.iter_lines(decode_unicode=True):
            file.write(line + '\n')
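For example, to export a two-hour window in 15-minute chunks you could call the function in a loop over time ranges. The window boundaries and query below are purely illustrative, and because the file is opened in append mode each chunk is added to the same exported-search.csv:

from datetime import datetime, timedelta

start = datetime(2023, 8, 30, 16, 0, 0)
end = datetime(2023, 8, 30, 18, 0, 0)
chunk = timedelta(minutes=15)

while start < end:
    # Export one 15-minute slice at a time to keep each streamed request small
    run_export('index=main error',
               earliest_time=start.strftime('%Y-%m-%d %H:%M:%S'),
               latest_time=(start + chunk).strftime('%Y-%m-%d %H:%M:%S'))
    start += chunk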