[SOLVED] Getting Splunk's Lookups' scheme in a single query

Getting Splunk's Lookups' scheme in a single query

I want to create a query to fetch all the lookups that I have in my Splunk system (or even a predefined list) and their full scheme.

It should include the headers and the types for each column.

I tried multiple options but none of them worked.

any ideas?

Thanks

Solution

TL:DR this is the SPL:

| rest/servicesNS/-/-/data/lookup-table-files
| fields title
| regex "title"="\.csv$"
| map search="| inputlookup $title$| head 100| foreach mode=multifield * [    eval <<FIELD>>=typeof(<<FIELD>>)]| stats count by [| inputlookup $title$| head 1| transpose| stats values(column) AS column| nomv column | return $column]| foreach mode=multifield * [    eval <<FIELD>>#####{<<FIELD>>}=count]| fields *####*| fields - count*| stats sum(*) AS *| transpose| rename \"row 1\" AS count| rex field=\"column\" \"^(?P<column_name>[^#]+)#####(?P<type>.*)$\"| eventstats max(count) AS most_frequent BY column_name| where count=most_frequent| eval structure=column_name.\":\".'type'| stats values(structure) AS structure| eval name=$title$" maxsearches=1000

Okay, so this is kind of a hacky solution :) I will try to walk through it step-by-step. There might be some SPL optimizations to be done, but for now I am satisfied with a sort of straight forward and robust solution.

Find all CSV Lookup Files

| rest/servicesNS/-/-/data/lookup-table-files: Get a list of all lookup files on the system
| fields title: Look only into the title attribute
| regex "title"="\.csv$": Filter for CSV Lookups.
| map ... maxsearch=1000: Do something to get most frequent datatype of all columns for each lookup. maxsearches stipulates how many lookups will be looked at. 1000 should work for most cases, I guess. See doku of map.

For each Lookup get column name and most frequent data type per column.

| inputlookup $title$: Load the lookup
| foreach mode=multifield * [eval <<FIELD>>=typeof(<<FIELD>>)]: For each datapoint get the data type in each field. You could tune this to get more speciffic.
| stats count by [...]: Get a count of how often the type configurations happen for all fields. Subsearch will provide all fields in lookup.
- | inputlookup $title$: Load the lookup
- | head 1: take thre first row
- | transpose: transpose to get all fieldnames into "column" field
- | stats values(column) AS column: create multivalue field of all column names
- | nomv column: transform multivalue to string
- | return $column]: paste that string as result of subsearch
| foreach mode=multifield * [eval <<FIELD>>#####{<<FIELD>>}=count]: Loop through all fields and create a dummy variable that saves whicht field type apears how often
| fields *####*| fields - count*: Delete raw fieldnames as well as byproducts with the count field.
| stats sum(*) AS *: Get the sum for each field-datatype pair
| transpose: transpose the data to find the most freqent data-type
| rename "row 1" AS count
| rex field="column" "^(?P<column_name>[^#]+)#####(?P<type>.*)$": Parse the combinations into the components name and type
| eventstats max(count) AS most_frequent BY column_name: For each field in the lookup find the count of the most common type
| where count=most_frequent: Filter to keep only those rows
| eval structure=column_name.":".'type': Create readable information of column and type
| stats values(structure) AS structure: Get as multivalue field, since you can only return a single row with map
| eval name=$title$: Ensure the title field of original search is kept after map