I want to create a query to fetch all the lookups that I have in my Splunk system (or even a predefined list) and their full scheme.
It should include the headers and the types for each column.
I tried multiple options but none of them worked.
any ideas?
Thanks
TL:DR this is the SPL:
| rest/servicesNS/-/-/data/lookup-table-files
| fields title
| regex "title"="\.csv$"
| map search="| inputlookup $title$| head 100| foreach mode=multifield * [ eval <<FIELD>>=typeof(<<FIELD>>)]| stats count by [| inputlookup $title$| head 1| transpose| stats values(column) AS column| nomv column | return $column]| foreach mode=multifield * [ eval <<FIELD>>#####{<<FIELD>>}=count]| fields *####*| fields - count*| stats sum(*) AS *| transpose| rename \"row 1\" AS count| rex field=\"column\" \"^(?P<column_name>[^#]+)#####(?P<type>.*)$\"| eventstats max(count) AS most_frequent BY column_name| where count=most_frequent| eval structure=column_name.\":\".'type'| stats values(structure) AS structure| eval name=$title$" maxsearches=1000
Okay, so this is kind of a hacky solution :) I will try to walk through it step-by-step. There might be some SPL optimizations to be done, but for now I am satisfied with a sort of straight forward and robust solution.
| rest/servicesNS/-/-/data/lookup-table-files
: Get a list of all lookup files on the system| fields title
: Look only into the title attribute| regex "title"="\.csv$"
: Filter for CSV Lookups.| map ... maxsearch=1000
: Do something to get most frequent datatype of all columns for each lookup. maxsearches
stipulates how many lookups will be looked at. 1000 should work for most cases, I guess. See doku of map.| inputlookup $title$
: Load the lookup
| foreach mode=multifield * [eval <<FIELD>>=typeof(<<FIELD>>)]
: For each datapoint get the data type in each field. You could tune this to get more speciffic.
| stats count by [...]
: Get a count of how often the type configurations happen for all fields. Subsearch will provide all fields in lookup.
| inputlookup $title$
: Load the lookup| head 1
: take thre first row| transpose
: transpose to get all fieldnames into "column" field| stats values(column) AS column
: create multivalue field of all column names| nomv column
: transform multivalue to string| return $column]
: paste that string as result of subsearch| foreach mode=multifield * [eval <<FIELD>>#####{<<FIELD>>}=count]
: Loop through all fields and create a dummy variable that saves whicht field type apears how often
| fields *####*| fields - count*
: Delete raw fieldnames as well as byproducts with the count field.
| stats sum(*) AS *
: Get the sum for each field-datatype pair
| transpose
: transpose the data to find the most freqent data-type
| rename "row 1" AS count
| rex field="column" "^(?P<column_name>[^#]+)#####(?P<type>.*)$"
: Parse the combinations into the components name and type
| eventstats max(count) AS most_frequent BY column_name
: For each field in the lookup find the count of the most common type
| where count=most_frequent
: Filter to keep only those rows
| eval structure=column_name.":".'type'
: Create readable information of column and type
| stats values(structure) AS structure
: Get as multivalue field, since you can only return a single row with map
| eval name=$title$
: Ensure the title field of original search is kept after map