I'm trying to build a pipeline on Linux that converts Windows .evtx binary logs into JSON that is easy to search and filter. Because the conversion goes through XML, some of the data needs to be cleaned up with jq. The bash pipeline is currently this:
./evtx_dump.py <File.evtx> \
| sed -e 's/version="1.1"/version="1.0"/g' \
| yq -p xml -o json \
| sed -e 's/+@//g' -e 's/+//g' \
| jq '.Events' > <File.json>
Running a log through the pipeline above produces output like this:
{
"xmlns": "http://schemas.microsoft.com/win/2004/08/events/event",
"System": {
"Provider": {
"Name": "Microsoft-Windows-Security-Auditing",
"Guid": "{<GUID>}"
},
"EventID": {
<EVENT-ID data>
},
"EventData": {
"Data": [
{
"content": "S-<SID>",
"Name": "SubjectUserSid"
},
{
"content": "-",
"Name": "SubjectUserName"
},
{
"content": "-",
"Name": "SubjectDomainName"
},
{
"content": "<HEX>",
"Name": "SubjectLogonId"
},
{
"content": "<HEX>",
"Name": "NewProcessId"
},
{
"content": "Registry",
"Name": "NewProcessName"
},
{
"content": "%%2000",
"Name": "TokenElevationType"
},
{
"content": "0x0000000000000019",
"Name": "ProcessId"
},
{
"Name": "CommandLine"
},
{
"content": "<HEX>",
"Name": "TargetUserSid"
},
{
"content": "-",
"Name": "TargetUserName"
},
{
...
]
Everything under .Event[].EventData.Data[] has been organised into an array by the conversion through XML. Some of the array elements also have no .content (presumably because the presence of those data points indicates the result is TRUE). I'm able to clean up the data a little with a jq filter like this:
.Event[].EventData.Data | map( { (.Name|tostring):(.content|tostring) } ) | add
That only seems to work for one event, though, and it throws errors because of the null values. I can't expand the scope of the filter to this:
.Event[] | map( { (.EventData.Data.Name|tostring):(.EventData.Data.content|tostring) } ) | add
but that approximates the outcome I'm going for.
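You really should rewrite it all to use yq. All the sed processing to remove the attribute prefixes is rather pointless; yq can do that for you if you let it. Note also that sed -e 's/+//g' strips every literal + from the stream, including any that occur inside the event data itself.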
If you don't want your attributes prefixed with +@, specify an empty prefix using the --xml-attribute-prefix switch. Element content is saved under the "+content" property in the JSON; you could change that name to simply "content" using --xml-content-name. Check the docs.
e.g.,
$ cat input.xml
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing">
      <Guid>{<GUID>}</Guid>
    </Provider>
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-<SID></Data>
    <Data Name="SubjectUserName">-</Data>
    <Data Name="SubjectDomainName">-</Data>
    <Data Name="SubjectLogonId"><HEX></Data>
    <Data Name="NewProcessId"><HEX></Data>
    <Data Name="NewProcessName">Registry</Data>
    <Data Name="TokenElevationType">%%2000</Data>
    <Data Name="ProcessId">0x0000000000000019</Data>
    <Data Name="CommandLine"></Data>
    <Data Name="TargetUserSid"><HEX></Data>
    <Data Name="TargetUserName">-</Data>
  </EventData>
</Event>
$ yq -p xml -o json --xml-attribute-prefix '' --xml-content-name Content input.xml
{
  "Event": {
    "xmlns": "http://schemas.microsoft.com/win/2004/08/events/event",
    "System": {
      "Provider": {
        "Name": "Microsoft-Windows-Security-Auditing",
        "Guid": "{<GUID>}"
      }
    },
    "EventData": {
      "Data": [
        {
          "Content": "S-<SID>",
          "Name": "SubjectUserSid"
        },
        {
          "Content": "-",
          "Name": "SubjectUserName"
        },
        {
          "Content": "-",
          "Name": "SubjectDomainName"
        },
        {
          "Content": "<HEX>",
          "Name": "SubjectLogonId"
        },
        {
          "Content": "<HEX>",
          "Name": "NewProcessId"
        },
        {
          "Content": "Registry",
          "Name": "NewProcessName"
        },
        {
          "Content": "%%2000",
          "Name": "TokenElevationType"
        },
        {
          "Content": "0x0000000000000019",
          "Name": "ProcessId"
        },
        {
          "Name": "CommandLine"
        },
        {
          "Content": "<HEX>",
          "Name": "TargetUserSid"
        },
        {
          "Content": "-",
          "Name": "TargetUserName"
        }
      ]
    }
  }
}
Then from there, apply your filters like you would with jq, but in yq. It's mostly the same all around.
I guess you just wanted to collect all the Data elements into a dictionary?
$ yq -p xml -o json --xml-attribute-prefix '' --xml-content-name Content '
.Event.EventData.Data |= (map({"key":.Name, "value":.Content}) | from_entries)
' input.xml
{
  "Event": {
    "xmlns": "http://schemas.microsoft.com/win/2004/08/events/event",
    "System": {
      "Provider": {
        "Name": "Microsoft-Windows-Security-Auditing",
        "Guid": "{<GUID>}"
      }
    },
    "EventData": {
      "Data": {
        "SubjectUserSid": "S-<SID>",
        "SubjectUserName": "-",
        "SubjectDomainName": "-",
        "SubjectLogonId": "<HEX>",
        "NewProcessId": "<HEX>",
        "NewProcessName": "Registry",
        "TokenElevationType": "%%2000",
        "ProcessId": "0x0000000000000019",
        "CommandLine": null,
        "TargetUserSid": "<HEX>",
        "TargetUserName": "-"
      }
    }
  }
}
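If the real file holds many events rather than one, the same reshaping can be applied to each of them. Below is a minimal sketch in jq, since the question already uses it. It assumes .Event is an array and that the content key is the lowercase content produced by the question's pipeline. The sample data and the sample and result variable names are made up for illustration. It also assumes that an event with only one Data child comes through as a plain object rather than a one-element array, which is how XML converters commonly behave, so the filter wraps that case in an array first.

```shell
# Hypothetical sample: two events in the shape the question's pipeline produces.
# The second event has a single Data child, represented as an object, not an array.
sample='{"Event":[
  {"EventData":{"Data":[
    {"content":"-","Name":"SubjectUserName"},
    {"Name":"CommandLine"}
  ]}},
  {"EventData":{"Data":{"content":"Registry","Name":"NewProcessName"}}}
]}'

# For every event, replace EventData.Data with one Name -> content object.
# `.content // ""` substitutes an empty string when an element has no content.
result=$(printf '%s' "$sample" | jq -c '
  .Event[].EventData.Data |= (
    (if type == "array" then . else [.] end)
    | map({key: .Name, value: (.content // "")})
    | from_entries
  )
')

printf '%s\n' "$result"
```

Dropping the `// ""` part keeps missing values as null, which matches the yq output above. Which of the two is more useful depends on how the result is searched afterwards.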