
PowerShell regex not giving the results I expect; not sure why


I am trying to parse Windows Security event log messages for event ID 4624 to extract the Account Name and Logon Type.

I grabbed a copy of a log message, saved it to a file, and have been trying to make this work for a week. I am a noob at PowerShell. Here is what I have so far.

With the patterns:

$Text= Get-Content -Path 'D:\scripts\TestData\EventLog Message.txt' 
[regex]$rx ='(?<Logon> Logon Type:\S*\d+)(?<Name> Account Name:\s*\w+)'
$rx.Match($text)
 

Groups   : {0}
Success  : False
Name     : 0
Captures : {}
Index    : 0
Length   : 0
Value    :

If I take out what I am looking for, I get the named captures fine. I know it is my regex, but I'm damned if I can figure out why.

Without the patterns:

$Text= Get-Content -Path 'D:\scripts\TestData\EventLog Message.txt' 
[regex]$rx ='(?<Logon>)(?<Name>)'
$rx.Match($text)


Groups   : {0, Logon, Name}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 0
Value    : 

If I do them one at a time, it works as well. I have seen multiple named groups in a single pattern work in several search results; it just doesn't work for me.

One at a time

$Text= Get-Content -Path 'D:\scripts\TestData\EventLog Message.txt' 
[regex]$rx ='(?<Logon>Logon Type:\s*\d+)'
$rx.Match($text)


Groups   : {0, Logon}
Success  : True
Name     : 0
Captures : {0}
Index    : 165
Length   : 14
Value    : Logon Type:      5

Any ideas?


Solution

  • I'll try to explain how you can fix the pattern below, but you probably shouldn't be using regex for this in the first place; see the bottom of the answer for better alternatives.

    There are a couple of subtle gotchas here.

    Before we walk through them, I want to point out that the second example only "works" because the pattern (?<Logon>)(?<Name>) describes two substrings of length 0, and you can find an empty string in any other string (including another empty string(!)) as many times as you like. That's also why the result reports Index 0 and Length 0. Labeling the empty matches with capture groups doesn't change the fact that 0 + 0 = 0.
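    To see this for yourself, here's a quick sketch: the same empty-group pattern "succeeds" against any input, even an empty string, and always reports a zero-length match at index 0 (the sample inputs are arbitrary).

```powershell
# Two empty named groups describe an empty string, which occurs at the start of ANY input
[regex]$rx = '(?<Logon>)(?<Name>)'

foreach ($sample in 'Logon Type: 5', 'completely unrelated text', '') {
    $m = $rx.Match($sample)
    # Success is True every time, but Index and Length are always 0
    [pscustomobject]@{
        Input   = $sample
        Success = $m.Success
        Index   = $m.Index
        Length  = $m.Length
        Logon   = $m.Groups['Logon'].Value   # the "captured" text is an empty string
    }
}
```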

    With that out of the way, let's look at what needs to be done to fix up your regex pattern for the use case at hand.

    Multiline string vs array of strings

    # original
    $Text = Get-Content -Path 'D:\scripts\TestData\EventLog Message.txt' 
    
    # fixed
    $Text = Get-Content -Path 'D:\scripts\TestData\EventLog Message.txt' -Raw
    

    First gotcha: use Get-Content -Raw when your regex needs to evaluate multiple lines at once. Otherwise $Text will be an array of individual lines, which PowerShell implicitly joins with a regular space character (the default value of $OFS) when converting it to the string that Regex.Match expects:

    $text = 'a','b','c'
    
    $rx.Match($text) # `$text` will be converted to `'a b c'`
    

    While this actually works to your advantage in this case (we'll get back to that shortly), it's better to explicitly do what you intend to do; it leads to fewer moments of confusion.
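    Here's a small sketch of that implicit join, using two hand-written sample lines in place of the real file (the sample text is an assumption). The array turns into one space-separated string, while the -Raw style string keeps its line break, and that changes what . can reach:

```powershell
# Two hand-written "lines", shaped like what Get-Content returns WITHOUT -Raw
$lines = @('Account Name:  bob', 'Logon Type:  5')

# Converting an array to [string] joins the elements with $OFS (a single space by default)
$joined = [string]$lines

# Roughly what Get-Content -Raw gives you instead: one string containing a line break
$raw = $lines -join "`n"

$pattern = 'Account Name:\s*\w+.*?Logon Type:\s*\d+'

$joined                                    # Account Name:  bob Logon Type:  5
[regex]::Match($joined, $pattern).Success  # True:  '.' crosses the joining space
[regex]::Match($raw, $pattern).Success     # False: '.' stops at the line break
```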

    ' ' (a space) is also a character literal

    By default, the .NET regex engine treats ASCII 0x20 ("space") like any other character literal, so the two patterns (?<label> content) and (?<label>content) actually describe two different strings: one that starts with a space, and one that doesn't.

    Make sure you don't litter your patterns with "ornamental" whitespace:

    # original
    [regex]$rx = '(?<Logon> Logon Type:\s*\d+)...'
    
    # fixed
    [regex]$rx = '(?<Logon>Logon Type:\s*\d+)...'
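    A quick way to see the difference, using a hand-written sample line that assumes the field is indented with a tab, as in a rendered 4624 message:

```powershell
# Hand-written sample line; assumes the field is tab-indented, as in a rendered 4624 message
$line = "`tLogon Type:`t`t5"

# Leading space inside the group: requires ' Logon', which never occurs in $line
$line -match '(?<Logon> Logon Type:\s*\d+)'   # False

# No ornamental whitespace: matches
$line -match '(?<Logon>Logon Type:\s*\d+)'    # True
$Matches['Logon'] -replace '\s+', ' '          # Logon Type: 5
```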
    

    I should note here that the regex engine does support ignoring unescaped whitespace in patterns, via the IgnorePatternWhitespace option or its inline form (?x):

    $rx = '(?x)                  All\ the\ leading\ whitespace\ in\ this\ pattern\ is\ ignored'
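    As a sketch of how (?x) behaves, here the pattern is spaced out for readability; note that the literal space inside "Logon Type" has to be escaped as '\ ', otherwise it would be ignored too. The sample line is hand-written and assumed to be tab-separated:

```powershell
# Hand-written sample line (assumed tab-separated, as in the rendered event message)
$line = "Logon Type:`t`t5"

# (?x): unescaped whitespace in the pattern is ignored; the escaped '\ ' is a literal space
$spaced = '(?x)  Logon\ Type:  \s*  (?<Type> \d+ )'
$line -match $spaced        # True
$Matches['Type']            # 5

# The same pattern WITHOUT (?x): the runs of spaces are now literals and break the match
$line -match '  Logon\ Type:  \s*  (?<Type> \d+ )'   # False
```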
    

    The regex pattern is applied to the whole input

    Regex patterns describe sequences of string contents. When you concatenate two patterns, the regex engine assumes that the concatenation reflects the input you want to search, and will therefore look for the second pattern at the exact position in the string where the first one stops matching:

    PS ~> 'a b' -match '(?<first>a) (?<second>b)'
    True
    PS ~> 'a b c' -match '(?<first>a) (?<last>c)'
    False     # oh no, nothing accounts for `b ` now!
    

    To account for whatever content the regex engine might encounter in between, use a non-greedy (or "lazy", as they're sometimes called) expression such as .*? between the groups. The .* part consumes "0 or more non-newline characters", but thanks to the lazy quantifier ? it matches as few characters as possible.
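    Continuing the 'a b c' example above, here's a sketch of the fix, plus a side-by-side of lazy vs. greedy matching (the strings are arbitrary illustrations):

```powershell
# .*? absorbs whatever sits between the two groups
'a b c' -match '(?<first>a).*?(?<last>c)'   # True
$Matches['first'] + $Matches['last']         # ac

# Lazy vs. greedy: which digit ends up in the group?
$null = 'x1 x2' -match 'x.*?(?<n>\d)'; $Matches['n']   # 1 (lazy stops as early as possible)
$null = 'x1 x2' -match 'x.*(?<n>\d)';  $Matches['n']   # 2 (greedy runs as far as possible, then backs off)
```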

    Finally, you'll need to reorder them: the message template for event ID 4624 renders the subject's account name before the target logon type. While you're at it, fix the typo carried over from your original pattern: \S (uppercase) means "non-whitespace character", so \S* can never skip the whitespace between "Logon Type:" and the number. It should be \s*, as in your one-at-a-time example.

    # original 
    [regex]$rx ='(?<Logon>Logon Type:\S*\d+)(?<Name>Account Name:\s*\w+)'
    
    # fixed
    [regex]$rx ='(?<Name>Account Name:\s*\w+).*?(?<Logon>Logon Type:\s*\d+)'
    

    . matches everything ... except newlines

    By default, the regex engine interprets the . metacharacter as "any character except \n". Since the first thing we did was fix the Get-Content invocation to produce a single multi-line string rather than an array of individual lines, we now have a new problem: the .*? expression we inserted to consume the in-between content can no longer cross line endings, and it has to, given the layout of this event message template.

    To treat . as "any character including \n", use the (?s) option modifier to enable "single-line" mode:

    # original 
    [regex]$rx ='(?<Name>Account Name:\s*\w+).*?(?<Logon>Logon Type:\s*\d+)'
    
    # fixed
    [regex]$rx ='(?s)(?<Name>Account Name:\s*\w+).*?(?<Logon>Logon Type:\s*\d+)'
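    To check the finished pattern without touching a real event log, here's a sketch against an abbreviated, hand-written imitation of a 4624 message (the sample text, including its tab-indented layout, is an assumption; a real message has many more fields). It also shows that dropping (?s) makes the same pattern fail once the input contains real line breaks:

```powershell
# Abbreviated, hand-written imitation of a 4624 message, joined with CRLF like a file read with -Raw
$Text = @(
    'An account was successfully logged on.'
    ''
    'Subject:'
    "`tSecurity ID:`t`tS-1-5-18"
    "`tAccount Name:`t`tSYSTEM"
    ''
    'Logon Information:'
    "`tLogon Type:`t`t5"
) -join "`r`n"

[regex]$rx = '(?s)(?<Name>Account Name:\s*\w+).*?(?<Logon>Logon Type:\s*\d+)'
$m = $rx.Match($Text)

$m.Success                                      # True
$m.Groups['Name'].Value  -replace '\s+', ' '    # Account Name: SYSTEM
$m.Groups['Logon'].Value -replace '\s+', ' '    # Logon Type: 5

# Same pattern without (?s): '.' cannot cross the line breaks, so there's no match
[regex]::Match($Text, '(?<Name>Account Name:\s*\w+).*?(?<Logon>Logon Type:\s*\d+)').Success   # False
```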
    

    Regex is entirely the wrong tool for the job!

    Now, the above fixes will solve the problems you've encountered so far - the pattern will match, and the capture groups will be populated correctly.

    But regex is possibly one of the worst and least efficient tools you have at your disposal for parsing event log messages.

    The event log records themselves (at least for the built-in security audit events, like EID 4624) are all backed by well-structured XML documents, so you can simply ask for the specific data points you want rather than trying to siphon them out of the rendered message text.

    Let's demonstrate with a single event record:

    $eventRecord = Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4624 } -MaxEvents 1
    

    From here we have a couple of options.

    We can extract the underlying XML document as-is:

    $eventXml = [xml]$eventRecord.ToXml()
    

    ... and now we can interrogate it like any other XML document:

    $eventXml.SelectSingleNode('//*[@Name="LogonType"]').InnerText # resolves to "3"
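    If you want to experiment with this without querying the Security log (which typically requires administrator rights), here's a sketch against a heavily abbreviated, hand-written event document; the sample values are made up. It also shows why the wildcard //* is handy: the <Data> elements live in the event schema's default namespace, so a bare //Data step finds nothing unless you register that namespace:

```powershell
# Hand-written, heavily abbreviated event XML; real 4624 records carry many more nodes
$eventXml = [xml]@'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>4624</EventID></System>
  <EventData>
    <Data Name="SubjectUserName">WIN10PC$</Data>
    <Data Name="TargetUserName">alice</Data>
    <Data Name="LogonType">3</Data>
  </EventData>
</Event>
'@

# The wildcard //* ignores element names, so the default namespace doesn't get in the way
$eventXml.SelectSingleNode('//*[@Name="LogonType"]').InnerText      # 3

# A bare //Data step finds nothing: <Data> lives in the event namespace, not the empty one
$null -eq $eventXml.SelectSingleNode('//Data[@Name="LogonType"]')   # True

# To use element names, map the namespace to a prefix of your choice ('e' here)
$ns = [System.Xml.XmlNamespaceManager]::new($eventXml.NameTable)
$ns.AddNamespace('e', 'http://schemas.microsoft.com/win/2004/08/events/event')
$eventXml.SelectSingleNode('//e:Data[@Name="TargetUserName"]', $ns).InnerText   # alice
```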
    

    We can also enumerate all the data points one by one via the Properties member, or index into the property list, like so:

    $eventRecord.Properties[8].Value # LogonType is the 9th EventData child node in the schema for EID 4624
    

    (While this approach is simple and straightforward, I personally dislike it because you need to use the offset (e.g. 8) rather than the associated property name, which ultimately makes the code less readable.)
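    If you'd rather look values up by name without reaching for XPath every time, one workaround is a small helper that turns the event's XML into a name-to-value table. This is a sketch of my own, not a built-in cmdlet: ConvertTo-EventDataTable is a hypothetical name, and the offline sample XML is hand-written and abbreviated.

```powershell
# Hypothetical helper (not a built-in cmdlet): map each <Data Name="..."> node in an
# event's XML to a name -> value entry, so values can be looked up by name, not offset
function ConvertTo-EventDataTable {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string]$EventXml
    )
    process {
        $doc   = [xml]$EventXml
        $table = [ordered]@{}
        # local-name() matches <Data> regardless of the event schema's default namespace
        foreach ($node in $doc.SelectNodes('//*[local-name()="Data" and @Name]')) {
            $table[$node.GetAttribute('Name')] = $node.InnerText
        }
        $table
    }
}

# With a real record:  $data = $eventRecord.ToXml() | ConvertTo-EventDataTable
# Offline, with a hand-written and abbreviated sample:
$sampleXml = '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"><EventData>' +
             '<Data Name="TargetUserName">alice</Data><Data Name="LogonType">3</Data>' +
             '</EventData></Event>'

$data = $sampleXml | ConvertTo-EventDataTable
$data.LogonType        # 3
$data.TargetUserName   # alice
```

    The trade-off: this parses the whole XML document for every record, so for large batches the property selector approach below is likely to be leaner.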

    Finally, you can use $eventRecord.GetPropertyValues to specify the values you want using XPath queries wrapped in a so-called EventLogPropertySelector object. This does essentially the same thing as the XML approach, but without manually extracting the XML first.

    This is slightly more complicated, but I personally prefer this approach because it's very descriptive: it tells you exactly what it's querying in the underlying event record. Each property query is also independent, meaning you can order them however you like:

    # define the individual XPath queries for each data point/property value
    $propertyQueries = [string[]]@(
      # we need to specify the full path to the <Data /> node here
      'Event/EventData/Data[@Name="LogonType"]',       # now LogonType can go first again
      'Event/EventData/Data[@Name="TargetUserName"]'
    )
    # create property selector from query list
    $propertySelector = [System.Diagnostics.Eventing.Reader.EventLogPropertySelector]::new($propertyQueries)
    
    # select the queried property values from $eventRecord
    $logonType,$accountName = $eventRecord.GetPropertyValues($propertySelector)
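    A nice property of the selector is that you build it once and reuse it for every record. Here's a sketch that applies it to a batch of events and emits one object per logon. The function name Get-LogonSummary and its -MaxEvents parameter are made up for illustration; running it needs Windows and, typically, an elevated session to read the Security log.

```powershell
# Hypothetical wrapper: reuse ONE property selector across many 4624 records
function Get-LogonSummary {
    param([int]$MaxEvents = 50)

    # XPath queries for the values we want, in the order we want them back
    $propertyQueries = [string[]]@(
        'Event/EventData/Data[@Name="LogonType"]'
        'Event/EventData/Data[@Name="TargetUserName"]'
    )
    $propertySelector = [System.Diagnostics.Eventing.Reader.EventLogPropertySelector]::new($propertyQueries)

    Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4624 } -MaxEvents $MaxEvents |
        ForEach-Object {
            # values come back in the same order as $propertyQueries
            $logonType, $accountName = $_.GetPropertyValues($propertySelector)
            [pscustomobject]@{
                TimeCreated = $_.TimeCreated
                LogonType   = $logonType
                AccountName = $accountName
            }
        }
}

# Usage (Windows, elevated session):  Get-LogonSummary -MaxEvents 10 | Format-Table
```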
    

    To figure out which Name values to use for the different data points, either check out the example in the Event XML portion of the docs, or inspect the raw XML of your own event records:

    # same as before, fetch a single sample event
    $eventRecord = Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4624 } -MaxEvents 1
    
    # use Select-Xml to find all <Data Name="..." /> nodes
    $dataNodes = $eventRecord.ToXml() |Select-Xml -XPath '//*[local-name() = "Data" and @Name]' |ForEach-Object Node 
    
    # grab the names and values of each
    $dataNodes |Select-Object Name,InnerText