PowerShell: Extract text between two strings with -Tail and -Wait


I have a text file with a large number of log messages. I want to extract the messages between two string patterns, and the extracted messages should appear exactly as they do in the text file, with line breaks preserved.

I tried the following approach. It works, but it doesn't support Get-Content's -Wait and -Tail options. Also, the extracted result is displayed on a single line rather than line by line as in the text file. Any input is welcome :-)

Sample Code

function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){

    # Get content from the input file
    $fileContent = Get-Content $filePath

    # Regular expression (Regex) of the given start and end patterns
    $pattern = "$startPattern(.*?)$endPattern"

    # Perform the Regex operation
    $result = [regex]::Match($fileContent,$pattern).Value

    # Finally return the result to the caller
    return $result
}

# Clear the screen
Clear-Host

# Note: $input is an automatic variable in PowerShell, so use a different name
$inputFile = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'

# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile

Below is an improved script based on Theo's answer. The following points still need work:

  1. The beginning and end of the output are trimmed, even though I adjusted the buffer size in the script.
  2. How can I wrap each matched result in the START and END strings?
  3. I still could not figure out how to use the -Wait and -Tail options.

Updated Script

# Clear the screen
Clear-Host

# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
  [console]::bufferwidth = $bw
  [console]::bufferheight = $bh
}
else
{
    $pshost = get-host
    $pswindow = $pshost.ui.rawui
    $newsize = $pswindow.buffersize
    $newsize.height = $bh
    $newsize.width = $bw
    $pswindow.buffersize = $newsize
}


function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}

# Input file path
$inputFile = "THE-LOG-FILE.log"

# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'


Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
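
As a side note on the first attempt: the single-line output most likely comes from passing the array of lines that Get-Content returns to [regex]::Match, which expects a string. PowerShell converts the array to a string by joining its elements with spaces, so the line breaks are lost. The sketch below reproduces this with hypothetical sample lines (the variable names and sample text are assumptions for illustration only) and contrasts it with a single newline-joined string, which is what -Raw provides.

```powershell
# Hypothetical sample lines standing in for the log file's content.
$lines = 'START-OF-PATTERN first', 'middle', 'last END-OF-PATTERN'

# An array passed where a [string] is expected is joined with spaces,
# so the match comes back as one line.
$asArray = [regex]::Match($lines, 'START-OF-PATTERN(.*?)END-OF-PATTERN').Value

# A single string with embedded newlines keeps the line breaks.
# (?s) lets "." match newline characters as well.
$raw   = $lines -join "`n"
$asRaw = [regex]::Match($raw, '(?s)START-OF-PATTERN(.*?)END-OF-PATTERN').Value

$asArray
$asRaw
```

Note that .Value returns the whole match including both patterns, whereas .Groups[1].Value returns only the text between them.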

Solution

  • Here's a proof of concept, but note the following:

    # Note the use of "-" after "Get", to adhere to PowerShell's
    # "<Verb>-<Noun>" naming convention.
    function Get-TextBetweenTwoStrings {
    
      # Make the function an advanced one, so that it supports the 
      # -OutVariable common parameter.
      [CmdletBinding()]
      param(
        $startPattern, 
        $endPattern, 
        $filePath
      )
    
      # Note: If $startPattern and $endPattern are themselves
      #       regexes, omit the [regex]::Escape() calls.
      $startRegex = '.*' + [regex]::Escape($startPattern) + '.*'
      $endRegex = '.*' + [regex]::Escape($endPattern) + '.*'
    
      $inBlock = $false
      $block = [System.Collections.Generic.List[string]]::new()
    
      Get-Content -Tail 100 -Wait $filePath | ForEach-Object {
        if ($inBlock) {
          if ($_ -match $endRegex) {
            $block.Add($Matches[0])
            # Output the block of lines as a single, multi-line string
            $block -join "`n"
            $inBlock = $false
            $block.Clear()
          }
          else {
            $block.Add($_)
          }
        }
        elseif ($_ -match $startRegex) {
          $inBlock = $true
          $block.Add($Matches[0])
        }
      }
    
    }
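
For a file that is read once, without -Wait, a regex-based variant can return every block together with its START and END strings. The following is a minimal sketch under stated assumptions and is not part of the answer above: the function name Get-AllTextBetweenTwoStrings and its -Text parameter are hypothetical, and the sample string stands in for the output of Get-Content -Raw. Unlike the line-based function above, it returns the text from the start pattern through the end pattern rather than the complete first and last lines.

```powershell
# Hypothetical helper: returns every block in $Text that starts with
# $StartPattern and ends with $EndPattern, both delimiters included.
function Get-AllTextBetweenTwoStrings {
  param(
    [string] $Text,
    [string] $StartPattern,
    [string] $EndPattern
  )

  # (?s) makes "." match newlines; .*? keeps each match as short as possible.
  # Omit the [regex]::Escape() calls if the patterns are themselves regexes.
  $pattern = '(?s){0}.*?{1}' -f [regex]::Escape($StartPattern), [regex]::Escape($EndPattern)

  # Matches() finds all non-overlapping matches, not just the first one.
  foreach ($m in [regex]::Matches($Text, $pattern)) {
    $m.Value
  }
}

# Sample text standing in for: Get-Content -Raw THE-LOG-FILE.log
$sample = "noise`nSTART-OF-PATTERN a`nb END-OF-PATTERN`nnoise`nSTART-OF-PATTERN c END-OF-PATTERN"

$blocks = @(Get-AllTextBetweenTwoStrings -Text $sample -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN')
$blocks
```

Each element of $blocks is one multi-line string, so the line breaks inside a block are preserved when it is printed.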