performance powershell replace file-io powershell-v5.1

Iterate over a Windows ASCII text file, find all instances of {LINE2 1-9999}, and replace each with {LINE2 "line number the code is on"}. Overwrite. Faster?


This code works. I just want to see how much faster someone can make it work.

Back up your Windows 10 batch file in case something goes wrong. Find all instances of the string {LINE2 1-9999} and replace each with {LINE2 "line number the code is on"}. Overwrite the file, encoding it as ASCII.

If _61.bat is:

TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1243
TITLE %TIME%   DOC/SET YQJ8   LINE2 1887
SET ztitle=%TIME%: WINFOLD   LINE2 2557
TITLE %TIME%   _*.* IN WINFOLD   LINE2 2597
TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 3672
TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922

Results:

TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1
TITLE %TIME%   DOC/SET YQJ8   LINE2 2
SET ztitle=%TIME%: WINFOLD   LINE2 3
TITLE %TIME%   _*.* IN WINFOLD   LINE2 4
TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 5
TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 6

Code:

Copy-Item $env:windir\_61.bat -d $env:temp\_61.bat
(gc $env:windir\_61.bat) | foreach -Begin {$lc = 1} -Process {
    $_ -replace "LINE2 \d*", "LINE2 $lc";
    $lc += 1
} | Out-File -Encoding Ascii $env:windir\_61.bat

I'd like this to take less than the 984 milliseconds it currently takes. Can you think of anything to speed it up?


Solution

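  • The key to better performance in PowerShell code (short of embedding C# code compiled on demand with Add-Type, which may or may not help) is to avoid cmdlets and the pipeline in performance-critical code, especially the invocation of a script block ({ ... }) for every pipeline item, and to use language statements and .NET types directly instead.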

    To be clear: The pipeline and cmdlets offer clear benefits, so avoiding them should only be done if optimizing performance is a must.
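
    As a minimal sketch of that trade-off, the following contrasts a ForEach-Object pipeline with a plain foreach statement that produces the same result. The sample lines and the variable names ($lines, $viaPipeline, $viaStatement) are made up for illustration, and how much faster the statement form is will depend on your data and PowerShell version.

```powershell
# Made-up sample lines; the LINE2 values are arbitrary.
$lines = foreach ($k in 1..5) { "TITLE demo   LINE2 $($k * 100)" }

# Pipeline form: ForEach-Object invokes a script block once per input line.
$viaPipeline = $lines | ForEach-Object -Begin { $i = 1 } -Process {
  $_ -replace 'LINE2 \d+', "LINE2 $i"
  ++$i
}

# Statement form: a plain foreach loop, no pipeline involved.
$n = 0
$viaStatement = foreach ($line in $lines) {
  ++$n
  $line -replace 'LINE2 \d+', "LINE2 $n"
}

# Both variables now hold the same five renumbered lines.
$viaStatement
```

    Both forms renumber the lines identically; the difference is only in how much per-item overhead PowerShell incurs.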

    In your case, the following code, which combines the switch statement with direct use of the .NET Framework for file I/O, seems to offer the best performance. Note that the input file is read into memory as a whole, as an array of lines, and that a copy of that array with the modified lines is created before it is written back to the input file:

    $file = "$env:temp\_61.bat" # must be a *full* path.
    $lc = 0
    $updatedLines = & { switch -Regex -File $file {
      '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
      default { ++$lc; $_ } # pass non-matching lines through
    } }
    [IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)
    

    Note:

    In my tests (see below), this provided a more than 4-fold performance improvement in Windows PowerShell relative to your command.
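
    To make the behavior of the switch-based command above concrete, here is a small worked sketch that runs it against a scratch file. The file name line2-demo.bat, the REM line, and the variable names are assumptions for illustration only; the scratch file lives in the temp folder so that no real batch file is touched.

```powershell
# Hypothetical scratch file in the temp folder; not the real _61.bat.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'line2-demo.bat'
[IO.File]::WriteAllLines($demo, [string[]] @(
  'TITLE %TIME%   DOC/SET YQJ8   LINE2 1887'
  'REM a line without a marker'
  'TITLE %TIME%   _*.* IN WINFOLD   LINE2 2597'
), [Text.Encoding]::ASCII)

# Same technique as above: switch reads the file line by line,
# and every line (matching or not) advances the counter.
$n = 0
$updated = & { switch -Regex -File $demo {
  '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$n + $Matches[2] }
  default { ++$n; $_ } # pass non-matching lines through
} }
[IO.File]::WriteAllLines($demo, [string[]] $updated, [Text.Encoding]::ASCII)

# Read the result back to inspect it.
$result = [IO.File]::ReadAllLines($demo)
$result
```

    The first and third lines end in LINE2 1 and LINE2 3 respectively, while the REM line is passed through unchanged but still counts as line 2.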


    Here's a performance comparison via the Time-Command function:

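    The commands compared are:

      • the switch -Regex -File approach from above;
      • a slightly modified version of your command, using Get-Content, -replace, and Set-Content;
      • in PowerShell Core v6.1+ only, -replace with a script block as the replacement operand, combined with [IO.File]::ReadAllLines() and [IO.File]::WriteAllLines().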

    Instead of the 6-line sample file, a 6,000-line file is used, and 100 runs are averaged. Both parameters are easy to adjust.

    # Sample file content (6 lines)
    $fileContent = @'
    TITLE %TIME%   NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE   LINE2 1243
    TITLE %TIME%   DOC/SET YQJ8   LINE2 1887
    SET ztitle=%TIME%: WINFOLD   LINE2 2557
    TITLE %TIME%   _*.* IN WINFOLD   LINE2 2597
    TITLE %TIME%   %%ZDATE1%% YQJ25   LINE2 3672
    TITLE %TIME%   FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922
    
    '@
    
    # Determine the full path to a sample file.
    # NOTE: Using the *full* path is a *must* when calling .NET methods, because
    #       the latter generally don't see the same working dir. as PowerShell.
    $file = "$PWD/test.bat"
    
    # Create the sample file with the sample content repeated N times.
    $repeatCount = 1000 # -> 6,000 lines
    [IO.File]::WriteAllText($file, $fileContent * $repeatCount)
    
    # Warm up the file cache and count the lines.
    $lineCount = [IO.File]::ReadAllLines($file).Count
    
    # Define the commands to compare as an array of scriptblocks.
    $commands =
      { # switch -Regex -File + [IO.File]::Read/WriteAllLines()
        $i = 0
        $updatedLines = & { switch -Regex -File $file {
          '^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$i + $Matches[2] }
          default { ++$i; $_ } # pass non-matching lines through
        } }
        [IO.File]::WriteAllLines($file, $updatedLines, [text.encoding]::ASCII)
      },
      { # Get-Content + -replace + Set-Content
        (Get-Content $file) | ForEach-Object -Begin { $i = 1 } -Process {
          $_ -replace "LINE2 \d*", "LINE2 $i"
          ++$i
        } | Set-Content -Encoding Ascii $file
      }
    
    # In PS Core v6.1+, also test -replace with a scriptblock operand.
    if ($PSVersionTable.PSVersion.Major -gt 6 -or ($PSVersionTable.PSVersion.Major -eq 6 -and $PSVersionTable.PSVersion.Minor -ge 1)) {
      $commands +=
        { # -replace with scriptblock + [IO.File]::Read/WriteAllLines()
          $i = 0
          [IO.File]::WriteAllLines($file,
            ([IO.File]::ReadAllLines($file) -replace '(?<= LINE2 )\d+', { (++$i) }),
            [text.encoding]::ASCII
          )
        }
    } else {
      Write-Warning "Skipping -replace-with-scriptblock command, because it isn't supported in this PS version."
    }
    
    # How many runs to average.
    $runs = 100
    
    Write-Verbose -vb "Averaging $runs runs with a $lineCount-line file of size $('{0:N2} MB' -f ((Get-Item $file).Length / 1mb))..."
    
    Time-Command -Count $runs -ScriptBlock $commands
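
    The comment in the script about full paths can be illustrated with a short sketch. It shows that PowerShell's current location and .NET's working directory are tracked separately, and one way to resolve a relative name against PowerShell's location before handing it to a .NET method. The file name test.bat and the variable names are placeholders, and the sketch assumes the current location is a file-system directory.

```powershell
# Hypothetical relative file name, used only for illustration.
$relative = 'test.bat'

# PowerShell's current location and .NET's working directory are tracked separately
# and can differ, e.g. after Set-Location.
$psLocation  = $PWD.ProviderPath
$netLocation = [Environment]::CurrentDirectory

# Resolve the name against PowerShell's location; unlike Convert-Path,
# this also works when the file doesn't exist yet.
$full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($relative)

"PowerShell location: $psLocation"
".NET working dir:    $netLocation"
"Full path to pass:   $full"
```

    Passing $full rather than $relative to methods such as [IO.File]::WriteAllLines() avoids writing to whatever directory .NET happens to consider current.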
    

    Here are sample results from my Windows 10 machine (the absolute timings aren't important, but hopefully the relative performance shown in the Factor column is somewhat representative); the PowerShell Core version used is v6.2.0-preview.4:

    # Windows 10, Windows PowerShell v5.1
    
    WARNING: Skipping -replace-with-scriptblock command, because it isn't supported in this PS version.
    VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
    
    Factor Secs (100-run avg.) Command
    ------ ------------------- -------
    1.00   0.108               # switch -Regex -File + [IO.File]::Read/WriteAllLines()...
    4.22   0.455               # Get-Content + -replace + Set-Content...
    
    
    # Windows 10, PowerShell Core v6.2.0-preview.4
    
    VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
    
    Factor Secs (100-run avg.) Command
    ------ ------------------- -------
    1.00   0.101               # switch -Regex -File + [IO.File]::Read/WriteAllLines()…
    1.67   0.169               # -replace with scriptblock + [IO.File]::Read/WriteAllLines()…
    4.98   0.503               # Get-Content + -replace + Set-Content…
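
    Time-Command is not a built-in cmdlet, so to get comparable numbers without it, a rough stand-in based on Measure-Command might look like the following. Measure-AverageSeconds is a hypothetical helper written for this sketch, not the Time-Command function used above; its output format only approximates the tables shown, and the two sample script blocks are placeholders.

```powershell
# Hypothetical helper; NOT the Time-Command function referenced above.
function Measure-AverageSeconds {
  param(
    [Parameter(Mandatory)] [scriptblock[]] $ScriptBlock,
    [int] $Count = 10
  )
  # Average the wall-clock time of each script block over $Count runs.
  $timings = foreach ($sb in $ScriptBlock) {
    $total = 0.0
    foreach ($run in 1..$Count) {
      $total += (Measure-Command { & $sb }).TotalSeconds
    }
    [pscustomobject] @{ Secs = $total / $Count; Command = "$sb".Trim() }
  }
  # Express each average as a multiple of the fastest one, fastest first.
  $fastest = ($timings | Measure-Object -Property Secs -Minimum).Minimum
  $timings | Sort-Object Secs | ForEach-Object {
    [pscustomobject] @{
      Factor  = [math]::Round($_.Secs / $fastest, 2)
      Secs    = [math]::Round($_.Secs, 3)
      Command = $_.Command
    }
  }
}

# Placeholder commands: one that sleeps and one that is nearly free.
$demoResults = @(Measure-AverageSeconds -Count 3 -ScriptBlock { Start-Sleep -Milliseconds 50 }, { $null = 1 + 1 })
$demoResults
```

    With the $commands array defined earlier, a call along the lines of Measure-AverageSeconds -Count $runs -ScriptBlock $commands should give a similar Factor column, though the absolute numbers will differ from machine to machine.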