search powershell optimizer-hints

Optimizing a simple search script in PowerShell


I need to create a script that searches through just under a million files of text, code, etc. to find matches, and then writes all hits for a particular string pattern to a CSV file.

So far I have this:

$location = 'C:\Work*'

$arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately)

for($i=0;$i -lt $arr.length; $i++) {
Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}

This gives me a CSV file named "foo.txt" listing all files that contain the word "foo", and a file named "bar.txt" listing all files that contain the word "bar".

Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?

All input appreciated!


Solution

  • If your files are not huge and can be read into memory, then this version should be considerably faster (my quick-and-dirty local test seems to confirm that):

    $location = 'C:\ROM'
    $arr = "Roman", "Kuzmin"
    
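    # remove old output files first, because >> below appends to them;
    # -Confirm prompts before each deletion, and
    # -ErrorAction SilentlyContinue ignores files that do not exist yet
    foreach($test in $arr) {
        Remove-Item ".\$test.txt" -ErrorAction SilentlyContinue -Confirm
    }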
    
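    # .{process{...}} is a dot-sourced script block whose process section runs
    # once per pipeline item (a faster stand-in for ForEach-Object);
    # PSIsContainer is true for directories, so only files are read
    Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {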
        # read all text once
        $content = [System.IO.File]::ReadAllText($_.FullName)
        # test patterns and output paths once
        foreach($test in $arr) {
            if ($content -match $test) {
                $_.FullName >> ".\$test.txt"
            }
        }
    }}}
    

    Notes: 1) the paths and patterns in the example differ from yours, so adjust them; 2) the output files are plain text, not CSV. There is little point in CSV if you are only interested in paths; a plain text file with one path per line will do.
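
For reference, the same single-pass idea can be wrapped in a function. This is only a sketch under a few assumptions: the name `Find-PatternInFiles` and its parameters are hypothetical (not part of the answer above), unreadable files are simply skipped, and matching paths are collected in memory so each output file can be written once at the end instead of being appended to with `>>` for every hit. As with `-match` above, the patterns are treated as case-insensitive regular expressions.

```powershell
# Hypothetical helper: reads every file once and tests all patterns against it.
function Find-PatternInFiles {
    param(
        [Parameter(Mandatory = $true)] [string]   $Location,
        [Parameter(Mandatory = $true)] [string[]] $Patterns
    )

    # One list of matching paths per pattern.
    $hits = @{}
    foreach ($p in $Patterns) {
        $hits[$p] = New-Object 'System.Collections.Generic.List[string]'
    }

    Get-ChildItem -Path $Location -Recurse |
        Where-Object { -not $_.PSIsContainer } |
        ForEach-Object {
            $path = $_.FullName
            try {
                # Read the whole file once (assumes it fits in memory).
                $content = [System.IO.File]::ReadAllText($path)
            }
            catch {
                # Assumption: locked or unreadable files are skipped silently.
                return
            }
            foreach ($p in $Patterns) {
                if ($content -match $p) {
                    $hits[$p].Add($path)
                }
            }
        }

    # Hashtable: pattern -> list of full paths of the files that matched it.
    $hits
}

# Possible usage (paths taken from the question; adjust as needed):
#   $result = Find-PatternInFiles -Location 'C:\Work' -Patterns 'foo', 'bar'
#   foreach ($p in $result.Keys) {
#       Set-Content -Path "C:\Work\Results\$p.txt" -Value $result[$p]
#   }
```

Note that using a pattern as a file name only works when it contains no characters that are invalid in file names; for real regular expressions you would need some other naming scheme.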