regex, powershell, get-childitem

PowerShell script: Regex to exclude commented code from a string search across multiple files


I am using Get-ChildItem to recursively parse through files in a folder to find certain strings.

I need a regex to exclude all forms of commented code from the search:

/* Excluded */

/********Excluded***********/

//Excluded

/* Excluded */

$My_Regex = "(?s)(?i)(^|\s+?)(\/\*)((.)(?!\*\/))*?(StringsToBeSearched)(.*?)(\*\/)"

$Searched_Results = Get-ChildItem -Recurse $folderpath | Select-String $My_Regex

A similar question here does not help.

The search needs to be on lines which are not part of any comments. Any help?

PowerShell version: 5.1.


Solution

  • This sounds straightforward, but it is a bit tricky. One way to solve it with a regex is the so-called trash can approach: match everything you don't want as plain alternatives, and capture only the thing you do want in a group that is extracted afterwards, e.g.

    (['"])(?:(?!\1|\\).|\\.)*\1 # discard quoted strings first
    |\/\/[^\n\r]* # match single-line comments before multi-line ones
    |\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/ # C-style multi-line comment
    |(StringsToBeSearched) # the keyword, captured in a group
    

    Regex101 Demo
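
    To see the trash can idea in isolation, here is a minimal sketch that runs the pattern against an in-memory string instead of files. The sample text and the group name hit are assumptions made for illustration; StringsToBeSearched is the placeholder keyword from the question, and only double-quoted strings are handled to keep the quoting simple.

    ```powershell
    # Sample input (hypothetical): the keyword appears four times, only the last one is live code.
    $sample = @(
        'int a = 1; // StringsToBeSearched in a line comment'
        '/* StringsToBeSearched'
        '   in a block comment */'
        'string s = "StringsToBeSearched in a string";'
        'call(StringsToBeSearched);'
    ) -join "`n"

    # (?x) lets the pattern carry whitespace and comments.
    # The first three alternatives are the "trash can"; only the last one captures.
    $pattern = @(
        '(?x)'
        '  "(?:(?!"|\\).|\\.)*"               # quoted string: discard'
        '| //[^\n\r]*                         # single-line comment: discard'
        '| /\*[^*]*\*+(?:[^/*][^*]*\*+)*/     # block comment: discard'
        '| (?<hit>StringsToBeSearched)        # the keyword we actually want'
    ) -join "`n"

    # Keep only the matches where the named group took part in the match.
    $hits = [regex]::Matches($sample, $pattern) |
        Where-Object { $_.Groups['hit'].Success }

    $hits | ForEach-Object { "live keyword at offset $($_.Index)" }
    ```

    Every match whose hit group did not participate is a string or comment and is simply thrown away, which is why the order of the alternatives matters: the unwanted constructs have to be tried before the keyword.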

    Unfortunately, Select-String processes files line by line, and getting it to treat a file as a single string (e.g. with the (?s) flag) does not work here. The code below works for everything except a keyword inside a multi-line comment that really spans several lines.

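    $rex = '(?s)"(?:(?!"|\\).|\\.)*"|\/\/[^\n\r]*|\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/|(StringsToBeSearched)'
    # -AllMatches so a keyword that follows a string or comment on the same line is still found
    Get-ChildItem -Recurse -File "C:\temp" | Select-String $rex -AllMatches |
        Where-Object { $_.Matches | Where-Object { $_.Groups[1].Success } }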
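
    The line-by-line limitation can be reproduced without any files. The sketch below is only an illustration with made-up input: it feeds the same three lines to Select-String once as separate strings (which is effectively what happens when it reads a file) and once joined into a single string.

    ```powershell
    # Hypothetical input: the keyword sits inside a comment that spans three lines.
    $lines = '/* start', 'StringsToBeSearched', 'end */'
    $rex   = '(?s)\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/|(StringsToBeSearched)'

    # Line by line: the comment is never seen as a whole, so the keyword looks like live code.
    $perLine = $lines | Select-String $rex |
        Where-Object { $_.Matches[0].Groups[1].Success }

    # As one string: the comment alternative consumes the keyword, and group 1 stays empty.
    $whole = ($lines -join "`n") | Select-String $rex |
        Where-Object { $_.Matches[0].Groups[1].Success }

    "per-line hits: $(@($perLine).Count), whole-text hits: $(@($whole).Count)"
    ```

    The per-line run reports a false positive, while the whole-text run reports none, which is the behaviour the regex was designed for.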
    

    So you are basically forced to read the file contents first. While we are at it, we can make this a bit easier by first removing everything we don't want and then searching for the keyword, like this:

    # Everything that must be ignored: quoted strings, // comments and /* ... */ comments
    $ignore = '(?s)"(?:(?!"|\\).|\\.)*"|\/\/[^\n\r]*|\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/'
    foreach ($file in (Get-ChildItem -Recurse -File "C:\temp"))
    {
        # -Raw returns the whole file as one string, so multi-line comments can match
        $fileContent = Get-Content -LiteralPath $file.FullName -Raw
        $fileContent = $fileContent -replace $ignore
        $results = $fileContent | Select-String 'StringsToBeSearched' -AllMatches
        if ($results.Matches.Success) {
            Write-Host $file.Name
        }
    }
    

    If you want more than the file name, such as line numbers, you can extend the logic inside the loop. I hope this helps.
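
    One possible way to get line numbers is sketched below. Note that stripping comments with -replace shifts the remaining text, so line numbers taken from the stripped content no longer match the file; this sketch therefore goes back to the trash can pattern and derives the line from the match offset in the original text. The function name Find-LiveKeyword, the group name hit and the sample text are assumptions for illustration, not part of any module.

    ```powershell
    function Find-LiveKeyword {
        param(
            [string] $Text,      # full file content, e.g. from Get-Content -Raw
            [string] $Keyword    # literal keyword to look for
        )
        # Strings and comments are matched but not captured; the keyword is captured as 'hit'.
        $pattern = '(?s)"(?:(?!"|\\).|\\.)*"|//[^\n\r]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|(?<hit>' +
                   [regex]::Escape($Keyword) + ')'
        foreach ($m in [regex]::Matches($Text, $pattern)) {
            if ($m.Groups['hit'].Success) {
                # 1-based line number = number of newlines before the match + 1
                $line = ($Text.Substring(0, $m.Index) -split "`n").Count
                [pscustomobject]@{ Line = $line; Offset = $m.Index }
            }
        }
    }

    # In-memory demo: the keyword is commented out on lines 1 to 3 and live on line 4.
    $sample = "// StringsToBeSearched`n/* a`nStringsToBeSearched */`nx = StringsToBeSearched;"
    $found  = Find-LiveKeyword -Text $sample -Keyword 'StringsToBeSearched'
    $found | ForEach-Object { "live keyword on line $($_.Line)" }
    ```

    Inside the foreach loop over Get-ChildItem, the same function could be called with the result of Get-Content -Raw for each file, and the file name combined with the Line property of every returned object.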