powershell

Delete everything between two strings (inclusive)


I have multiple HTML files that contain a tag like the one below:

<aside id="leftmenu" class="leftmenu menu menutheme">

- huge
- text
- in between

</aside>

I am trying to remove this tag entirely from all HTML files in a directory using this PowerShell script:

$directory = "C:\Users\Ajay Kumar\pathtodir\html2"    
Get-ChildItem -Path $directory -Filter "*.html" -Recurse | ForEach-Object {
    $content = Get-Content $_.FullName
    $newContent = $content -replace '\[<aside\].*?\(</aside>\)', ''
    Set-Content $_.FullName $newContent
}

I tried this too:

-replace '(<aside).*?(</aside>)', '$1$2'

Didn't work either.

What am I doing wrong?


Solution

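  • You need two changes. First, enable the (?s) flag (the s modifier, "single line" mode, in which the dot also matches newline characters). Second, read the file as a single string with Get-Content -Raw. Without -Raw, Get-Content returns an array of lines and -replace is applied to each line separately, so no pattern can ever span from <aside to </aside>. Also note that in your first attempt the backslashes make the brackets and parentheses literal, so the pattern looks for the text "[<aside]" and "(</aside>)", and in your second attempt the replacement '$1$2' would put both tags back even on a match. See https://regex101.com/r/83K5hP/1 for details.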

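    $directory = 'C:\Users\Ajay Kumar\pathtodir\html2'
    Get-ChildItem -Path $directory -Filter '*.html' -Recurse | ForEach-Object {
        # -Raw returns the whole file as one string instead of an array of lines
        $content = Get-Content $_.FullName -Raw
        # (?s) lets . match newlines; .*? stops at the first </aside>.
        # With no replacement argument, -replace deletes the match.
        $newContent = $content -replace '(?s)<aside.*?</aside>'
        Set-Content $_.FullName $newContent
    }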
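
To see why both changes matter, here is a minimal sketch that runs the pattern against an in-memory string rather than real files. The sample markup and the variable names ($html, $withoutFlag, $withFlag, $perLine) are made up for illustration and are not taken from your files.

```powershell
# Hypothetical sample standing in for one file read with Get-Content -Raw
$html = @'
<body>
<aside id="leftmenu" class="leftmenu menu menutheme">
<ul><li>huge</li><li>text</li></ul>
</aside>
<p>keep me</p>
</body>
'@

# Without (?s) the dot stops at every newline, so nothing matches
$withoutFlag = $html -replace '<aside.*?</aside>'

# With (?s) the dot also matches newlines, so the whole block is removed
$withFlag = $html -replace '(?s)<aside.*?</aside>'

# Simulate Get-Content without -Raw: an array of lines, where
# -replace is applied to each element on its own
$lines   = $html -split '\r?\n'
$perLine = $lines -replace '(?s)<aside.*?</aside>'

$withoutFlag -ceq $html                  # True: text is unchanged
$withFlag -match '<aside'                # False: block is gone
($perLine -join "`n") -match '<aside'    # True: block survived
```

One thing you may want to check on your own files: the string read with -Raw already ends with the file's final newline, and Set-Content appends another one. If that matters, adding -NoNewline to Set-Content (available in PowerShell 5.0 and later) should avoid the extra line.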