regex, powershell, ssis

SSIS Job using PowerShell get-content -replace removing too much


I have an SSIS package. In the control flow, a FOR EACH FILE loop takes each file (a CSV with " as the text qualifier) and attempts to apply some changes using an EXECUTE PROCESS TASK.

What I am attempting to do is find any column value in my CSV that ends in one or more semicolons (;) and remove the trailing semicolon(s).

To do this I am attempting to use Get-Content with the -replace operator in an argument expression. Here is my expression:

"(Get-Content '"+ @[User::Filename] + "') -replace '[;]+\"', '\"' | Out-File -encoding ASCII '"+ @[User::Filename] +"'"

As you can see, I'm using a variable from the FOR EACH FILE loop together with the regex '[;]+\"'. The backslash is there only to escape the double quote during SSIS expression evaluation. The expression then evaluates to this in the Arguments field:

(Get-Content '') -replace '[;]+"', '"' | Out-File -encoding ASCII ''

This process runs to completion, with the filename being added at runtime. The problem is that it removes ALL semicolons from the file, not just those immediately followed by a double quote.

For example, here are some pre and post examples:

Original                   Post-Processing           Status
"IT005;;"                  "IT005"                   PASS
"390200000X;"              "390200000X"              PASS
"207R00000X;390200000X"    "207R00000X390200000X"    FAIL

Solution

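  • The original pattern probably fails because the literal double quotes are stripped when the argument string is passed to powershell.exe, leaving '[;]+' with an empty replacement. To avoid literal quotes altogether, match the double quote with the \x22 notation and put it in a positive lookahead so that it does not need to appear in the replacement pattern: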

    "(Get-Content '"+ @[User::Filename] + "') -replace ';+(?=\\x22)' | Out-File -encoding ASCII '"+ @[User::Filename] +"'"
    

    The ;+(?=\x22) pattern matches one or more ; chars that are immediately followed by a double quotation mark. The lookahead only checks for the quote and does not consume it, so the quote stays in the output.

    Since the replacement pattern is an empty string, in PowerShell, you can omit the replacement argument.
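
To check the pattern outside SSIS, here is a minimal sketch that applies it to the three sample values from the question. The $rows and $cleaned variable names are only illustrative, and the snippet works on in-memory strings rather than the real file:

```powershell
# Sample values copied from the question, text qualifiers included.
$rows = @(
    '"IT005;;"',
    '"390200000X;"',
    '"207R00000X;390200000X"'
)

# One or more semicolons, matched only when a double quote (\x22) follows.
# The lookahead does not consume the quote, so no replacement text is needed.
# Applied to an array, -replace processes each element in turn.
$cleaned = $rows -replace ';+(?=\x22)'

$cleaned
# "IT005"
# "390200000X"
# "207R00000X;390200000X"
```

The semicolon inside the third value is kept because it is followed by 3 rather than by a double quote.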