I have an SSIS package. In the control flow, I have a FOR EACH FILE loop that takes each file (a CSV with `"` as the text qualifier) and attempts to apply some changes using the EXECUTE PROCESS TASK.
What I am attempting to do is find any column in my CSV which ends in one or more semi-colons (;) and remove the semi-colon(s).
To do this I am attempting to use `Get-Content` with the `-replace` operator in an argument expression. Here is my expression:
"(Get-Content '"+ @[User::Filename] + "') -replace '[;]+\"', '\"' | Out-File -encoding ASCII '"+ @[User::Filename] +"'"
As you can see, I'm using a variable within the FOR EACH FILE loop, together with the regex `[;]+\"`. The escape character `\` is there to escape the double quote during expression evaluation. The expression then evaluates to this in the Arguments field:
(Get-Content '') -replace '[;]+"', '"' | Out-File -encoding ASCII ''
The process runs to completion, with the filename being added at runtime. The problem is that it removes ALL semi-colons from the file, not just those immediately followed by a double quote.
Here are some before and after examples:
Original | Post-Processing | Status |
---|---|---|
"IT005;;" | "IT005" | PASS |
"390200000X;" | "390200000X" | PASS |
"207R00000X;390200000X" | "207R00000X390200000X" | FAIL |
You can match the double quote using the `\x22` notation, and to avoid using it in the replacement pattern, you may use a positive lookahead in your regex pattern:
"(Get-Content '"+ @[User::Filename] + "') -replace ';+(?=\\x22)' | Out-File -encoding ASCII '"+ @[User::Filename] +"'"
The `;+(?=\x22)` pattern matches one or more `;` chars that are immediately followed by a double quotation mark.
Since the replacement is an empty string, PowerShell lets you omit the replacement argument.
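
To sanity-check the pattern outside SSIS, here is a minimal sketch in Python rather than PowerShell, on the assumption that `re` handles `;+(?=\x22)` the same way the .NET regex engine does for this input. The sample values come from the table in the question. The names `clean`, `lookahead`, and `stripped` are hypothetical helpers, not part of the package. The `stripped` pattern is only a guess at what the original regex effectively becomes if the literal double quotes are lost before PowerShell parses the argument, which would be consistent with the FAIL row.

```python
import re

# Sample values taken from the table in the question.
samples = ['"IT005;;"', '"390200000X;"', '"207R00000X;390200000X"']

# Pattern from the answer: one or more ';' only when a double quote (\x22) follows.
lookahead = re.compile(r';+(?=\x22)')

# Assumption: what '[;]+"' may effectively reduce to if the literal quotes are dropped.
stripped = re.compile(r'[;]+')


def clean(value, pattern):
    """Remove every match of `pattern` from `value`."""
    return pattern.sub('', value)


if __name__ == '__main__':
    for s in samples:
        # Print the original, the lookahead result, and the quote-stripped result.
        print(s, '->', clean(s, lookahead), '|', clean(s, stripped))
```

With the lookahead pattern, the semi-colon in the third sample is kept because it is not directly followed by a double quote, while the trailing semi-colons in the first two samples are removed.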