I have a vendor who sends 5-10 files a month. Recently they started sending us a mixture of Unix-format and Windows-format files. This vendor is notoriously difficult to work with, so it will be quicker to solve this problem myself.
I have a short PowerShell 7 script that works for the Unix files but wreaks havoc on the Windows files:
$Files = Get-ChildItem '\\XD1\Vendor_Incoming\*.csv'
foreach($file in $Files){
(Get-Content -Raw -Path $file) -replace "`n","`r`n" | Set-Content -NoNewline -Path $file
}
After searching SO and Google, I have yet to find an efficient way to test each file's format inside the foreach statement, something like:
foreach($f in $Files){ If(thisfileisUnix){Process-File} }
Thank you for your time.
Your immediate problem, just to spell it out, is that you're blindly replacing "`n" (LF) instances with "`r`n" (CRLF) sequences, which means that files that already have CRLF sequences are accidentally corrupted, because you're effectively turning their CRLF sequences into CRCRLF sequences ("`r`r`n").
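To make the corruption concrete, here is a minimal sketch that applies the same blind replacement to a two-line, Windows-format sample string and prints the resulting character codes. The variable names are illustrative only and are not part of the original script.

```powershell
# A tiny Windows-format sample: two lines separated by CRLF.
$windowsText = "a`r`nb"

# The blind replacement from the question: every LF becomes CRLF,
# even an LF that is already preceded by a CR.
$corrupted = $windowsText -replace "`n", "`r`n"

# Print the character codes (CR = 13, LF = 10).
[int[]] $corrupted.ToCharArray()   # 97 13 13 10 98
```

The doubled 13 is the stray CR that many tools render as an extra blank line or a visible control character.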
A pragmatic solution that avoids this problem is to simply test for the presence of at least one "`r`n" (CRLF sequence) in the file's content and, if not found, assume that the file uses Unix-format newlines, "`n" (LF) only, and that the content therefore needs transforming:
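Get-ChildItem '\\XD1\Vendor_Incoming\*.csv' |
  ForEach-Object {
    # Read the whole file into a single string.
    $text = $_ | Get-Content -Raw
    # No CRLF anywhere: assume Unix-format (LF-only) newlines.
    $isUnixFormat = -not $text.Contains("`r`n")
    if ($isUnixFormat) {
      # Convert LF to CRLF and write the result back in place.
      $text.Replace("`n", "`r`n") |
        Set-Content -NoNewLine -LiteralPath $_.FullName
    }
  }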
Note that Set-Content uses its default character encoding (BOM-less UTF-8 in PowerShell 7), as it knows nothing about the original file's encoding, so you may have to pass an -Encoding argument.
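If you prefer the shape sketched in the question, with a separate test inside a foreach loop, the CRLF check can be wrapped in a small function. Test-UnixNewline below is a hypothetical helper name, not a built-in cmdlet, and it relies on the same assumption as above: a file without a single CRLF sequence is treated as Unix-format.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# contains no CRLF sequence at all and is therefore assumed to be Unix-format.
function Test-UnixNewline {
    param(
        [Parameter(Mandatory)]
        [string] $LiteralPath
    )

    # Read the whole file as a single string.
    $text = Get-Content -Raw -LiteralPath $LiteralPath

    # An empty file has nothing to convert.
    if ([string]::IsNullOrEmpty($text)) { return $false }

    # String.Contains() performs an ordinal, case-sensitive search.
    return -not $text.Contains("`r`n")
}
```

Inside the loop you would then write something like if (Test-UnixNewline -LiteralPath $file.FullName) { ... }. The trade-off is that each Unix-format file is read twice, once for the test and once for the conversion, which is why the pipeline version above reads the content only once.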
Here's a more robust, regex-based solution, which, however, is only necessary if there's a chance that any given file may contain a mix of LF and CRLF newlines:
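Get-ChildItem '\\XD1\Vendor_Incoming\*.csv' |
  ForEach-Object {
    # Read the whole file into a single string.
    $original = $_ | Get-Content -Raw
    # Replace only those LFs that aren't already preceded by a CR.
    $modified = $original -replace '(?<!\r)\n', "`r`n"
    # Write back only if -replace actually changed something.
    if (-not [object]::ReferenceEquals($original, $modified)) {
      Set-Content -NoNewLine -LiteralPath $_.FullName -Value $modified
    }
  }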
The above uses a regex with a negative lookbehind assertion ((?<!...)) to match only \n ("`n", LF) characters not preceded by \r ("`r", CR) and replace them with "`r`n" (Windows-format CRLF newlines).
It then tests whether any actual replacements were made, taking advantage of the fact that -replace, the regular-expression-based string replacement operator, returns the input string as-is if no actual replacement was made.
Only if an actual replacement was made is the modified content written back to the input file.
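As a quick illustration of the lookbehind, the sketch below applies the same regex to an in-memory string with mixed newlines and then applies it a second time to show that already-converted text is left alone. The sample string and variable names are made up for the demonstration.

```powershell
# Mixed sample: the first line ends in CRLF, the other two in a bare LF.
$mixed   = "a`r`nb`nc`n"
$pattern = '(?<!\r)\n'   # an LF that is not preceded by a CR

# First pass: only the two bare LFs are converted.
$once = $mixed -replace $pattern, "`r`n"

# Second pass: nothing is left to match, so the text is unchanged.
$twice = $once -replace $pattern, "`r`n"

# String.Equals() compares ordinally, so CR and LF are compared exactly.
$once.Equals("a`r`nb`r`nc`r`n")   # True
$once.Equals($twice)              # True
```

Because the conversion is safe to repeat, running the regex-based script again over a folder that has already been processed should not alter any file.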