In Windows, a report from MS SQL server comes to me with embedded duplicate line feeds that cause errors when reading into LOAD DATA INFILE in mySQL in a Windows environment. I fix this using three Strawberry Perl one-liners in a Powershell script.
The Windows '\r\n' convention is recognized by Strawberry Perl and that is its output format. I could not get Perl to recognize the empty lines with a single 0x0a linefeed, though, as the Strawberry flavor of Perl would always interpret that as a new line and include it in the output.
My solution was to simply replace all the linefeed 0x0a characters with a string unlikely to appear in the content:
<* script is Powershell, initialize source file name*>
$myLoginCSV = 'User%5FProfile%5FReport.csv'
<* convert linefeed (0a) chars into string LINEFEEDCHAR *>
$perlarg='-i.p.bak -p -e "s/\x0A/LINEFEEDCHAR/g"'
perl $perlarg $myLoginCSV
<* de-duplicate linefeeds*>
$perlarg='-i.p.bak -p -e "s/(LINEFEEDCHAR){2,}/LINEFEEDCHAR/g"'
perl $perlarg $myLoginCSV
<* restore the 0a linefeed characters by substituting for LINEFEEDCHAR *>
$perlarg='-i.p.bak -p -e "s/LINEFEEDCHAR/\x0a/g"'
perl $perlarg $myLoginCSV
I tried to do this in one line, but could not figure out how. Strawberry Perl interprets linefeeds with a fixed behavior to make it compatible with Windows. This "three liner" works well and is fast. An attempt at this using Powershell native regex was over 10 times slower.
Specifically in the Strawberry Perl Windows environment, is there a way to do this in fewer than three lines?
This is not a high priority. One lesson I took away from this is that a regex solution using Strawberry Perl in Windows can have a multi line solution but still be fast. In the future, I won't be as wedded to a true "one-line" solution when a couple of extra lines get the job done.
Slurp and replace consecutive newlines with a single.
perl -0777 -i.p.bak -lpe "s/\R+/\n/g" User%5FProfile%5FReport.csv