(This is probably something rather simple I'm missing; but I can't seem to figure it out and haven't found any answers in search)
I need to compare two CSV files with the same columns and output the row differences as follows (final output in Unicode Text):
Let's say I have the following sample data:
File A:
Column1,Column2,Column3
Tommy,4133,20180204
Suzie,5200,20210112
Tammy,221,20201010
File B:
Column1,Column2,Column3
Tommy,4133,20180204
Nicky,5200,20190520
Here is my current code (borrowing the hash-enabled Compare-Object2 from this site because the delivered Compare-Object is too slow -- FYI, I'm using Get-Content instead of Import-Csv because it's a good 50-times faster since we're comparing entire row. And the MyHeader variable is just to retain the original file's header column values)
Compare-Object2 (Get-Content $FileA) (Get-Content $FileB) -PassThru |
Select-Object @{l=[string]$MyHeader;e={$_.InputObject}},
@{n='Row Label'; e={ @{'=>' = 'Bad' ; '<=' = 'Good'}[$_.SideIndicator]}},
@{n='Placeholder'; e={@{'*'='0'}['*']}} |
Sort-Object 'Row Label' -Descending | Export-Csv "$FinalCSV" -NoType;
#Removing " char to create CSV with original and added columns together
Set-Content "$FinalCSV" ((Get-Content "$FinalCSV") -replace '"');
#Convert csv to tab delimited
Import-Csv "$FinalCSV" | Export-Csv "$FinalTXT" -NoTypeInformation -Delimiter "`t";
#Remove " char and convert to unicode
Set-Content -Encoding UNICODE "$FinalTXT" ((Get-Content "$FinalTXT") -replace '"')
This works perfectly (I know some of it is redundant at the end; but hey: it's the best I could do -- but definitely feel free to fix those parts too!) to create a single output file of the Good and the Bad -- about 40 seconds for two files with 400K rows.
Result File:
Column1 Column2 Column3 Row Label Placeholder
Suzie 5200 20210112 Good 0
Tammy 221 20201010 Good 0
Nicky 5200 20210112 Bad 0
The problem is, I now need to create them as separate files: one file for the good, one for the bad. So the new needed output would be:
ResultFileGood:
Column1 Column2 Column3 Row Label Placeholder
Suzie 5200 20210112 Good 0
Tammy 221 20201010 Good 0
ResultFileBad:
Column1 Column2 Column3 Row Label Placeholder
Nicky 5200 20210112 Bad 0
And I just know there has to be a way to do it without having to run the compare twice - some use of the Where-Object prop or a loop of some sort. I just can't figure it out; so I'm coming to the experts.
Thanks
EDIT: Thanks to postanote, one viable alternative is to just output the combined file and then just split that, which is definitely faster than running the entire compare routine twice. Still would like to see if there's a way to do it directly in the comparison export without the intermediate file; but that's definitely a viable option and what I'm using for now
$FinalHeader = get-content "$FinalTXT" | Select -First 1
$BadOutput = Select-String -Path $FinalTXT -Pattern ('Bad 0')
$GoodOutput = Select-String -Path $FinalTXT -Pattern ('Good 0')
@($FinalHeader,$BadOutput.Line) | Out-File "$FinalBadTXT" -Encoding UNICODE;
@($FinalHeader,$GoodOutput.Line) | Out-File "$FinalGoodTXT" -Encoding UNICODE;
Continuing from my comment.
you've got a lot going on there; i.e., some proxy function, etc.
Mixing these items like you are, you end up with stuff like this... (very simplified of course, and since you are to showing your input you are forcing us to guess to come up with one.)
psEdit -filenames 'D:\temp\book1.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Main0,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>
psEdit -filenames 'D:\temp\book3.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
Branch3,jjj,kkk,lll
Branch4,mmm,nnn,ooo
#>
Update:
Removing all the previous stuff, since they were not your cup of tea...
;-}
Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt') |
Export-Csv -Path 'D:\Temp\CompareObject.csv' -NoTypeInformation -Force
(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\<=') -replace '.*CompareObject.*:\"|\"\,.*' |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\=>') -replace '.*CompareObject.*:\"|\"\,.*' |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
$FileList |
ForEach-Object {
"`n********* Getting content $PSItem *********`n"
Import-Csv -Path "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
********* Getting content DifferenceObject.csv *********
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
So, as for your last comment:
While that method still uses the intermediary file; I admit I completely wasn't thinking about the simple approach of just exporting the combined file and then just splitting that.***
Ok, then, without using the 'intermediary file'.
($ComparedObjects = Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt'))
# Results
<#
InputObject SideIndicator
----------- -------------
Main0,aaa,bbb,ccc <=
Branch3,jjj,kkk,lll =>
Branch4,mmm,nnn,ooo =>
#>
($ComparedObjects -match '<=').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
#>
($ComparedObjects -match '=>').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
Then a just export to csv.
($ComparedObjects -match '<=').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
($ComparedObjects -match '=>').InputObject |
ConvertFrom-Csv -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
Read back as needed
$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
$FileList |
ForEach-Object {
"`n********* Getting content $PSItem *********`n"
Import-Csv -Path "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
********* Getting content DifferenceObject.csv *********
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
Update
Per your comment --
'the problem is the final output need: the Unicode Tab-delimited text with the additional columns.'
(($ComparedObjects -match '<=').InputObject) -replace ',', "`t" |
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\ReferenceObject.csv'
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Main0 aaa bbb ccc
#>
(($ComparedObjects -match '=>').InputObject) -replace ',', "`t" |
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\DifferenceObject.csv'
# Results
<#
Site Dept Office Floor
---- ---- ------ -----
Branch3 jjj kkk lll
Branch4 mmm nnn ooo
#>
Or for the extra column stuff, you could do this...
$ComparedObjects -match '<=' |
Select-Object -Property @{
Name = 'Site'
Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
Name = 'Dept'
Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
Name = 'Office'
Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
Name = 'Floor'
Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
Name = 'Label'
Expression = {'Good'}
},
@{
Name = 'Placeholder'
Expression = {0}
} |
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\ReferenceObject.csv') -replace '"','' -replace ',', "`t" |
Set-Content -PassThru 'D:\temp\ReferenceObject.csv'
Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" |
Format-Table -AutoSize
# Results
<#
Site Dept Office Floor Label Placeholder
---- ---- ------ ----- ----- -----------
Main0 aaa bbb ccc Good 0
#>
$ComparedObjects -match '=>' |
Select-Object -Property @{
Name = 'Site'
Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
Name = 'Dept'
Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
Name = 'Office'
Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
Name = 'Floor'
Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
Name = 'Label'
Expression = {'Good'}
},
@{
Name = 'Placeholder'
Expression = {0}
} |
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\DifferenceObject.csv') -replace '"','' -replace ',', "`t" |
Set-Content -PassThru 'D:\temp\DifferenceObject.csv'
Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" |
Format-Table -AutoSize
# Results
<#
Site Dept Office Floor Label Placeholder
---- ---- ------ ----- ----- -----------
Branch3 jjj kkk lll Good 0
Branch4 mmm nnn ooo Good 0
#>