arrays csv powershell

PowerShell finding duplicates in CSV and outputting different header


I guess the question is in the title.

I have a CSV that looks something like

user,path,original_path

I'm trying to find duplicates on the original path, then output both the user and original_path line.

This is what I have so far.

$2 = Import-Csv 'Total 20_01_16.csv' | Group-Object -Property Original_path | 
Where-Object { $_.count -ge 2 } | fl Group | out-string -width 500

This gives me the duplicates in Original_Path. I can see all the required information but I'll be danged if I know how to get to it or format it into something useful.

I did a bit of Googling and found this script:

$ROWS = Import-CSV -Path 'Total 20_01_16.csv'
$NAMES = @{}
$OUTPUT = foreach ( $ROW in $ROWS ) {
    if ( $NAMES.ContainsKey( $ROW.Original_path ) -and $NAMES[$ROW.original_path] -lt 2 ) {
        $ROW
    }
    $NAMES[$ROW.original_path] += 1
}

Write-Output $OUTPUT

I'm reluctant to use this because, first, I have no idea what it's doing. So little of it makes any sense to me, and I don't like using scripts I can't get my head around. Also, and this is the more important part, it only gives me one row from each set of duplicates, not all of them. I'm after every offending line, so I can find all the users with the same file.

If anyone could be so kind as to lend a hand I'd appreciate it. Thanks


Solution

  • It depends on the output format you need, but to build on what you already have we can use this to show the records in the console:

    Import-Csv 'Total 20_01_16.csv' |
    Group-Object -Property Original_path |
    Where-Object { $_.count -ge 2 } |
    Foreach-Object { $_.Group } |
    Format-Table User, Path, Original_path -AutoSize
    

    Alternatively, use this to save them in a new CSV file:

    Import-Csv 'Total 20_01_16.csv' |
    Group-Object -Property Original_path |
    Where-Object { $_.Count -ge 2 } |
    ForEach-Object { $_.Group } |
    Select-Object User, Path, Original_path |
    Export-Csv -Path output.csv -NoTypeInformation
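
To see why expanding `$_.Group` returns every offending row, here is a minimal sketch that runs the same pipeline against a few in-memory rows instead of the real file. The sample users and paths are made up for illustration, and `ConvertFrom-Csv` stands in for `Import-Csv` only so the snippet needs no file on disk.

```powershell
# Hypothetical sample rows standing in for 'Total 20_01_16.csv'.
$rows = @'
user,path,original_path
alice,C:\home\alice\report.docx,\\share\docs\report.docx
bob,C:\home\bob\report.docx,\\share\docs\report.docx
carol,C:\home\carol\notes.txt,\\share\docs\notes.txt
'@ | ConvertFrom-Csv

# Group-Object emits one object per distinct original_path.
# .Count is the number of rows in that group; .Group holds the rows themselves.
$duplicates = $rows |
    Group-Object -Property original_path |
    Where-Object { $_.Count -ge 2 } |
    ForEach-Object { $_.Group }   # unwrap each group back into its original rows

# Show only the columns of interest.
$duplicates | Format-Table user, original_path -AutoSize
```

With this sample, `$duplicates` holds the alice and bob rows, since they share an `original_path`, while carol's row is dropped. For comparison, the hashtable loop in the question checks the counter before incrementing it, so it emits a row only when its path has been seen exactly once before. That is why it returns just the second copy of each duplicate rather than all of them.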