powershell

Finding duplicate file names in PowerShell


I am a PowerShell noob looking for a way to find duplicate file names in a directory and write the file paths of the duplicates to a text or CSV file. My current code works, but it is extremely inefficient and slow. Any recommendations would be greatly appreciated.

#Declaring the Array to store file paths and names
$arr = (get-childitem "My Path" -recurse | where {$_.extension -like '*.*'})

#creating an array to hold already found duplicate elements in order to skip over them in the iteration
#(wrapped in @() so += appends to the array instead of doing arithmetic on a scalar)
$arrDupNum = @(-1)

#Declaring for loop to iterate over the array
For ($i=0; $i -le $arr.Length - 1; $i++) {
    $percent = $i / $arr.Length * 100
    Write-Progress -Activity "ActivityString" -Status "StatusString" -PercentComplete $percent -CurrentOperation "CurrentOperationString"
    
    $trigger = "f"
    
    For ($j = $i + 1; $j -le $arr.Length - 1; $j++)
    {
        foreach ($num in $arrDupNum)
        {
            #if statement to skip over duplicates already found
            if($num -eq $j -and $j -le $arr.Length - 2)
            {
                $j = $j + 1
            }            
        }

        if ($arr[$j].Name -eq $arr[$i].Name)
        {
            $trigger = "t"
            Add-Content H:\Desktop\blank.txt ($arr[$j].FullName + "; " + $arr[$i].FullName)
            Write-Host $arr[$i].Name
            $arrDupNum += $j
        }
    }
    #trigger used for formatting the text file in csv format
    if ($trigger -eq "t")
    {
        Add-Content H:\Desktop\blank.txt (" " + "; " + " ")
    }
}

Solution

  • Use a hashtable to group the files by name:

    # Hashtable mapping each file name to an array of files with that name
    $filesByName = @{}
    
    foreach($file in $arr){
        # += on a missing key starts a new array, then appends to it
        $filesByName[$file.Name] += @($file)
    }
    

    Now we just need to find all hashtable entries with more than one file:

    foreach($fileName in $filesByName.Keys){
        if($filesByName[$fileName].Count -gt 1){
            # Duplicates found!
        $filesByName[$fileName] | Select-Object -ExpandProperty FullName | Add-Content .\duplicates.txt
        }
    }
    

    This way, when you have N files, you'll iterate over them at most 2N times, instead of N*N times :)
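
As an alternative sketch, PowerShell's built-in `Group-Object` cmdlet can do the same single-pass grouping, and the resulting groups can be exported straight to CSV (which the question also asked for). The function name `Find-DuplicateNames` and the output path below are illustrative assumptions, not part of the original answer:

```powershell
# Sketch: group files by name with Group-Object, keep groups of 2+,
# and emit one object per duplicated name with all matching paths.
function Find-DuplicateNames {
    param(
        [Parameter(Mandatory)][string]$Path
    )
    Get-ChildItem -Path $Path -Recurse -File |
        Group-Object -Property Name |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                Name  = $_.Name
                Paths = ($_.Group.FullName -join '; ')
            }
        }
}

# Usage: write the duplicate groups to a CSV file
# Find-DuplicateNames -Path 'My Path' | Export-Csv .\duplicates.csv -NoTypeInformation
```

`Export-Csv` handles the quoting and the header row for you, so there is no need to hand-format separator lines the way the original loop does.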