I am a PowerShell noob looking for a way to find duplicate files in a directory and write the file paths of those files to a text or CSV file. My current code works, but it is extremely inefficient and slow. Any recommendations would be greatly appreciated.
# Declaring the array to store file paths and names
$arr = (Get-ChildItem "My Path" -Recurse | Where-Object { $_.Extension -like '*.*' })
# Creating an array to hold the indices of already-found duplicates so they can be skipped
# (must be initialized as an array; with a plain scalar, += would do arithmetic instead of appending)
$arrDupNum = @(-1)
# For loop to iterate over the array
For ($i = 0; $i -le $arr.Length - 1; $i++) {
    $percent = $i / $arr.Length * 100
    Write-Progress -Activity "ActivityString" -Status "StatusString" -PercentComplete $percent -CurrentOperation "CurrentOperationString"
    $trigger = "f"
    For ($j = $i + 1; $j -le $arr.Length - 1; $j++)
    {
        foreach ($num in $arrDupNum)
        {
            # Skip over duplicates already found
            if ($num -eq $j -and $j -le $arr.Length - 2)
            {
                $j = $j + 1
            }
        }
        if ($arr[$j].Name -eq $arr[$i].Name)
        {
            $trigger = "t"
            Add-Content H:\Desktop\blank.txt ($arr[$j].FullName + "; " + $arr[$i].FullName)
            Write-Host $arr[$i].Name
            $arrDupNum += $j
        }
    }
    # Trigger used for formatting the text file in CSV format
    if ($trigger -eq "t")
    {
        Add-Content H:\Desktop\blank.txt (" " + "; " + " ")
    }
}
Use a hashtable to group the files by name:
$filesByName = @{}
foreach ($file in $arr) {
    # Append each file to the bucket keyed by its name
    $filesByName[$file.Name] += @($file)
}
Now we just need to find all hashtable entries with more than one file:
foreach ($fileName in $filesByName.Keys) {
    if ($filesByName[$fileName].Count -gt 1) {
        # Duplicates found!
        $filesByName[$fileName] | Select-Object -ExpandProperty FullName | Add-Content .\duplicates.txt
    }
}
This way, when you have N files, you'll iterate over them at most 2N times, instead of N*N times :)
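As a side note, the same name-based grouping can also be expressed as a single pipeline with the built-in Group-Object cmdlet; a sketch, where "My Path" and the output file name are placeholders you'd replace with your own:

```powershell
# Group files by name, keep only groups with more than one member,
# and write every duplicate's full path to the output file
Get-ChildItem "My Path" -Recurse -File |
    Group-Object Name |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group.FullName } |
    Add-Content .\duplicates.txt
```

Group-Object builds the hashtable for you under the hood, so the complexity is the same as the manual version above.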