I need to find every duplicate filename in a given directory tree. I don't know what directory tree the user will give as a script argument, so I don't know the directory hierarchy in advance. I tried this:
#!/bin/sh
find . -type f | while IFS= read -r vo
do
    echo "$(basename "$vo")"
done
but that's not really what I want. It finds only one duplicate and then ends, even if there are more duplicate filenames; it also doesn't print the whole path (only the filename) or the duplicate count. I wanted to do something similar to this command:
find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "
but it doesn't work for me, and I don't know why. Even if I have duplicates, it prints nothing.
Here is another solution (based on the suggestion by @jim-mcnamara) without awk:
Solution 1
#!/bin/sh
dirname=/path/to/directory
# extract basenames and keep only those that occur more than once
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while read fileName
do
find $dirname -type f | grep "$fileName"
done
However, you have to do the same search twice. This can become very slow if you have to search a lot of data. Saving the find results in a temporary file might give better performance.
Solution 2 (with temporary file)
#!/bin/sh
dirname=/path/to/directory
tempfile=myTempfileName
find "$dirname" -type f > "$tempfile"
sed 's_.*/__' "$tempfile" | sort | uniq -d |
while read fileName
do
grep "/$fileName" $tempfile
done
#rm -f $tempfile
Since you might not want to write a temp file to the hard drive in some cases, you can choose the method that fits your needs. Both examples print the full path of each duplicate file.
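If you also want the duplicate count that the question asks for, GNU and BSD uniq accept -c and -d together, so a minimal variation of the same pipeline (note this prints the count and the bare filename, not the full paths) could look like:
# count how many times each basename occurs, show only repeated ones
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -cd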
Bonus question here: Is it possible to save the whole output of the find command as a list to a variable?
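Regarding the bonus question: you can capture the output with command substitution, though the result is a single newline-separated string rather than a real list, and it breaks if a filename itself contains a newline. A minimal sketch (fileList is just an illustrative name):
# save the newline-separated file list in a variable, then reuse it
fileList=$(find "$dirname" -type f)
echo "$fileList" | sed 's_.*/__' | sort | uniq -d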