bash awk find

Count files in subdirectories and list the deepest directories


I want a list of directories with the number of mp3 files in each, sorted in descending order, simply to see which directories contain the most files.


My command

## Relevant command
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) printf("%d %s\n", n[i], i)}' > ./foo.txt

sort -rno ./foo.txt ./foo.txt

## Full command (output improvements only)
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) printf("%03d   %s\n", n[i], substr(i,1,60)); if(length(n)==0) print "NO mp3 found." }' > ./foo.txt
sort -rno ./foo.txt ./foo.txt

Directory structure

./dir_1/fileA.mp3
./dir_2/subdir_1/fileB.mp3
./dir_2/subdir_2/fileC.mp3
./dir_2/subdir_2/fileD.mp3
...

Output

# What I get:
003  dir_2
001  dir_1

# What I want:
002  dir_2/subdir_2
001  dir_2/subdir_1
001  dir_1

The Problem

It only prints the topmost directories, not the deepest ones, so the mp3 counts of all subdirectories are summed into their top-level parent.

I can't increase -mindepth because the depth varies.


It would be okay to have both, like this:

003  dir_2
002  dir_2/subdir_2
001  dir_2/subdir_1
001  dir_1

I tried find's -links 2 test, but it only works with -type d, not -type f.


Solution

  • Setup:

    mkdir -p dir_{1..2} dir_2/subdir_{1..2}
    touch ./dir_1/fileA.mp3 ./dir_2/subdir_1/fileB.mp3 ./dir_2/subdir_2/file{C,D}.mp3
    

    One awk approach using the OP's \0-terminated filenames:

     find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
     awk -vRS='\0' '
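     { match($0,/\/[^/]+$/)                        # find last "/" plus file name (RSTART is 0 if there is no "/")
       dir = RSTART ? substr($0,1,RSTART-1) : "."  # strip the file name; files directly in the start directory count under "."
       count[dir]++                                # use the directory path as the index in the count[] array
     }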
     END {
       if (NR==0)
          print "NO mp3 found."
       else
          for (dir in count)
              printf "%03d %s\n",count[dir],dir
     }'
    

    This generates:

    001 dir_1
    001 dir_2/subdir_1
    002 dir_2/subdir_2
    

    Piping the output to sort -rn generates:

    002 dir_2/subdir_2
    001 dir_2/subdir_1
    001 dir_1
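
As a possible simplification, GNU find can print each file's parent directory directly with %h, so awk no longer has to strip the file name. The following is a minimal sketch, assuming GNU find and an awk that accepts RS='\0' (such as gawk); the function name count_mp3_per_dir is made up for illustration:

```shell
# Hypothetical helper: count mp3 files per containing directory, highest count first.
# Assumes GNU find (-printf with %h) and an awk that accepts RS='\0' (e.g. gawk).
count_mp3_per_dir() {
  find . -type f -iname '*.mp3' -printf '%h\0' |
  awk -v RS='\0' '
    { sub(/^\.\//, "")                  # drop the leading "./" that %h keeps
      count[$0]++                       # $0 is now the directory holding one mp3 file
    }
    END {
      for (dir in count)
        printf "%03d %s\n", count[dir], dir
    }' |
  sort -rn
}
```

Run it from the top of the tree, for example `count_mp3_per_dir > ./foo.txt`. Files sitting directly in the start directory are counted under ".", and this sketch prints nothing at all when no mp3 files exist.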
    

    If we remove all mp3 files and rerun the command with the NR==0 check, this generates:

    NO mp3 found.
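
The question also mentions that a combined listing (per-directory counts plus totals for the parent directories) would be acceptable. One way to sketch that, under the same assumptions as the answer (GNU find, an awk that accepts RS='\0'), is to credit every ancestor directory of each file; count_mp3_cumulative is a hypothetical name:

```shell
# Hypothetical helper: count mp3 files per directory, also crediting every ancestor.
# Assumes GNU find (-printf with %P) and an awk that accepts RS='\0' (e.g. gawk).
count_mp3_cumulative() {
  find . -mindepth 1 -type f -iname '*.mp3' -printf '%P\0' |
  awk -F/ -v RS='\0' '
    {
      dir = ""
      # Walk all path components except the last one (the file name) and
      # increment the counter of each successively deeper directory.
      for (i = 1; i < NF; i++) {
        dir = (i == 1 ? $i : dir "/" $i)
        count[dir]++
      }
    }
    END {
      for (dir in count)
        printf "%03d %s\n", count[dir], dir
    }' |
  sort -rn
}
```

On the Setup tree this should list dir_2 with 3, dir_2/subdir_2 with 2, and dir_2/subdir_1 and dir_1 with 1 each. Files directly in the start directory are skipped here because their %P path has no directory component.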