Linux - Finding the max modified date of each set of files in each directory


path/mydir contains a list of directories. The names of these directories tell me which database they relate to.

Inside each directory is a bunch of files, but the filenames tell me nothing of importance.

I'm trying to write a bash command on Linux that accomplishes the following:

Given this directory structure in path/mydir:

database_1
   table_1.file (last modified 2021-11-01)
   table_2.file (last modified 2021-11-01)
   table_3.file (last modified 2021-11-05)
database_2
   table_1.file (last modified 2021-05-01)
   table_2.file (last modified 2021-05-01)
   table_3.file (last modified 2021-08-01)
database_3
   table_1.file (last modified 2020-01-01)
   table_2.file (last modified 2020-01-01)
   table_3.file (last modified 2020-06-01)

I would want to output (only the directories whose newest file is more than 30 days old, with that file's date):

database_3 2020-06-01
database_2 2021-08-01

This half works, but it looks at the modification date of each directory itself instead of the latest timestamp of the files under it:

find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'

I'm very much a novice with bash, so any help and guidance is appreciated!


Solution

  • Would you please try the following:

    #!/bin/bash
    
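    cd "path/mydir/" || exit 1
    for d in */; do
        dirname=${d%/}    # strip the trailing slash for printing
        # newest file among those last modified more than 30 days ago
        mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
        [[ -n $mdate ]] && echo -e "$mdate\t$dirname"
    # sort by date, then swap the two columns
    done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'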
    

    Output with the provided example:

    database_3 2020-06-01
    database_2 2021-08-01
    
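    Each line that find prints here consists of the date, the time, and the path, separated by tabs, for example:

    2021-08-01      12:34:56        database_2/table_3.file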
    

    As for excluding specific directory names using a regex, as mentioned in the question, add your own logic to the find command or to the loop.
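
    One way to add that exclusion is to test each directory name against the regex before doing anything else with it. The following is only a sketch: list_subdirs_excluding and its two parameters are made-up names for illustration, and the pattern style is the grep -E one from the question.

```shell
#!/bin/bash

# list_subdirs_excluding DIR REGEX   (hypothetical helper, not part of the answer)
# Prints the names of DIR's immediate subdirectories, one per line,
# skipping every name that matches the extended regex REGEX.
list_subdirs_excluding() {
    local dir="$1" regex="$2" d name
    for d in "$dir"/*/; do
        [ -d "$d" ] || continue      # the glob matched nothing
        name=${d%/}                  # drop the trailing slash
        name=${name##*/}             # drop the leading path
        # skip names matching the regex, like grep -vE in the question
        if printf '%s\n' "$name" | grep -qE -- "$regex"; then
            continue
        fi
        printf '%s\n' "$name"
    done
}
```

    For example, list_subdirs_excluding path/mydir 'name1|name2' would print every subdirectory name except those matching name1 or name2. The same grep test followed by continue could also be placed at the top of the for loop in the script above.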

    [Edit]
    To print a directory name only if the most recently modified file anywhere under that directory is older than the specified age, please try this instead:

    #!/bin/bash
    
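    cd "path/mydir/" || exit 1
    now=$(date +%s)
    for d in */; do
        dirname=${d%/}
        # newest file anywhere under $d: epoch seconds and date
        read -r secs mdate < <(find "$d" -type f -printf "%T@\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1)
        [[ -n $secs ]] || continue    # skip directories that contain no files
        secs=${secs%.*}               # drop the fractional seconds
        if (( secs < now - 3600 * 24 * 30 )); then    # more than 30 days old
            echo -e "$secs\t$dirname $mdate"
        fi
    done | sort -nk1,1 | cut -f2-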
    
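    The two fields read from find are the modification time in seconds since the epoch (shown here without its fractional part) and the date, for example:

    1627743600      2021-08-01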
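
    The same idea can be wrapped in a function so that the base directory and the age threshold become parameters. This is a sketch under a few assumptions: stale_dirs, base and days are hypothetical names, GNU find is required for -printf, and directory names are assumed to contain no tabs or newlines.

```shell
#!/bin/bash

# stale_dirs BASE DAYS   (hypothetical helper, not part of the answer)
# Prints "name YYYY-MM-DD" for each immediate subdirectory of BASE whose
# newest file (at any depth) is more than DAYS days old, oldest first.
stale_dirs() {
    local base="$1" days="$2" now cutoff d name newest secs mdate
    now=$(date +%s)
    cutoff=$(( now - days * 24 * 3600 ))
    for d in "$base"/*/; do
        [ -d "$d" ] || continue          # the glob matched nothing
        name=${d%/}
        name=${name##*/}
        # newest file under $d as "epoch-seconds date"
        newest=$(find "$d" -type f -printf '%T@ %TY-%Tm-%Td\n' | sort -nr | head -n 1)
        [ -n "$newest" ] || continue     # directory contains no files
        secs=${newest%% *}               # first field
        secs=${secs%.*}                  # drop the fractional seconds
        mdate=${newest##* }              # second field
        if [ "$secs" -lt "$cutoff" ]; then
            printf '%s\t%s %s\n' "$secs" "$name" "$mdate"
        fi
    done | sort -n | cut -f2-            # oldest first, then drop the sort key
}
```

    Calling stale_dirs path/mydir 30 should list the same directories as the script above, oldest first.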