path/mydir contains a list of directories. The names of these directories tell me which database they relate to.
Inside each directory is a bunch of files, but the filenames tell me nothing of importance.
I'm trying to write a command in Linux bash that accomplishes the following:
For each directory in path/mydir, find the max timestamp of the last modified file within that directory.
Given this directory structure in path/mydir:
database_1
    table_1.file (last modified 2021-11-01)
    table_2.file (last modified 2021-11-01)
    table_3.file (last modified 2021-11-05)
database_2
    table_1.file (last modified 2021-05-01)
    table_2.file (last modified 2021-05-01)
    table_3.file (last modified 2021-08-01)
database_3
    table_1.file (last modified 2020-01-01)
    table_2.file (last modified 2020-01-01)
    table_3.file (last modified 2020-06-01)
I would want to output:
database_3 2020-06-01
database_2 2021-08-01
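To try the commands in this thread without touching real data, the example layout can be recreated in a scratch directory. This is a minimal sketch, assuming GNU coreutils (touch -d to set a modification date, date -r to read one back); make_example_tree and demo_root are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Recreate the example layout under a given root directory.
# make_example_tree is a hypothetical helper; touch -d is GNU coreutils.
make_example_tree() {
    local root=$1
    # mkdir -p also creates $root itself if it does not exist yet.
    mkdir -p "$root"/database_{1,2,3} || return 1
    # touch creates each file and sets its modification date.
    touch -d 2021-11-01 "$root"/database_1/table_{1,2}.file
    touch -d 2021-11-05 "$root/database_1/table_3.file"
    touch -d 2021-05-01 "$root"/database_2/table_{1,2}.file
    touch -d 2021-08-01 "$root/database_2/table_3.file"
    touch -d 2020-01-01 "$root"/database_3/table_{1,2}.file
    touch -d 2020-06-01 "$root/database_3/table_3.file"
}

demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
make_example_tree "$demo_root/mydir"

# List every file with its modification date (GNU find -printf).
find "$demo_root/mydir" -type f -printf '%P %TY-%Tm-%Td\n' | sort
```

Pointing the commands below at "$demo_root/mydir" instead of path/mydir should reproduce the situation described above.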
This half works, but it checks the modification date of each directory itself instead of the max timestamp of the files inside it:
find . -maxdepth 1 -mtime +30 -type d -ls | grep -vE 'name1|name2'
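Why the command above sees the wrong date: with -type d, the -mtime test applies to the directory's own modification time, which changes when entries are created, removed or renamed inside it, but not when an existing file's contents are rewritten. A small sketch that demonstrates this in a throwaway directory, assuming GNU touch and date; the scratch, dir_date and file_date names are illustrative only.

```shell
#!/bin/bash
# Show that writing to an existing file does not update the parent
# directory's mtime, so "-type d -mtime" cannot see recent file changes.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT

mkdir "$scratch/db"
touch "$scratch/db/table_1.file"
# Backdate the directory itself (GNU touch -d).
touch -d 2020-01-01 "$scratch/db"

# Append to the existing file: only the file's mtime moves to "now".
echo "new row" >> "$scratch/db/table_1.file"

dir_date=$(date -r "$scratch/db" +%F)
file_date=$(date -r "$scratch/db/table_1.file" +%F)
echo "directory mtime:   $dir_date"   # still 2020-01-01
echo "newest file mtime: $file_date"  # today's date
```

This is why the per-directory check has to look at the files (find ... -type f) rather than at the directory entry.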
I'm very much a novice with bash, so any help and guidance is appreciated!
Would you please try the following:
#!/bin/bash
cd "path/mydir/" || exit 1
for d in */; do
dirname=${d%/}
mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
[[ -n $mdate ]] && echo -e "$mdate\t$dirname"
done | sort -k1,1 | sed -E $'s/^([^\t]+)\t(.+)/\\2 \\1/'
Output with the provided example:
database_3 2020-06-01
database_2 2021-08-01
- for d in */; do
  loops over the subdirectories in path/mydir/.
- dirname=${d%/}
  removes the trailing slash, just for printing.
- -printf "%TY-%Tm-%Td\t%TT\t%p\n"
  prepends the modification date and time to the filename, delimited by tab characters. The result will look like (fields separated by tabs, and GNU find prints the time with a fractional-seconds part): 2021-08-01 12:34:56.0000000000 database_2/table_3.file
- sort -rk1,2
  sorts the output by the date and time fields in descending order.
- head -n 1
  picks the line with the latest timestamp.
- cut -f1
  extracts the first field, the modification date.
- [[ -n $mdate ]]
  skips directories for which find returned nothing (empty mdate).
- sort -k1,1 just after done
  performs the global sort across the outputs of all subdirectories.
- sed -E ...
  swaps the timestamp and the dirname. This is only needed if a directory name may contain whitespace such as a tab character. If not, you can omit the sed command by switching the order of timestamp and dirname in the echo command and changing the sort command to sort -k2,2.
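A sketch of the simpler variant described in the last item: echo prints the dirname first and the final sort keys on field 2, so no sed is needed. It assumes directory names contain no whitespace, and it wraps the loop in a function (latest_old_dates, a hypothetical name) whose ( ) body runs in a subshell so the cd does not leak to the caller. The demo builds a small scratch tree with GNU touch -d; those dates are far more than 30 days old, so -mtime +30 matches them.

```shell
#!/bin/bash
# No-sed variant: print "dirname date" and sort on the second field.
# Assumes directory names contain no whitespace, since sort splits fields
# on blanks by default. latest_old_dates is a hypothetical helper name.
latest_old_dates() (
    # The ( ) body runs in a subshell, so this cd does not affect the caller.
    cd "$1" || exit 1
    for d in */; do
        dirname=${d%/}
        mdate=$(find "$d" -maxdepth 1 -type f -mtime +30 -printf "%TY-%Tm-%Td\t%TT\t%p\n" | sort -rk1,2 | head -n 1 | cut -f1)
        [[ -n $mdate ]] && echo "$dirname $mdate"
    done | sort -k2,2
)

# Demo on part of the example layout in a scratch directory (GNU touch -d).
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch"/database_{2,3}
touch -d 2021-08-01 "$scratch/database_2/table_3.file"
touch -d 2020-06-01 "$scratch/database_3/table_3.file"
latest_old_dates "$scratch"
# prints:
# database_3 2020-06-01
# database_2 2021-08-01
```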
As for the requirement to exclude specific directory names using a regex, add your own filtering logic, either to the find command or to the loop.
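One way to do the exclusion inside the loop is bash's =~ operator. This is a sketch, assuming the placeholder names name1 and name2 from the question's grep -vE 'name1|name2'; exclude_re and is_excluded are hypothetical names. Unlike the unanchored grep pattern, the ^...$ anchors below keep a directory such as name10 from being dropped by accident.

```shell
#!/bin/bash
# Decide whether a directory name matches the exclusion regex.
# exclude_re and is_excluded are hypothetical; name1/name2 are placeholders.
exclude_re='^(name1|name2)$'

is_excluded() {
    # The regex must stay unquoted on the right of =~ to be treated as a regex.
    [[ $1 =~ $exclude_re ]]
}

for dirname in database_1 name1 database_2 name2 name10; do
    if is_excluded "$dirname"; then
        echo "skip $dirname"
    else
        echo "keep $dirname"
    fi
done
# prints: keep database_1, skip name1, keep database_2, skip name2, keep name10
```

In the answer's loop, the check would go right after dirname=${d%/}, as is_excluded "$dirname" && continue.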
[Edit]
In order to print the directory name only if the last modified file in the subdirectory is older than the specified age (30 days in the script below), please try instead:
#!/bin/bash
cd "path/mydir/" || exit 1
now=$(date +%s)
for d in */; do
dirname=${d%/}
read -r secs mdate < <(find "$d" -type f -printf "%T@\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1) || continue
secs=${secs%.*}
if (( secs < now - 3600 * 24 * 30 )); then
echo -e "$secs\t$dirname $mdate"
fi
done | sort -nk1,1 | cut -f2-
- now=$(date +%s)
  assigns the current time, as seconds since the epoch, to the variable now.
- for d in */; do
  loops over the subdirectories in path/mydir/.
- dirname=${d%/}
  removes the trailing slash, just for printing.
- -printf "%T@\t%TY-%Tm-%Td\n"
  prints the modification time as seconds since the epoch (with a fractional part) and the modification date, delimited by a tab character. The result will look like: 1627743600.0000000000 2021-08-01
- sort -nrk1,1
  sorts the output by the modification time in descending numeric order.
- head -n 1
  picks the line with the latest timestamp.
- read -r secs mdate < <( stuff )
  assigns the two fields of the command's output to secs and mdate, in order.
- secs=${secs%.*}
  removes the fractional part, because bash arithmetic works on integers only.
- (( secs < now - 3600 * 24 * 30 ))
  is true if secs is more than 30 days (3600 * 24 * 30 = 2592000 seconds) before now.
- echo -e "$secs\t$dirname $mdate"
  prints dirname and mdate, prepending secs for sorting.
- sort -nk1,1 just after done
  performs the global sort across the outputs of all subdirectories.
- cut -f2-
  removes the secs portion.
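To check the [Edit] script against the example from the question without depending on today's date, the same loop can be wrapped in a function that takes the reference time and the age threshold as parameters. A sketch, assuming GNU find and date; stale_dirs and its parameters are hypothetical, and the read ... || continue guard skips subdirectories that contain no files.

```shell
#!/bin/bash
# The [Edit] approach with the reference time and threshold as parameters.
# stale_dirs ROOT NOW_EPOCH DAYS is a hypothetical helper (GNU find/date).
stale_dirs() (
    root=$1 now=$2 days=$3
    cd "$root" || exit 1
    for d in */; do
        dirname=${d%/}
        # Newest file as "epoch<TAB>date"; skip directories with no files.
        read -r secs mdate < <(find "$d" -type f -printf "%T@\t%TY-%Tm-%Td\n" | sort -nrk1,1 | head -n 1) || continue
        secs=${secs%.*}
        if (( secs < now - 3600 * 24 * days )); then
            echo -e "$secs\t$dirname $mdate"
        fi
    done | sort -nk1,1 | cut -f2-
)

# Demo: rebuild the example layout and evaluate it as of 2021-12-01.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch"/database_{1,2,3}
touch -d 2021-11-01 "$scratch"/database_1/table_{1,2}.file
touch -d 2021-11-05 "$scratch/database_1/table_3.file"
touch -d 2021-05-01 "$scratch"/database_2/table_{1,2}.file
touch -d 2021-08-01 "$scratch/database_2/table_3.file"
touch -d 2020-01-01 "$scratch"/database_3/table_{1,2}.file
touch -d 2020-06-01 "$scratch/database_3/table_3.file"

ref=$(date -d 2021-12-01 +%s)
stale_dirs "$scratch" "$ref" 30
# prints:
# database_3 2020-06-01
# database_2 2021-08-01
```

With a reference date of 2021-12-01, database_1 is left out because its newest file (2021-11-05) is only 26 days old; passing 20 as the threshold would list it as well.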