I'm trying to apply a complex logic for removing old backups. Ideally, I'd like to only have backups up to 3 weeks (up to 3 weeklies, 7 dailies +-).
I'd like to rely on the filename for dating the file instead of actual creation date if possible.
This is how the example files look like:
Backup_2023-07-01.bak Backup_2023-07-13.bak Backup_2023-07-25.bak Backup_2023-08-06.bak Backup_2023-08-18.bak Backup_2023-08-30.bak Backup_2023-09-11.bak Backup_2023-09-23.bak
Backup_2023-07-02.bak Backup_2023-07-14.bak Backup_2023-07-26.bak Backup_2023-08-07.bak Backup_2023-08-19.bak Backup_2023-08-31.bak Backup_2023-09-12.bak Backup_2023-09-24.bak
Backup_2023-07-03.bak Backup_2023-07-15.bak Backup_2023-07-27.bak Backup_2023-08-08.bak Backup_2023-08-20.bak Backup_2023-09-01.bak Backup_2023-09-13.bak Backup_2023-09-25.bak
Backup_2023-07-04.bak Backup_2023-07-16.bak Backup_2023-07-28.bak Backup_2023-08-09.bak Backup_2023-08-21.bak Backup_2023-09-02.bak Backup_2023-09-14.bak Backup_2023-09-26.bak
Backup_2023-07-05.bak Backup_2023-07-17.bak Backup_2023-07-29.bak Backup_2023-08-10.bak Backup_2023-08-22.bak Backup_2023-09-03.bak Backup_2023-09-15.bak Backup_2023-09-27.bak
Backup_2023-07-06.bak Backup_2023-07-18.bak Backup_2023-07-30.bak Backup_2023-08-11.bak Backup_2023-08-23.bak Backup_2023-09-04.bak Backup_2023-09-16.bak Backup_2023-09-28.bak
Backup_2023-07-07.bak Backup_2023-07-19.bak Backup_2023-07-31.bak Backup_2023-08-12.bak Backup_2023-08-24.bak Backup_2023-09-05.bak Backup_2023-09-17.bak Backup_2023-09-29.bak
Backup_2023-07-08.bak Backup_2023-07-20.bak Backup_2023-08-01.bak Backup_2023-08-13.bak Backup_2023-08-25.bak Backup_2023-09-06.bak Backup_2023-09-18.bak Backup_2023-09-30.bak
Backup_2023-07-09.bak Backup_2023-07-21.bak Backup_2023-08-02.bak Backup_2023-08-14.bak Backup_2023-08-26.bak Backup_2023-09-07.bak Backup_2023-09-19.bak Backup_2023-09-31.bak
Backup_2023-07-10.bak Backup_2023-07-22.bak Backup_2023-08-03.bak Backup_2023-08-15.bak Backup_2023-08-27.bak Backup_2023-09-08.bak Backup_2023-09-20.bak Backup_KW1.bak
Backup_2023-07-11.bak Backup_2023-07-23.bak Backup_2023-08-04.bak Backup_2023-08-16.bak Backup_2023-08-28.bak Backup_2023-09-09.bak Backup_2023-09-21.bak
Backup_2023-07-12.bak Backup_2023-07-24.bak Backup_2023-08-05.bak Backup_2023-08-17.bak Backup_2023-08-29.bak Backup_2023-09-10.bak Backup_2023-09-22.bak
This is my script so far. The errors I get are not helping me understand, sadly.
#! /bin/bash
for filename in ./testfiles/*.bak; do
thisfiledate=$("$filename" | cut -d "_" -f2 | cut -d "." -f1)
echo $thisfiledate
#rename files made on sundays to weekly backups, dont delete them yet
if (date -d $thisfiledate +%u)=7;
then mv $filename "Backup_KW" date -d thisfiledate +%U ".bak"
#remove files older than one week
elseif $thisfiledate<date +%F -d "-7 days" then rm #$filename
#remove weekly files older than 3 weeks
elseif $($filename | cud -d "KW" -f2 | cut -d "." -f1)<date -d "-3 weeks" +%U then rm #$filename
fi
done
The problem here is multi-faceted. I'm new to cut
and date
but otherwise have used bash in my backup scripts successfully. It's hard to find answers since there are a gazillion ways to do this.
I'm trying to reduce the amount of cans of worms to open just yet (such as regex, which I can't wrap my head around).
Please help! Greatly appreciated.
Running it in ShellCheck.net usually helps, but you need to know some syntax to read the suggestions...
Line 6:
if (date -d $thisfiledate +%u)=7;
^-- SC1073 (error): Couldn't parse this if expression. Fix to allow more checks.
^-- SC1050 (error): Expected 'then'.
^-- SC1072 (error): Expected 'then'. Fix any mentioned problems and try again.
^-- SC1141 (error): Unexpected tokens after compound command. Bad redirection or missing ;/&&/||/|?
This isn't clear, and the man page doesn't help much.
Once you fix these, more errors will show; as pointed out above, I suspect you meant cut
instead of cud
. cut
will complain about -d "KW"
- it can't use multi-character delimiters. We can do better, though; we'll refactor while we're fixing the bugs.
if
SyntaxThe basic structure of an if
statement in bash
is:
if LIST;
then LIST;
[elif LISTA1; then LISTA2; [elif LISTN1; then LISTN2;] ]
[else LISTX;]
fi
LIST is a valid "command pipeline". elif
and else
are optional, and elif
can be repeated.
Conditionals check return values, and zero means success, while any nonzero is failure. ALWAYS keep that in mind.
Your command -
if (date -d $thisfiledate +%u)=7; then ...
The (
opens a subshell, running the commands inside in a forked environment. I'm pretty sure this is not what you meant.
The =
makes no sense in this context. Its left-hand side is not an assignable token, it isn't inside a comparison construct,and it isn't a redirection like <
or >
or a pipe (|
), or a conditional operator like &&
or ||
, or whitespace, or a semicolon... it's none of the things that it can figure out, so the parser bails and tells you that you did it wrong.
All of this is because what you meant to do, a conditional comparison, requires a different syntax. There are a few.
When you run any command LIST it returns a status to the shell, and that's what is checked in the if
.
$: date -d 2023-10-10 +%u
2
$: echo $?
0
$: if date -d 2023-10-10 +%u # The `2` output IS NOT TESTED HERE.
then echo ok; else echo no; fi
2
ok
If you want a test on the output you have to construct your test correctly. There are several ways to accomplish this; let's capture that data separately to make the point.
$: weekday=$(date -d 2023-10-10 +%u)
$: echo $weekday
2
The success of the date
command isn't explicitly checked here, but if it fails there will be nothing in weekday
, so that's often considered good enough. Be prepared.
$: weekday=$(date -d 'foo bar baz' +%u)
date: invalid date ‘foo bar baz’
$: echo $weekday
You used this $(cmd)
construct in your code, though yours is broken:
$: thisfiledate=$("$filename" | cut -d "_" -f2 | cut -d "." -f1)
bash: Backup_KW1.bak: command not found
(You meant to work with the value in the variable as a string, but executed it as a command. You needed something like echo "$filename"
- we'll come back to that.)
Testing the output is a separate step. You were trying to use a math-equal comparison, so let's do that. Here's one way -
$: weekday=$(date -d 2023-10-15 +%u)
$: if (( 7 == weekday )); then echo got seven; else echo nope; fi
got seven
Some things to note here: double parens create an arithmetic testing scope, while single parens create a subshell. A subshell might be what's intended sometimes, but isn't what you wanted here.
$: if (7==weekday); then echo got seven; else echo nope; fi
bash: 7==weekday: command not found
nope
$: if (echo foo); then echo success; else echo nope; fi
foo
success
$: if echo foo; then echo success; else echo nope; fi
foo
success
The if
accepts the single parens, executes the command in a subshell, and tests the exit code from the subshell. You usually don't need the subshell parens at all; you'd only want them in certain situations, which are out of scope here.
Using DOUBLE parens is a different thing entirely, which does an arithmetic evaluation on what's inside. (( 7 == weekday ))
reads the seven, takes the ==
to means it is to check for arithmetic equality, reads the value in weekday
, compares as directed and returns an exit code of zero if the test was true, and one (any nonzero is considered failure) if the test was false.
Note also that I was able to leave the $
dollar sign off the weekday
variable in that arithmetic context, though it works fine (if a subtle bit differently) if you leave it on.
So one version of your test might be:
weekday=$(date -d $thisfiledate +%u)
if (( weekday == 7 )); then mv "$old" "$new"; fi
There are several other ways you could have structured this test.
You could have done it as a string comparison, as long as you are clear that is what you are doing.
$: if [[ 7 == $weekday ]]; then echo got seven; else echo nope; fi
got seven
Note that in this construct, spaces are required in the square brackets. This is true whether using bash
double brackets or test
single brackets. Both [[
and [
allow either double ==
or single =
as equivalent; single-equal is more portable, but can lead to hard-to-debug accidental assignments... you can make those less likely by putting any literals on the left hand side.
$: [[ 7 == $weekday ]] && echo got seven
got seven
$: [[ 7=$weekday ]] && echo got seven
got seven
$: [ 7==$weekday ] && echo got seven
got seven
$: [ 7=$weekday ] && echo got seven
got seven
$: [[7==$weekday ]] && echo got seven
bash: [[7==7: command not found
$: [[ 7==$weekday]] && echo got seven
bash: conditional binary operator expected
$: [7==$weekday ] && echo got seven
bash: [7==7: command not found
$: [ 7=$weekday] && echo got seven
bash: [: missing `]'
ALWAYS watch for accidental assignment errors any time you see single-equals used in a test, though string comparisons with [[
or [
are generally safe.
$: foo=1; bar=foo; [[ $bar=2 ]] && echo true || echo nope # oh, no...
true
$: echo "bar='$bar' foo='$foo'" # corrupted data
bar='foo' foo='1'
Be careful with the arithmetic syntax tho...
$: (( 7==weekday )) && echo true
true
$: (( 7=weekday )) && echo true
bash: ((: 7=weekday : attempted assignment to non-variable (error token is "=weekday ")
$: (( weekday=5 )) && echo true # oops! overwritten value, ALWAYS true
true
$: (( $weekday=5 )) && echo true # don't trust $ to prevents this -
bash: ((: 5=5 : attempted assignment to non-variable (error token is "=5 ")
$: foo=1; bar=foo; (( $bar=2 )) && echo true; # OOPS!!
true
$: echo "bar='$bar' foo='$foo'" # wrong answer AND corrupted data
bar='foo' foo='2'
By the time you realize it is broken, it will have been happily using wrong data in the wrong logic block for weeks...
You can use $(cmd)
in place in your test -
if [[ 7 == $(date -d $thisfiledate +%u) ]]; then ... # compared as string
or
if (( 7 == $(date -d $thisfiledate +%u) )); then ... # compared as integers
elif
elseif
is not valid. See above - use elif
.
$: if false; then echo false; elif true; then echo true; else echo nope; fi
true
$: if false; then echo false; elseif true; then echo true; else echo nope; fi
bash: syntax error near unexpected token `then'
Note that while the error message is confusing, the only change was the elif
to elseif
, and it broke. What's actually happening is that it doesn't recognize elseif
as a keyword, so treats it as a command token - it might be the name of a script you wrote, for example. That means the then
behind it (which is a keyword) shouldn't be there.
As one more option, you can use a case
statement.
case $(date -d $thisfiledate +%u) in 7) mv "$old" "$new";; esac
Look at that mv
statement. Your original code:
then mv $filename "Backup_KW" date -d thisfiledate +%U ".bak"
c.f. the mv
man page
Aside from the dash-parameters, mv
either expects exactly two files as arguments, or the last argument must be a directory into which the other arguments will be copied.
That means if it had gotten far enough to execute, your command above would have immediately failed because there is no -d
argument to mv
; aside from that, what you told it was to look for a directory named .bak
into which it should copy files named Backup_KW
, date
, thisfiledate
, and +%U
, none of which would it find.
What you wanted:
then mv "$filename" Backup_KW$(date -d $thisfiledate +%U).bak
This executes $(date -d $thisfiledate +%U)
and replaces it as the line is being parsed with the output, which is done before passing the names to mv
. What mv
sees when that's done is, for example,
mv Backup_2023-10-15.bak BackupKW42.bak
The initial if
test will be false
when the date
fails on all the existing weekly backup files and outputs nothing. Those aren't a problem here, but the error message is ugly and unnecessary with a little restructuring.
Those files will then fall through to the next test -
#remove files older than one week
elseif $thisfiledate<date +%F -d "-7 days" then rm #$filename
Since you have ordered the fields YEAR-MONTH-DAY and are apparently always using leading zeros in two-digit month and day, you can test these as string comparisons:
#remove files older than one week
elif [[ $thisfiledate < $(date +%F -d "-7 days") ]]; then rm "$filename"
But if a file ever misses a leading zero it's likely to break.
$: [[ 2023-9-9 > 2023-10-10 ]] && echo a is after b || echo a is before b # oops
a is after b
date
's %s
returns the UNIX epoch, a simple integer that can be cleanly compared.
Also, Daily and Weekly files are different filename patterns, so it's easy to factor them out and process them differently as you go.
#! /bin/bash -x
oneweekago=$( date -d '-7 days' +%s ) # not checking time of day
threeweeksago=$( date -d '-21 days' +%s ) # not checking day of week
oldestkeep=$( date -d '-3 weeks' +%U ) # for renamed weeklies
cd "${target_directory:=/tmp/testfiles/}" || { echo >&2 "unable to change to '${target_directory}'"; exit 1; }
for filename in Backup_*.bak; do # changed working directory simplifies filename handling
mark=${filename//[^0-9-]/} # remove all characters not digits or dashes (assumes ASCII or UTF-8)
case $mark in
# daily files have dashes
*-*) filedate=$( date -d $mark +%s )
if (( filedate < threeweeksago )); then rm "$filename"
elif (( 7 == $(date -d $mark +%u) )); then mv "$filename" Backup_KW$(date -d $mark +%U).bak
elif (( filedate < oneweekago )); then rm "$filename"
fi
;;
# weekly backups only have the week number, e.g., Backup_KW42.bak
*) (( mark < oldestkeep )) && rm "$filename"
;;
esac
done
I set -x
so there would be behavior-traceable output.
(You might want to add some logging...)
I created a ./testfiles/
directory and included all your example files, but those are all older than three weeks so I created (empty) daily files for all the days since September, and ran it.
For comparison reference:
++ date -d '-7 days' +%s
+ oneweekago=1699904288
++ date -d '-21 days' +%s
+ threeweeksago=1698694688
++ date -d '-3 weeks' +%U
+ oldestkeep=44
+ cd /tmp/testfiles/
Then as it ran through all the files up to the end of October:
+ for filename in Backup_*.bak
+ mark=2023-10-30
+ case $mark in
++ date -d 2023-10-30 +%s
+ filedate=1698642000
+ (( filedate < threeweeksago ))
+ rm Backup_2023-10-30.bak
It deleted them as being more than three weeks old.
When it got to Halloween:
+ for filename in Backup_*.bak
+ mark=2023-10-31
+ case $mark in
++ date -d 2023-10-31 +%s
+ filedate=1698728400
+ (( filedate < threeweeksago ))
++ date -d 2023-10-31 +%u
+ (( 7 == 2 ))
+ (( filedate < oneweekago ))
+ rm Backup_2023-10-31.bak
It was not more than three weeks old, but it also wasn't a Sunday, and it WAS more than one week old, so it got deleted too, as did the next ones up to November 5th:
+ for filename in Backup_*.bak
+ mark=2023-11-05
+ case $mark in
++ date -d 2023-11-05 +%s
+ filedate=1699160400
+ (( filedate < threeweeksago ))
++ date -d 2023-11-05 +%u
+ (( 7 == 7 ))
++ date -d 2023-11-05 +%U
+ mv Backup_2023-11-05.bak Backup_KW45.bak
Which was a Sunday, so it was copied as the week 45 backup.
Subsequent days were again deleted until Sunday the 12th, kept as week 46.
As of the 14th it stopped deleting files.
+ for filename in Backup_*.bak
+ mark=2023-11-14
+ case $mark in
++ date -d 2023-11-14 +%s
+ filedate=1699941600
+ (( filedate < threeweeksago ))
++ date -d 2023-11-14 +%u
+ (( 7 == 2 ))
+ (( filedate < oneweekago ))
It silently left them alone up to Sunday the 19th which it moved/saved as week 47.
When it hit Backup_KW1.bak
it removed it as out of date.
+ case $mark in
+ (( mark < oldestkeep ))
+ rm Backup_KW1.bak
If I run it again, there's nothing left to delete or rename, so it just audits and ignores them all, leaving these:
$: ls -l testfiles
total 0
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-14.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-15.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-16.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-17.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-18.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-20.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW45.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW46.bak
-rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW47.bak