bash date if-statement backup cut

Make backup (removal) logic in bash


I'm trying to apply a complex logic for removing old backups. Ideally, I'd like to only have backups up to 3 weeks (up to 3 weeklies, 7 dailies +-).

I'd like to rely on the filename for dating the file instead of actual creation date if possible.

This is what the example files look like:

Backup_2023-07-01.bak  Backup_2023-07-13.bak  Backup_2023-07-25.bak  Backup_2023-08-06.bak  Backup_2023-08-18.bak  Backup_2023-08-30.bak  Backup_2023-09-11.bak  Backup_2023-09-23.bak
Backup_2023-07-02.bak  Backup_2023-07-14.bak  Backup_2023-07-26.bak  Backup_2023-08-07.bak  Backup_2023-08-19.bak  Backup_2023-08-31.bak  Backup_2023-09-12.bak  Backup_2023-09-24.bak
Backup_2023-07-03.bak  Backup_2023-07-15.bak  Backup_2023-07-27.bak  Backup_2023-08-08.bak  Backup_2023-08-20.bak  Backup_2023-09-01.bak  Backup_2023-09-13.bak  Backup_2023-09-25.bak
Backup_2023-07-04.bak  Backup_2023-07-16.bak  Backup_2023-07-28.bak  Backup_2023-08-09.bak  Backup_2023-08-21.bak  Backup_2023-09-02.bak  Backup_2023-09-14.bak  Backup_2023-09-26.bak
Backup_2023-07-05.bak  Backup_2023-07-17.bak  Backup_2023-07-29.bak  Backup_2023-08-10.bak  Backup_2023-08-22.bak  Backup_2023-09-03.bak  Backup_2023-09-15.bak  Backup_2023-09-27.bak
Backup_2023-07-06.bak  Backup_2023-07-18.bak  Backup_2023-07-30.bak  Backup_2023-08-11.bak  Backup_2023-08-23.bak  Backup_2023-09-04.bak  Backup_2023-09-16.bak  Backup_2023-09-28.bak
Backup_2023-07-07.bak  Backup_2023-07-19.bak  Backup_2023-07-31.bak  Backup_2023-08-12.bak  Backup_2023-08-24.bak  Backup_2023-09-05.bak  Backup_2023-09-17.bak  Backup_2023-09-29.bak
Backup_2023-07-08.bak  Backup_2023-07-20.bak  Backup_2023-08-01.bak  Backup_2023-08-13.bak  Backup_2023-08-25.bak  Backup_2023-09-06.bak  Backup_2023-09-18.bak  Backup_2023-09-30.bak
Backup_2023-07-09.bak  Backup_2023-07-21.bak  Backup_2023-08-02.bak  Backup_2023-08-14.bak  Backup_2023-08-26.bak  Backup_2023-09-07.bak  Backup_2023-09-19.bak  Backup_2023-09-31.bak
Backup_2023-07-10.bak  Backup_2023-07-22.bak  Backup_2023-08-03.bak  Backup_2023-08-15.bak  Backup_2023-08-27.bak  Backup_2023-09-08.bak  Backup_2023-09-20.bak  Backup_KW1.bak
Backup_2023-07-11.bak  Backup_2023-07-23.bak  Backup_2023-08-04.bak  Backup_2023-08-16.bak  Backup_2023-08-28.bak  Backup_2023-09-09.bak  Backup_2023-09-21.bak
Backup_2023-07-12.bak  Backup_2023-07-24.bak  Backup_2023-08-05.bak  Backup_2023-08-17.bak  Backup_2023-08-29.bak  Backup_2023-09-10.bak  Backup_2023-09-22.bak

This is my script so far. The errors I get are not helping me understand, sadly.

#! /bin/bash
for filename in ./testfiles/*.bak; do
  thisfiledate=$("$filename" | cut -d "_" -f2 | cut -d "." -f1)
  echo $thisfiledate
 #rename files made on sundays to weekly backups, dont delete them yet
if (date -d $thisfiledate +%u)=7;
  then mv $filename "Backup_KW" date -d thisfiledate +%U ".bak"
 #remove files older than one week
  elseif $thisfiledate<date +%F -d "-7 days" then rm #$filename
 #remove weekly files older than 3 weeks
  elseif $($filename | cud -d "KW" -f2 | cut -d "." -f1)<date -d "-3 weeks" +%U then rm #$filename
fi
done

The problem here is multi-faceted. I'm new to cut and date but otherwise have used bash in my backup scripts successfully. It's hard to find answers since there are a gazillion ways to do this.

I'm trying to reduce the amount of cans of worms to open just yet (such as regex, which I can't wrap my head around).

Please help! Greatly appreciated.


Solution

  • Running it in ShellCheck.net usually helps, but you need to know some syntax to read the suggestions...

    Line 6:
    if (date -d $thisfiledate +%u)=7;
    ^-- SC1073 (error): Couldn't parse this if expression. Fix to allow more checks.
                                  ^-- SC1050 (error): Expected 'then'.
                                  ^-- SC1072 (error): Expected 'then'. Fix any mentioned problems and try again.
                                  ^-- SC1141 (error): Unexpected tokens after compound command. Bad redirection or missing ;/&&/||/|?
    
    

    This isn't clear, and the man page doesn't help much.

    Once you fix these, more errors will show. I suspect you meant cut instead of cud, and cut will then complain about -d "KW" because it can't use multi-character delimiters. We can do better, though; we'll refactor while we're fixing the bugs.

    if Syntax

    The basic structure of an if statement in bash is:

    if LIST;
    then LIST; 
    [elif LISTA1; then LISTA2; [elif LISTN1; then LISTN2;] ] 
    [else LISTX;] 
    fi
    

    LIST is a valid "command pipeline". elif and else are optional, and elif can be repeated.

    Conditionals check return values, and zero means success, while any nonzero is failure. ALWAYS keep that in mind.

    Your command -

    if (date -d $thisfiledate +%u)=7; then ...
    

    The ( opens a subshell, running the commands inside in a forked environment. I'm pretty sure this is not what you meant.
    The = makes no sense in this context. Its left-hand side is not an assignable token, it isn't inside a comparison construct, and it isn't a redirection like < or > or a pipe (|), or a conditional operator like && or ||, or whitespace, or a semicolon... it's none of the things the parser can make sense of, so it bails and tells you that you did it wrong.

    All of this is because what you meant to do, a conditional comparison, requires a different syntax. There are a few.

    Exit Codes

    When you run any command LIST it returns a status to the shell, and that's what is checked in the if.

    $: date -d 2023-10-10 +%u
    2
    
    $: echo $?
    0
    
    $: if date -d 2023-10-10 +%u # The `2` output IS NOT TESTED HERE.
    then echo ok; else echo no; fi
    2
    ok
    

    If you want a test on the output you have to construct your test correctly. There are several ways to accomplish this; let's capture that data separately to make the point.

    $: weekday=$(date -d 2023-10-10 +%u)
    
    $: echo $weekday
    2
    

    The success of the date command isn't explicitly checked here, but if it fails there will be nothing in weekday, so that's often considered good enough. Be prepared.

    $: weekday=$(date -d 'foo bar baz' +%u)
    date: invalid date ‘foo bar baz’
    
    $: echo $weekday
    
    

    You used this $(cmd) construct in your code, though yours is broken:

    $: thisfiledate=$("$filename" | cut -d "_" -f2 | cut -d "." -f1)
    bash: Backup_KW1.bak: command not found
    

    (You meant to work with the value in the variable as a string, but executed it as a command. You needed something like echo "$filename" - we'll come back to that.)
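
As a minimal sketch of that string handling, either of the following pulls the date part out of a name like ./testfiles/Backup_2023-07-01.bak. The function names date_from_name and date_from_name_cut are made up for illustration; the first uses parameter expansion only, the second uses the echo-and-cut pipeline the question was reaching for.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Parameter expansion only: no external commands are run.
date_from_name() {
  local name=${1##*/}             # strip any leading directory
  name=${name#*_}                 # drop everything through the first underscore
  printf '%s\n' "${name%.bak}"    # drop the .bak suffix
}

# The same result with cut; note the echo feeding the value in as text.
date_from_name_cut() {
  echo "${1##*/}" | cut -d "_" -f2 | cut -d "." -f1
}

date_from_name     ./testfiles/Backup_2023-07-01.bak
date_from_name_cut ./testfiles/Backup_2023-07-01.bak
```

Both calls print 2023-07-01. For a weekly file such as Backup_KW1.bak they return KW1, which date will not accept, so the caller still has to treat the two name shapes differently.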

    Special Structures for Testing Conditions with if CMD

    Testing the output is a separate step. You were trying to use a math-equal comparison, so let's do that. Here's one way -

    $: weekday=$(date -d 2023-10-15 +%u)
    
    $: if (( 7 == weekday )); then echo got seven; else echo nope; fi
    got seven
    

    Some things to note here: double parens create an arithmetic testing scope, while single parens create a subshell. A subshell might be what's intended sometimes, but isn't what you wanted here.

    $: if (7==weekday); then echo got seven; else echo nope; fi
    bash: 7==weekday: command not found
    nope
    
    $: if (echo foo); then echo success; else echo nope; fi
    foo
    success
    
    $: if echo foo; then echo success; else echo nope; fi
    foo
    success
    

    The if accepts the single parens, executes the command in a subshell, and tests the exit code from the subshell. You usually don't need the subshell parens at all; you'd only want them in certain situations, which are out of scope here.

    Using DOUBLE parens is a different thing entirely: it does an arithmetic evaluation on what's inside. (( 7 == weekday )) reads the seven, takes the == to mean it should check for arithmetic equality, reads the value in weekday, compares as directed, and returns an exit code of zero if the test was true, and one (any nonzero is considered failure) if the test was false.

    Note also that I was able to leave the $ dollar sign off the weekday variable in that arithmetic context, though it works fine (if a subtle bit differently) if you leave it on.

    So one version of your test might be:

    weekday=$(date -d $thisfiledate +%u)
    if (( weekday == 7 )); then mv "$old" "$new"; fi
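
If you would rather not rely on an empty variable to signal a failed date call, one possible sketch wraps both checks in a small function. The name is_sunday and the return code 2 for an unparseable date are assumptions for illustration; like the rest of this answer it relies on GNU date's -d option.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (0) only when the given date is a Sunday.
# Returns 1 for any other weekday and 2 when date cannot parse the string.
is_sunday() {
  local weekday
  # Declare first, assign second, so the || sees date's own exit status.
  weekday=$(date -d "$1" +%u 2>/dev/null) || return 2
  (( weekday == 7 ))
}

for d in 2023-10-15 2023-10-10 'foo bar baz'; do
  if is_sunday "$d"; then
    echo "$d: Sunday"
  else
    echo "$d: not a Sunday (status $?)"
  fi
done
```

Because the function returns the status of its last command, it drops straight into an if or an && chain without any extra brackets.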
    

    There are several other ways you could have structured this test.

    You could have done it as a string comparison, as long as you are clear that is what you are doing.

    $: if [[ 7 == $weekday ]]; then echo got seven; else echo nope; fi
    got seven
    

    Caveats

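    Note that in this construct, spaces are required just inside the square brackets and around the operator. This is true whether using bash double brackets or test single brackets. Both [[ and [ accept either double == or single = as equivalent for string comparison; single-equal is more portable. Neither one ever assigns anything, but if you leave out the spaces around the operator, the whole expression becomes a single non-empty word and the test silently succeeds no matter what the values are. That is why all of the unspaced variants below print "got seven"; they would do so for any value of weekday.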

    $: [[ 7 == $weekday ]] && echo got seven
    got seven
    
    $: [[ 7=$weekday ]] && echo got seven
    got seven
    
    $: [ 7==$weekday ] && echo got seven
    got seven
    
    $: [ 7=$weekday ] && echo got seven
    got seven
    
    $: [[7==$weekday ]] && echo got seven
    bash: [[7==7: command not found
    
    $: [[ 7==$weekday]] && echo got seven
    bash: conditional binary operator expected
    
    $: [7==$weekday ] && echo got seven
    bash: [7==7: command not found
    
    $: [ 7=$weekday] && echo got seven
    bash: [: missing `]'
    

    ALWAYS check the spacing any time you see single-equals used in a test. String comparisons with [[ or [ never assign, so your data is safe, but an unspaced test is always true and sends the logic down the wrong branch.

    $: foo=1; bar=foo; [[ $bar=2 ]] && echo true || echo nope # oh, no...
    true
    
    $: echo "bar='$bar' foo='$foo'" # data untouched, but the test was wrongly true
    bar='foo' foo='1'
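
To make the spacing problem concrete, here is a short sketch using a weekday that is not seven. The variable names nospace and spaced are only for illustration.

```shell
#!/bin/bash
weekday=2

# No spaces around =: bash sees the single word "7=2", which is a
# non-empty string, so the test succeeds regardless of the values.
if [[ 7=$weekday ]]; then nospace=true; else nospace=false; fi

# With spaces, = is an operator and the two strings are really compared.
if [[ 7 = $weekday ]]; then spaced=true; else spaced=false; fi

echo "without spaces: $nospace, with spaces: $spaced"
```

This prints "without spaces: true, with spaces: false", which is exactly the kind of quiet wrong answer ShellCheck is good at flagging.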
    

    Be careful with the arithmetic syntax, though, because there a single = really is an assignment...

    $: (( 7==weekday )) && echo true
    true
    
    $: (( 7=weekday )) && echo true
    bash: ((: 7=weekday : attempted assignment to non-variable (error token is "=weekday ")
    
    $: (( weekday=5 )) && echo true # oops! overwritten value, ALWAYS true
    true
    
    $: (( $weekday=5 )) && echo true # don't trust $ to prevent this -
    bash: ((: 5=5 : attempted assignment to non-variable (error token is "=5 ")
    
    $: foo=1; bar=foo; (( $bar=2 )) && echo true; # OOPS!!
    true
    
    $: echo "bar='$bar' foo='$foo'" # wrong answer AND corrupted data
    bar='foo' foo='2'
    

    By the time you realize it is broken, it will have been happily using wrong data in the wrong logic block for weeks...

    Brevity

    You can use $(cmd) in place in your test -

    if [[ 7 == $(date -d $thisfiledate +%u) ]]; then ... # compared as string
    

    or

    if (( 7 == $(date -d $thisfiledate +%u) )); then ... # compared as integers
    

    elif

    elseif is not valid. See above - use elif.

    $: if false; then echo false; elif true; then echo true; else echo nope; fi
    true
    
    $: if false; then echo false; elseif true; then echo true; else echo nope; fi
    bash: syntax error near unexpected token `then'
    

    Note that while the error message is confusing, the only change was the elif to elseif, and it broke. What's actually happening is that bash doesn't recognize elseif as a keyword, so it treats it as a command name - it might be the name of a script you wrote, for example. That means the then after it (which is a keyword) shouldn't be there.

    As one more option, you can use a case statement.

    case $(date -d $thisfiledate +%u) in 7) mv "$old" "$new";; esac
    

    Other Errors

    CMD is parsed before execution

    Look at that mv statement. Your original code:

      then mv $filename "Backup_KW" date -d thisfiledate +%U ".bak"
    

    See the mv man page. Aside from the dash-parameters, mv either expects exactly two files as arguments, or the last argument must be a directory into which the other arguments will be moved.

    That means if it had gotten far enough to execute, your command above would have immediately failed because there is no -d option to mv; aside from that, what you told it was to look for a directory named .bak into which it should move $filename along with files named Backup_KW, date, thisfiledate, and +%U, which don't exist.

    What you wanted:

      then mv "$filename" Backup_KW$(date -d $thisfiledate +%U).bak
    

    This executes $(date -d $thisfiledate +%U) and replaces it as the line is being parsed with the output, which is done before passing the names to mv. What mv sees when that's done is, for example,

     mv Backup_2023-10-15.bak Backup_KW42.bak
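
A cheap way to check what mv will actually receive is to build the name first and print the command instead of running it. A small sketch, where weekly_name is a made-up helper and GNU date is assumed:

```shell
#!/bin/bash
# Hypothetical helper: build the weekly file name for a YYYY-MM-DD date.
weekly_name() {
  printf 'Backup_KW%s.bak\n' "$(date -d "$1" +%U)"
}

filename=Backup_2023-10-15.bak
# echo makes this a dry run: the command is printed, not executed.
echo mv "$filename" "$(weekly_name 2023-10-15)"
```

Once the printed line looks right, removing the echo turns the dry run into the real rename.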
    

    Logic and testing

    The initial if test will be false for all the existing weekly backup files, because date cannot parse their names, fails, and outputs nothing. That isn't a problem here, but the error message is ugly, and a little restructuring makes it unnecessary.

    Those files will then fall through to the next test -

     #remove files older than one week
      elseif $thisfiledate<date +%F -d "-7 days" then rm #$filename
    

    Since you have ordered the fields YEAR-MONTH-DAY and are apparently always using leading zeros in two-digit month and day, you can test these as string comparisons:

     #remove files older than one week
      elif [[ $thisfiledate < $(date +%F -d "-7 days") ]]; then rm "$filename"
    

    But if a file ever misses a leading zero it's likely to break.

    $: [[ 2023-9-9 > 2023-10-10 ]] && echo a is after b || echo a is before b # oops
    a is after b
    

    date's %s format returns the number of seconds since the UNIX epoch, a simple integer that can be cleanly compared.
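
As a sketch of that integer comparison, the helper below converts both sides to epoch seconds before comparing. The name is_older_than and the return code 2 for an unparseable date are assumptions for illustration, and GNU date is assumed.

```shell
#!/bin/bash
# Hypothetical helper: is_older_than YYYY-MM-DD N
# Succeeds when the date lies more than N days before now;
# returns 2 if date rejects the string.
is_older_than() {
  local filedate cutoff
  filedate=$(date -d "$1" +%s 2>/dev/null) || return 2
  cutoff=$(date -d "-$2 days" +%s)
  (( filedate < cutoff ))
}

is_older_than 2023-07-01 7 && echo "2023-07-01 is more than a week old"
```

The cutoff here includes the current time of day, so a file dated exactly seven days ago counts as older once today's clock has passed midnight.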

    Also, Daily and Weekly files are different filename patterns, so it's easy to factor them out and process them differently as you go.
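
One way to split the two kinds of file without any regex is a case statement with plain glob patterns. This is only a sketch; backup_kind is a made-up name and the patterns assume the naming shown in the question.

```shell
#!/bin/bash
# Hypothetical helper: classify a backup file name by its shape alone.
backup_kind() {
  case ${1##*/} in
    Backup_KW*.bak)
      echo weekly ;;
    Backup_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].bak)
      echo daily ;;
    *)
      echo unknown ;;
  esac
}

for f in Backup_2023-09-30.bak Backup_KW1.bak notes.txt; do
  echo "$f: $(backup_kind "$f")"
done
```

The daily pattern only checks that digits sit in the right places; a name like Backup_2023-09-31.bak still matches and has to be rejected later by date itself.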

    My rewrite

    #! /bin/bash -x
    oneweekago=$(    date -d '-7 days'  +%s ) # not checking time of day
    threeweeksago=$( date -d '-21 days' +%s ) # not checking day of week
    oldestkeep=$(    date -d '-3 weeks' +%U ) # for renamed weeklies
    cd "${target_directory:=/tmp/testfiles/}" || { echo >&2 "unable to change to '${target_directory}'"; exit 1; }
    for filename in Backup_*.bak; do # changed working directory simplifies filename handling
      mark=${filename//[^0-9-]/}     # remove all characters not digits or dashes (assumes ASCII or UTF-8)
      case $mark in
      # daily files have dashes
      *-*) filedate=$( date -d $mark +%s )
           if (( filedate < threeweeksago ));    then rm "$filename"
           elif (( 7 == $(date -d $mark +%u) )); then mv "$filename" Backup_KW$(date -d $mark +%U).bak
           elif (( filedate < oneweekago ));     then rm "$filename"
           fi
           ;;
      # weekly backups only have the week number, e.g., Backup_KW42.bak
        *) (( mark < oldestkeep )) && rm "$filename"
           ;;
      esac
    done
    

    I set -x so there would be behavior-traceable output.
    (You might want to add some logging...)
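
One thing to watch in the weekly branch: %U zero-pads its output (00 to 53), and bash arithmetic reads a leading zero as octal, so a value such as 08 or 09 in mark or oldestkeep makes (( mark < oldestkeep )) fail with a "value too great for base" error. A hedged sketch of one way around it is to force base 10 with the 10# prefix. The name week_lt is made up, and the comparison still does not handle the turn of the year.

```shell
#!/bin/bash
# Hypothetical helper: compare two week numbers as base-10 integers,
# so zero-padded values like 08 are not misread as invalid octal.
week_lt() {
  (( 10#$1 < 10#$2 ))
}

week_lt 08 44 && echo "week 08 is older than week 44"
week_lt 45 09 || echo "week 45 is not older than week 09"
```

In the script this would mean writing the weekly test as (( 10#$mark < 10#$oldestkeep )), at the cost of the -x trace showing the expanded numbers instead of the variable names.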

    I created a ./testfiles/ directory and included all your example files, but those are all older than three weeks so I created (empty) daily files for all the days since September, and ran it.
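
For anyone who wants to repeat the experiment, empty test files can be generated with a loop along these lines. This is a sketch, not the exact commands used for the run shown here; make_test_files and its arguments are made up, and GNU date is assumed.

```shell
#!/bin/bash
# Hypothetical helper: make_test_files DIR START_DATE COUNT
# Creates COUNT empty daily backup files, one per day from START_DATE.
make_test_files() {
  local dir=$1 start=$2 count=$3 i
  mkdir -p "$dir" || return
  for (( i = 0; i < count; i++ )); do
    touch "$dir/Backup_$(date -d "$start + $i days" +%F).bak"
  done
}

scratch=$(mktemp -d)               # throwaway directory for the demo
make_test_files "$scratch" 2023-09-29 3
ls "$scratch"
rm -r "$scratch"
```

Working in a scratch directory keeps the rm and mv calls of the cleanup script well away from real backups while you test.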

    For comparison reference:

    ++ date -d '-7 days' +%s
    + oneweekago=1699904288
    ++ date -d '-21 days' +%s
    + threeweeksago=1698694688
    ++ date -d '-3 weeks' +%U
    + oldestkeep=44
    + cd /tmp/testfiles/
    

    Then as it ran through all the files up to the end of October:

    + for filename in Backup_*.bak
    + mark=2023-10-30
    + case $mark in
    ++ date -d 2023-10-30 +%s
    + filedate=1698642000
    + ((  filedate < threeweeksago  ))
    + rm Backup_2023-10-30.bak
    

    It deleted them as being more than three weeks old.

    When it got to Halloween:

    + for filename in Backup_*.bak
    + mark=2023-10-31
    + case $mark in
    ++ date -d 2023-10-31 +%s
    + filedate=1698728400
    + ((  filedate < threeweeksago  ))
    ++ date -d 2023-10-31 +%u
    + ((  7 == 2  ))
    + ((  filedate < oneweekago  ))
    + rm Backup_2023-10-31.bak
    

    It was not more than three weeks old, but it also wasn't a Sunday, and it WAS more than one week old, so it got deleted too, as did the next ones up to November 5th:

    + for filename in Backup_*.bak
    + mark=2023-11-05
    + case $mark in
    ++ date -d 2023-11-05 +%s
    + filedate=1699160400
    + ((  filedate < threeweeksago  ))
    ++ date -d 2023-11-05 +%u
    + ((  7 == 7  ))
    ++ date -d 2023-11-05 +%U
    + mv Backup_2023-11-05.bak Backup_KW45.bak
    

    Which was a Sunday, so it was renamed as the week 45 backup.
    Subsequent days were again deleted until Sunday the 12th, kept as week 46.

    As of the 14th it stopped deleting files.

    + for filename in Backup_*.bak
    + mark=2023-11-14
    + case $mark in
    ++ date -d 2023-11-14 +%s
    + filedate=1699941600
    + ((  filedate < threeweeksago  ))
    ++ date -d 2023-11-14 +%u
    + ((  7 == 2  ))
    + ((  filedate < oneweekago  ))
    

    It silently left them alone up to Sunday the 19th, which it renamed as week 47.

    When it hit Backup_KW1.bak it removed it as out of date.

    + case $mark in
    + ((  mark < oldestkeep  ))
    + rm Backup_KW1.bak
    

    If I run it again, there's nothing left to delete or rename, so it just audits and ignores them all, leaving these:

    $: ls -l testfiles
    total 0
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-14.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-15.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-16.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-17.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-18.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_2023-11-20.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW45.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW46.bak
    -rw-r--r-- 1 paul 1049089 0 Nov 20 14:16 Backup_KW47.bak