linux · bash · wget

Download zip files via wget


I like to play chess and would like to download the Grandmasters' games from the internet as zip files, starting from Monday, 25th June 2012, up to today, and then continuing every week on Monday. The zip files are freely available. Their names are numbered sequentially, e.g. twic920g.zip through twic1493g.zip; each week the number increases by 1, so the next file is twic1494g.zip. For the first run this script works.

Here are my questions:

  1. How do I increase the counter by 1 every week?
  2. When unpacking, the locally saved zip files are unpacked again as well, not only the newly downloaded file. The cat command then merges the old and new files, so master.pgn contains the games twice.
#!/bin/bash

dir="pgn/zip"

if [[ ! -d $dir ]]; then
    mkdir -p "$dir"
fi

cd "$dir" || exit 1

# Download all PGN files
for i in {920..1493}; do
    wget -nc "https://www.theweekinchess.com/zips/twic${i}g.zip"
    unzip "twic${i}g.zip"
    cat "twic${i}.pgn" >> ../master.pgn
    rm "twic${i}.pgn"
done

Solution

  • how do I increase the counter by plus 1 every week?

    I think once you've downloaded the historic games you don't need to worry about incrementing a counter: you can get the link for the "current" game by parsing content from https://theweekinchess.com/zips/.

    A more robust solution would probably require something other than a shell script, but this works:

    curl https://theweekinchess.com/zips/ | grep 'twic[0-9]*g.zip' | cut -f2 -d'"'
    

    For example, running that right now produces:

    http://www.theweekinchess.com/zips/twic973g.zip
    
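If you need just the number (say, to compare it with the last one you fetched), the same idea extends naturally; a sketch, assuming GNU grep and that the page keeps linking the archives this way:

```shell
#!/bin/bash
# Hypothetical helper: pull the highest archive number currently linked
# on the zips index page (link names look like twic973g.zip).
latest=$(curl -s https://theweekinchess.com/zips/ \
    | grep -o 'twic[0-9]\+g\.zip' \
    | grep -o '[0-9]\+' \
    | sort -n | tail -n 1)
echo "latest archive number: $latest"
```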

    Just run a script to download the latest archive once a week (e.g., using cron).
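For instance, a crontab entry along these lines would run it every Monday at 06:00; the path /home/user/bin/fetch-twic.sh is a placeholder for wherever you keep your script:

```
# m h dom mon dow  command
0 6 * * 1 /home/user/bin/fetch-twic.sh
```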


    Alternately, you could write the number of the last file downloaded successfully to a file, and use that as the starting value next time it runs:

    #!/bin/bash
    
    dir="pgn/zip"
    
    if [[ ! -d $dir ]]; then
        mkdir -p "$dir"
    fi
    
    cd "$dir" || exit 1
    
    # figure out number of last successfully fetched game
    last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
    
    if (( last_fetched == 0 )); then
        first=920
    else
        first=$(( last_fetched + 1 ))
    fi
    
    echo "starting with: $first"
    
    # Download all PGN files
    for (( i=first; 1; i++ )); do
        # don't download a file if it already exists
        [[ -f "twic${i}g.zip" ]] && continue
    
        echo "fetching game $i"
        curl -sSfLO  "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
        echo "$i" > last_fetched
        unzip -p twic"$i"g.zip >> ../master.pgn
    done
    

  • when unpacking, the locally saved zip files are unpacked again as well, not only the ... downloaded file. With the cat command the old and new files are merged. So the master.pgn has the games twice.

    I'm not sure what you're saying here. You're only unpacking the file you've just downloaded, so any existing zip files shouldn't matter.

    Instead of appending to master.pgn in every loop iteration, you could leave the unpacked files on disk and completely regenerate master.pgn at the end of the script:

    for (( i=first; 1; i++ )); do
        # don't download a file if it already exists
        [[ -f "twic${i}g.zip" ]] && continue
    
        echo "fetching game $i"
        curl -sSfLO  "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
        echo "$i" > last_fetched
        unzip twic"$i"g.zip
    done
    
    cat *.pgn > ../master.pgn
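One caveat with that final cat: the glob expands in lexicographic order, so once the numbers pass 999, twic1000.pgn sorts before twic920.pgn and the games end up out of chronological order. If GNU sort is available, a version sort avoids that:

```shell
# Rebuild master.pgn in numeric order rather than glob (lexicographic) order.
# sort -V is GNU "version sort": twic920.pgn comes before twic1000.pgn.
printf '%s\n' twic*.pgn | sort -V | xargs cat > ../master.pgn
```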