
How do I download files with AWS CLI based on a list?


I'm trying to download a subset of files from a public S3 bucket that contains millions of IRS files. I can download the entire repository with the command:

aws s3 sync s3://irs-form-990/ ./

But it takes way too long!

I know I should be using the --include / --exclude flags, but I don't know how to use them with a list of values. I have a CSV that contains unique identifiers for all the files from 2017 that I'd like, but how do I use it with the AWS CLI? The list itself is half a million IDs long.


Help much appreciated. Thank you.


Solution

  • A short bash script can read the filenames, one per line, from a file such as filename.txt and copy each one in turn. All you have to do is convert those IDs into filenames first.

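    #!/bin/bash
    set -e
    # Read one filename per line; IFS= and -r keep whitespace and backslashes intact.
    while IFS= read -r line
    do
       # Quote the S3 URI so unusual characters in a name do not split the argument.
       aws s3 cp "s3://bucket-name/$line" dest-path/
    done < filename.txt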
    

    This question has been asked before; you can find the answer here.
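
The "convert those IDs into filenames" step, plus a way to avoid half a million strictly sequential copies, could look roughly like the sketch below. It rests on assumptions that are not in the original post: the IDs sit in the first column of the CSV with no header row, each object key is the ID followed by a suffix (the default `_public.xml` here is a guess to check against the bucket), and the helper names `ids_to_keys` and `download_keys` are hypothetical. It also assumes an `xargs` that supports `-P` (GNU and BSD versions do).

```shell
#!/bin/bash
# Sketch only: helper names and the key suffix are assumptions, not part of the AWS CLI.

# Read CSV lines on stdin and print one object key per line.
# Assumes the ID is the first comma-separated field and there is no header row.
ids_to_keys() {
    local suffix="${1:-_public.xml}"   # assumed suffix; verify against real object names
    # Strip a trailing carriage return (Windows-style CSV), skip lines with an empty ID.
    awk -F',' -v suffix="$suffix" '{ sub(/\r$/, ""); if ($1 != "") print $1 suffix }'
}

# Read object keys on stdin and copy each one, running several copies at once.
# Arguments: bucket name, destination directory, number of parallel jobs (default 8).
download_keys() {
    local bucket="$1" dest="$2" jobs="${3:-8}"
    xargs -P "$jobs" -I{} aws s3 cp "s3://$bucket/{}" "$dest/"
}

# Example usage (left commented out so sourcing this file downloads nothing):
#   ids_to_keys < ids.csv > filename.txt
#   download_keys bucket-name dest-path 8 < filename.txt
```

Writing the keys to filename.txt first keeps the conversion separate from the download, so the list can be spot-checked against a few real object names before any copying starts.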