I would like to copy only certain subfolder trees from S3 to my local disk. We have multiple JDBC source connectors for each table in the database and an S3 sink connector. I want to copy the file tree for only certain tables from S3 to my local disk.
The structure is:
Bucket
|_folder1
  |_folder2
    |_table1
      |_DatePartition1
      |_DatePartition2
    |_table2
      |_DatePartition1
      |_DatePartition2
    |_table3
      |_DatePartition1
      |_DatePartition2
    |_table4
      |_DatePartition1
      |_DatePartition2
In the above situation, I want to copy the entire structure under table3 and table4 only.
I tried different combinations of include and exclude, but that did not work.
aws s3 cp s3://bucketname/folder1/folder2 . --exclude "*/*" --include "table3*" --recursive
OR
aws s3 cp s3://bucketname/folder1/folder2 . --exclude "*" --include "table3*, tabl4*" --recursive
And a few others, but none of them worked. They either gave me errors or copied everything, not just the specific folder tree.
How can I set up my exclude and include filters so that I can copy only the specific folder structures to my local disk?
While the documentation only mentions operating from a local system to S3, a bit of reading between the lines suggests the filters operate on the partial key after any specified common prefix.
In other words, you need to include the extra path segment in the pattern, as you were attempting in your second example. However, each filter takes a single pattern rather than a comma-separated list, and filters that appear later in the command take precedence over earlier ones. So exclude everything first, then stack one include per folder you want:
aws s3 cp s3://example/folder1/folder2/ . \
--exclude="*" \
--include="table3/*" \
--include="table4/*" \
--recursive
Note the use of \ at the end of each line in this example to make it a bit easier to read and follow. If you are running this on Windows, remove the trailing \ from each line and put the whole command on one line.
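If the list of tables changes often, the stacked filters can be generated instead of typed by hand. The following is only a minimal sketch: build_copy_cmd is a hypothetical helper name, the bucket path and table names are the placeholders from the question, and the simple quoting assumes none of the names contain a single quote. It prints the command rather than running it, so you can inspect it first.

```shell
# Hypothetical helper: print an "aws s3 cp" command that copies only the
# listed subfolders. Usage: build_copy_cmd SRC DEST TABLE [TABLE...]
build_copy_cmd() {
  src=$1
  dest=$2
  shift 2
  # Exclude everything first; later filters take precedence over earlier ones.
  cmd="aws s3 cp '$src' '$dest' --recursive --exclude '*'"
  for table in "$@"; do
    # One --include per table; the pattern is relative to the source prefix.
    cmd="$cmd --include '$table/*'"
  done
  printf '%s\n' "$cmd"
}

# Placeholder bucket path and table names taken from the question.
build_copy_cmd s3://bucketname/folder1/folder2/ . table3 table4
```

Appending --dryrun to the printed command before running it lets you preview which objects would be copied without transferring anything.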