I am trying to recursively upload parquet files to an AWS S3 bucket using AWS CLI. I want to drop the .parquet and use the file name as the target table name.
So in a directory containing table1.parquet and table2.parquet, I would run something like this:
aws s3 cp ./MyDir s3://mybucket/ --recursive
Instead I get the error below, which makes sense because the expected table is table1, not table1.parquet:
s3://mybucket/table1.parquet is not found
Ideally I would be able to specify something like the following in my CLI statement, where {filename} becomes table1, table2, and so on:
aws s3 cp ./MyDir s3://mybucket/{filename} --recursive
The AWS CLI has no built-in option to rename files during upload. However, you can achieve the same result with a short Bash script that loops over the files, strips the .parquet extension, and uploads each one individually:
#!/bin/bash
# Directory containing the Parquet files
SOURCE_DIR="./MyDir"
# Target S3 bucket
S3_BUCKET="s3://mybucket/"
# Loop through all .parquet files in the directory
for filepath in "$SOURCE_DIR"/*.parquet; do
    # Extract the filename without the path
    filename=$(basename "$filepath")
    # Remove the .parquet extension
    target_name="${filename%.parquet}"
    # Upload the file to S3 under the new key
    if aws s3 cp "$filepath" "$S3_BUCKET$target_name"; then
        echo "Uploaded $filepath as $target_name"
    else
        echo "Failed to upload $filepath"
    fi
done
This will upload all .parquet files from the ./MyDir directory to your S3 bucket, using the filename (without .parquet) as the key.
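If your Parquet files are nested in subdirectories and you still want the recursive behaviour of --recursive, a find-based variant can walk the whole tree. This is only a sketch, assuming the same ./MyDir and s3://mybucket/ as above and that you want every key flattened to the bare table name:

#!/bin/bash
SOURCE_DIR="./MyDir"
S3_BUCKET="s3://mybucket/"
# Walk the tree and upload each .parquet file under its bare table name
find "$SOURCE_DIR" -type f -name '*.parquet' -print0 | while IFS= read -r -d '' filepath; do
    filename=$(basename "$filepath")
    target_name="${filename%.parquet}"
    aws s3 cp "$filepath" "$S3_BUCKET$target_name"
done

You can append --dryrun to the aws s3 cp call to preview the resulting keys before actually uploading anything.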