Is there a way to drop file extensions when using AWS CLI with --recursive?


I am trying to recursively upload Parquet files to an AWS S3 bucket using the AWS CLI. I want to drop the .parquet extension and use the file name as the target table name.

So for a directory containing table1.parquet and table2.parquet, I want to run something like this:

aws s3 cp ./MyDir s3://mybucket/ --recursive

I get the error below, which makes sense because the expected table is table1, not table1.parquet:

s3://mybucket/table1.parquet is not found

Ideally, I would be able to write something like the following in my CLI command, where {filename} becomes table1, table2, etc.:

aws s3 cp ./MyDir s3://mybucket/{filename} --recursive

Solution

  • The AWS CLI has no built-in option to rename files during an upload, with or without --recursive. However, you can achieve your goal with a script that uploads the files one at a time. Here’s a simple Bash script that uploads Parquet files to S3 and drops the .parquet extension from each key:

    #!/bin/bash
    
    # Directory containing the Parquet files
    SOURCE_DIR="./MyDir"
    # Target S3 bucket
    S3_BUCKET="s3://mybucket/"
    
    # Loop through all .parquet files in the directory
    for filepath in "$SOURCE_DIR"/*.parquet; do
      # If nothing matches, the glob stays literal; skip it
      [ -e "$filepath" ] || continue

      # Extract the filename without the path
      filename=$(basename "$filepath")
      
      # Remove the .parquet extension
      target_name="${filename%.parquet}"
    
      # Upload the file to S3 with the new name
      # Test the command directly instead of inspecting $? afterwards
      if aws s3 cp "$filepath" "$S3_BUCKET$target_name"; then
        echo "Uploaded $filepath as $target_name"
      else
        echo "Failed to upload $filepath" >&2
      fi
    done
    

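    This will upload all .parquet files from the ./MyDir directory to your S3 bucket, using the filename (without .parquet) as the key. Note that the glob only matches files at the top level of ./MyDir; files in subdirectories are not uploaded.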
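
Since the question asks about --recursive, here is a rough sketch of a variant that also walks subdirectories, using find instead of a glob. It assumes the same placeholder directory and bucket as above, and that keeping the relative subdirectory path in the key is acceptable. The helper names s3_key_for and upload_all are made up for this example and are not part of the AWS CLI.

```shell
#!/bin/bash

# Placeholder values from the question; override via the environment if needed
SOURCE_DIR="${SOURCE_DIR:-./MyDir}"
S3_BUCKET="${S3_BUCKET:-s3://mybucket/}"

# Drop a trailing slash so the prefix stripping below matches find's output
SOURCE_DIR="${SOURCE_DIR%/}"

# Hypothetical helper: turn a local path into an S3 key by stripping
# the source directory prefix and the .parquet extension
s3_key_for() {
  local rel="${1#"$SOURCE_DIR"/}"
  printf '%s\n' "${rel%.parquet}"
}

# Hypothetical helper: upload every .parquet file under SOURCE_DIR,
# including subdirectories (NUL-delimited so odd filenames are safe)
upload_all() {
  find "$SOURCE_DIR" -type f -name '*.parquet' -print0 |
    while IFS= read -r -d '' filepath; do
      key=$(s3_key_for "$filepath")
      if aws s3 cp "$filepath" "$S3_BUCKET$key"; then
        echo "Uploaded $filepath as $key"
      else
        echo "Failed to upload $filepath" >&2
      fi
    done
}

# Only start uploading if the source directory actually exists
if [ -d "$SOURCE_DIR" ]; then
  upload_all
else
  echo "Source directory $SOURCE_DIR not found" >&2
fi
```

With this sketch, ./MyDir/sub/table2.parquet would be uploaded under the key sub/table2. If every table should land at the bucket root instead, the helper could use basename on the path before stripping the extension. Adding --dryrun to the aws s3 cp call shows what would be copied without uploading anything.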