Hey, I am trying to copy many objects (200,000+) from a public weather forecast bucket (NOAA GFS) to my private S3 bucket.
Each forecast file has a companion index file that lists the byte ranges of the individual variables. Of the roughly 500 variables, only 20 are useful to me.
I've managed to do this with boto3 by iterating over every object, downloading the specified byte ranges, and uploading them to my bucket, but the process is painfully slow. What would be a possible improvement?
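For reference, this is roughly what I am doing today (bucket names, keys, and the byte range are placeholders; the real offsets come from the .idx files):

```python
import boto3

s3 = boto3.client("s3")

# Download one variable's byte range from the public bucket...
resp = s3.get_object(
    Bucket="noaa-source-bucket",              # placeholder for the public GFS bucket
    Key="gfs.t00z.pgrb2.0p25.f000",           # placeholder forecast file
    Range="bytes=1000-52000",                 # offsets taken from the .idx file
)
data = resp["Body"].read()

# ...then re-upload it to the private bucket.
s3.put_object(
    Bucket="my-private-bucket",
    Key="gfs-subset/gfs.t00z.pgrb2.0p25.f000",
    Body=data,
)
```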
You can implement this with upload_part_copy (the boto3 client method for the S3 UploadPartCopy API). It performs a server-side copy, so the data never has to pass through your machine, and its CopySourceRange parameter takes a value of the form bytes=start-end, which matches the offsets in the GFS index files. See the boto3 documentation for upload_part_copy for details.
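Here is a minimal sketch for copying a single byte range into a new object. The bucket names, keys, and offsets are placeholders; in practice you would read the start and end offsets from the forecast's index file:

```python
import boto3

s3 = boto3.client("s3")

SRC_BUCKET = "noaa-source-bucket"                       # public forecast bucket (placeholder)
SRC_KEY = "gfs.t00z.pgrb2.0p25.f000"                    # placeholder forecast file
DST_BUCKET = "my-private-bucket"
DST_KEY = "gfs-subset/gfs.t00z.pgrb2.0p25.f000"
byte_range = "bytes=1000-52000"                         # start-end offsets from the .idx file

# 1. Start a multipart upload on the destination object.
mpu = s3.create_multipart_upload(Bucket=DST_BUCKET, Key=DST_KEY)
upload_id = mpu["UploadId"]

try:
    # 2. Copy the byte range server-side; the data stays inside S3.
    part = s3.upload_part_copy(
        Bucket=DST_BUCKET,
        Key=DST_KEY,
        UploadId=upload_id,
        PartNumber=1,
        CopySource={"Bucket": SRC_BUCKET, "Key": SRC_KEY},
        CopySourceRange=byte_range,
    )

    # 3. Complete the upload using the ETag returned for the copied part.
    s3.complete_multipart_upload(
        Bucket=DST_BUCKET,
        Key=DST_KEY,
        UploadId=upload_id,
        MultipartUpload={
            "Parts": [{"ETag": part["CopyPartResult"]["ETag"], "PartNumber": 1}]
        },
    )
except Exception:
    # Abort so the incomplete upload does not keep accumulating storage.
    s3.abort_multipart_upload(Bucket=DST_BUCKET, Key=DST_KEY, UploadId=upload_id)
    raise
```

One caveat: if you want to combine several variables into a single destination object by copying each range as its own part, every part except the last must be at least 5 MiB, so small GRIB ranges may need to be grouped before copying.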