python amazon-s3 rasterio bytestream

Load raster from bytestream and set its CRS


What I want to do: load a raster from an S3 bucket into memory and set its CRS to EPSG:4326 (it has no CRS set).

What I have so far:

from io import BytesIO

import boto3
import rasterio
from rasterio.crs import CRS

bucket = 'my bucket'
key = 'my_key'
s3 = boto3.client('s3')
# Download the whole object into memory as bytes
file_byte_string = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
# Opening a BytesIO object in 'r+' mode is the step that fails
with rasterio.open(BytesIO(file_byte_string), mode='r+') as ds:
    crs = CRS({"init": "epsg:4326"})
    ds.crs = crs

I found the way to structure my code in this question:

Set CRS for a file read with rasterio

It works if I give it a path to a local file, but it does not work for bytestreams.

The error I get when I use 'r+' mode:

rasterio.errors.PathError: invalid path '<_io.BytesIO object at 0x7fb4503ca4d0>'

The error I get when I have 'r' mode:

rasterio.errors.DatasetAttributeError: read-only attribute

Is there a way to load a bytestream in r+ mode so that I can set or modify the CRS?


Solution

  • You can achieve this if you write your bytes to a NamedTemporaryFile and open that file by its path. This and some alternatives are explained in the docs.

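    import tempfile

    import boto3
    import rasterio
    from rasterio.crs import CRS

    bucket = 'asdf'
    key = 'asdf'

    s3 = boto3.client('s3')
    # Download the whole object into memory as bytes
    file_byte_string = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

    with tempfile.NamedTemporaryFile() as tmpfile:
        tmpfile.write(file_byte_string)
        # Flush so that all bytes are on disk before the file is reopened by name
        tmpfile.flush()
        with rasterio.open(tmpfile.name, "r+") as ds:
            ds.crs = CRS.from_epsg(4326)
        # The temporary file is deleted when this block exits,
        # so read it back or upload it here if you need the result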
    

    An important limitation of this approach is that you have to download the whole file from S3 into memory, as opposed to opening the file remotely.