python azure-blob-storage

Solve timeout errors on file uploads with new azure.storage.blob package


I had to upgrade a Docker container that used an older version of Microsoft Azure's Python packages to download data from an API and then upload a JSON file to Azure Blob Storage. Since installing the former "azure" metapackage with pip is no longer allowed, I have to use the new standalone packages (azure-storage-blob==12.6.0).

After switching from create_blob_from_path on BlockBlobService in the old "azure" package to BlobClient.upload_blob() in the new standalone package, uploads of larger files fail with a timeout error that completely ignores the function's timeout parameter.

I get a ServiceResponseError with the message "Connection aborted / The write operation timed out".

Is there any way to solve this error?

The new function feels like a huge step backwards from create_blob_from_path; the absence of progress_callback in particular is deplorable...


Solution

  • The correct solution, if your control flow allows it, seems to be setting max_single_put_size to something smaller (like 4MB) when you create the BlobClient. You can pass it as a keyword argument to the constructor.

    However, as near as I can tell, this parameter cannot be configured when creating a BlobClient through BlobClient.from_blob_url. The default value is 64MB, and it is easy to hit the default connection timeout before a 64MB PUT completes. In some applications, you may not have access to auth credentials for the storage account (i.e. you're just using a signed URL), so the only way to create a BlobClient is through BlobClient.from_blob_url.

    In that case, the workaround seems to be setting the poorly documented connection_timeout parameter on the upload_blob call, instead of the timeout parameter. Something like:

    upload_result = block_blob_client.upload_blob(
        data,
        blob_type="BlockBlob",
        content_settings=content_settings,
        length=file_size,
        connection_timeout=600,  # seconds; client-side connection timeout
    )
    

    That parameter is documented on this page:

    https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob#other-client--per-operation-configuration

    However, it is not currently documented on the official BlobClient documentation:

    https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python

    I've filed this documentation bug: https://github.com/Azure/azure-sdk-for-python/issues/22936
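
For the first option (lowering max_single_put_size when you do control how the client is constructed), here is a minimal sketch. The helper names build_client_kwargs and make_blob_client, and the placeholder account URL, container, blob, and credential in the usage comment, are made up for illustration. The 4 MiB and 600 second values are just the ones mentioned above, not recommendations. The azure import is kept inside the function so the settings helper can be checked without the SDK installed.

```python
MIB = 1024 * 1024


def build_client_kwargs(max_single_put_mib=4, connection_timeout_s=600):
    """Build keyword arguments for the BlobClient constructor.

    max_single_put_size: blobs larger than this are uploaded in chunks
    instead of a single PUT request.
    connection_timeout: client-side connection timeout, in seconds.
    """
    return {
        "max_single_put_size": max_single_put_mib * MIB,
        "connection_timeout": connection_timeout_s,
    }


def make_blob_client(account_url, container_name, blob_name, credential):
    """Create a BlobClient with a smaller single-PUT threshold.

    Hypothetical helper: requires the azure-storage-blob package.
    """
    from azure.storage.blob import BlobClient  # imported lazily on purpose

    return BlobClient(
        account_url=account_url,
        container_name=container_name,
        blob_name=blob_name,
        credential=credential,
        **build_client_kwargs(),
    )


# Usage (placeholder values, replace with your own):
# client = make_blob_client(
#     "https://<account>.blob.core.windows.net",
#     "<container>",
#     "<blob-name>",
#     "<credential>",
# )
# with open("data.json", "rb") as fh:
#     client.upload_blob(fh, overwrite=True)
```

With the threshold lowered, a large upload should be split into several smaller requests, so no single request has to push 64MB before the timeout fires.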