
AzCopy fails to copy blobs smaller than 1000 KB


Log from AzCopy:

    2024/10/21 06:47:13 ==> REQUEST/RESPONSE (Try=1/69.6505ms, OpTime=69.771031ms) -- RESPONSE SUCCESSFULLY RECEIVED
    PUT https://storage-account.blob.core.windows.net/container/2024/filepath/bb-GOPACS-0.jsonl?si=rlwc&sig=-REDACTED-&sr=c&sv=2024-08-04
       Accept: application/xml
       Content-Length: 0
       User-Agent: AzCopy/10.26.0 azsdk-go-azblob/v1.4.0 (go1.22.5; linux)
       X-Ms-Client-Request-Id:
       x-ms-access-tier: Hot
       x-ms-blob-content-md5:
       x-ms-blob-content-type: application/jsonl;charset=UTF-8
       x-ms-blob-type: BlockBlob
       x-ms-copy-source: https://storage-account.blob.core.windows.net/container/2024/filepath/bb-GOPACS-0.jsonl?sas-token
       2023-08-03
    Status: 409 The blob type is invalid for this operation.
       Content-Length: 228
       Content-Type: application/xml
       Date: Mon, 21 Oct 2024 06:47:13 GMT
       Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
       X-Ms-Client-Request-Id:
       X-Ms-Error-Code: InvalidBlobType
       X-Ms-Request-Id:
       X-Ms-Version: 2023-08-03
    2024/10/21 06:47:13 ERR: [P#0-T#0] COPYFAILED: https://storage-account.blob.core.windows.net/container/2024/filepath/bb-GOPACS-0.jsonl?si=rlwc&sig=-REDACTED-&sr=c&sv=2024-08-04/bb-GOPACS-0.jsonl : 409 : 409 The blob type is invalid for this operation.. When Put Blob from URL. X-Ms-Request-Id: d7601b94-801e-0030-4b85-23a96b000000

My requirement is to convert an append blob to a block blob. Suppose I have an append blob called publicstatistics_16.jsonl.

Step 1) First, I use AzCopy to copy the append blob, creating the destination as a block blob named bb-publicstatistics_16.jsonl.

Step 2) Then I use AzCopy again to copy the block blob bb-publicstatistics_16.jsonl back to publicstatistics_16.jsonl.

This works perfectly with AzCopy when the blob is larger than 1000 KB. Any blob smaller than 1000 KB fails with the error shown above.

At what stage does it fail? It fails at Step 2. All I am trying to do is copy the block blob bb-publicstatistics_16.jsonl to publicstatistics_16.jsonl in the same container.
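Before running Step 2, it may help to confirm which blob types currently sit at the source and destination names. This is a minimal sketch, assuming an `azure.storage.blob.ContainerClient`; `report_blob_types` is a hypothetical helper, not an SDK function. It relies on `BlobClient.exists()` and on `BlobClient.get_blob_properties().blob_type`, which reports `BlockBlob`, `AppendBlob` or `PageBlob`.

```python
def report_blob_types(container_client, blob_names):
    """Return {blob_name: blob type string, or None if the blob does not exist}.

    Hypothetical helper; container_client is assumed to be an
    azure.storage.blob.ContainerClient.
    """
    types = {}
    for name in blob_names:
        blob_client = container_client.get_blob_client(name)
        if not blob_client.exists():
            types[name] = None
            continue
        blob_type = blob_client.get_blob_properties().blob_type
        # BlobType is a str-based enum; .value gives the plain name, e.g. "AppendBlob"
        types[name] = getattr(blob_type, "value", blob_type)
    return types
```

For example, calling it with `["bb-publicstatistics_16.jsonl", "publicstatistics_16.jsonl"]` would show whether the Step 2 destination name is still held by an append blob while the source is a block blob.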

I also tried manually downloading each blob smaller than 1000 KB and uploading it under a different name, and got the same "blob type is invalid" error. I tried the following code as well and still got the same error:

    import logging

    log = logging.getLogger(__name__)

    # Method of my class; self is not used inside it.
    def upload_small_block_blob(self, blob_service_client, container, source_blob_name, blob_name):
        # Download the small block blob (e.g. bb-publicstatistics_16.jsonl) using the SDK.
        # source_blob_name replaces the previously undefined new_block_blob_name.
        source_blob_client = blob_service_client.get_blob_client(container=container, blob=source_blob_name)
        downloaded_data = source_blob_client.download_blob().readall()
        blob_client = blob_service_client.get_blob_client(container=container, blob=blob_name)

        try:
            # Check if the destination blob already exists
            if blob_client.exists():
                log.error(f"Blob '{blob_name}' already exists. Choose a different name.")
                return
            # Upload the downloaded bytes as a new Block Blob
            blob_client.upload_blob(downloaded_data, blob_type="BlockBlob")
            log.info(f"Uploaded new Block Blob '{blob_name}' successfully.")
        except Exception as e:
            log.error(f"Failed to upload new Block Blob '{blob_name}': {str(e)}")
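To see which service error an SDK upload actually returns, the exception's error code can be inspected. In `azure-storage-blob`, service failures surface as `azure.core.exceptions.HttpResponseError` subclasses carrying an `error_code` attribute (the `x-ms-error-code` value, `InvalidBlobType` in the log above). The sketch below reads it with `getattr` so it does not depend on the exact exception class. `upload_as_block_blob` is a hypothetical helper, and `overwrite=True` is an assumption chosen to mirror AzCopy overwriting an existing destination.

```python
def upload_as_block_blob(container_client, source_name, dest_name):
    """Copy source_name's bytes into dest_name as a block blob.

    Returns None on success, or the service error code (e.g. "InvalidBlobType")
    on failure. Hypothetical helper; container_client is assumed to be an
    azure.storage.blob.ContainerClient.
    """
    data = container_client.get_blob_client(source_name).download_blob().readall()
    dest = container_client.get_blob_client(dest_name)
    try:
        dest.upload_blob(data, blob_type="BlockBlob", overwrite=True)
    except Exception as exc:  # in practice an azure.core HttpResponseError subclass
        # Storage errors expose the x-ms-error-code header value as .error_code
        code = getattr(exc, "error_code", None)
        if code is None:
            return repr(exc)
        # error_code may be a str-based enum; normalise it to a plain string
        return str(getattr(code, "value", code))
    return None
```

If this returns `"InvalidBlobType"` only when the destination name already exists as an append blob, that narrows the problem to the existing destination rather than to the data itself.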

Even in this case, it shows the "blob type is invalid" error. I have no idea what is happening.

Note: This issue happens only when the blob is smaller than 1000 KB. There is no issue for blobs from 1000 KB up to 100 GiB (the largest single blob I have converted so far).
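Since only blobs under roughly 1000 KB are affected, it can help to list which append blobs fall below that size before converting them. This is a minimal sketch, assuming `ContainerClient.list_blobs()` from `azure-storage-blob`, where each item is a `BlobProperties` with `name`, `size` (bytes) and `blob_type`. The threshold comes from the question and is interpreted here as 1000 × 1024 bytes, which is an assumption; `small_append_blobs` is a hypothetical helper.

```python
# Threshold from the question; "1000 KB" read as 1000 * 1024 bytes (assumption).
SMALL_BLOB_THRESHOLD = 1000 * 1024


def small_append_blobs(container_client, threshold=SMALL_BLOB_THRESHOLD, prefix=None):
    """Return names of append blobs smaller than `threshold` bytes.

    Hypothetical helper; container_client is assumed to be an
    azure.storage.blob.ContainerClient whose list_blobs() yields
    BlobProperties objects with .name, .size and .blob_type.
    """
    names = []
    for props in container_client.list_blobs(name_starts_with=prefix):
        # BlobType is a str-based enum; compare on its plain string value
        blob_type = getattr(props.blob_type, "value", props.blob_type)
        if blob_type == "AppendBlob" and props.size < threshold:
            names.append(props.name)
    return names
```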


Solution

  • At what stage does it fail? It fails at Step 2. All I am trying to do is copy the block blob bb-publicstatistics_16.jsonl to publicstatistics_16.jsonl in the same container.

    In my environment, I stored a file smaller than 1000 KB with the same name, publicstatistics_16.jsonl (append blob), in Azure Blob Storage.


    You can use the code below, which downloads the append blob and uploads its contents as a block blob using Python.

    Code:

    from azure.storage.blob import BlobServiceClient
    
    def upload_small_block_blob(connection_string, container_name, blob_name, new_blob_name):
        try:
            # Connect to the storage account and the container
            blob_service_client = BlobServiceClient.from_connection_string(connection_string)
            container_client = blob_service_client.get_container_client(container_name)
            old_blob_client = container_client.get_blob_client(blob_name)
        
            # Read the whole append blob into memory (fine for small blobs)
            downloaded_data = old_blob_client.download_blob().readall()
            new_blob_client = container_client.get_blob_client(new_blob_name)
            # Do not overwrite an existing destination blob
            if new_blob_client.exists():
                print(f"Blob '{new_blob_name}' already exists. Choose a different name.")
                return
            # Write the same bytes back as a new block blob
            new_blob_client.upload_blob(downloaded_data, blob_type="BlockBlob")
            print(f"Uploaded new Block Blob '{new_blob_name}' successfully.")
        except Exception as e:
            print(f"Failed to upload new Block Blob '{new_blob_name}': {str(e)}")
    
    # Example usage
    connection_string = "xxxx"
    container_name = "xxx"
    blob_name = "publicstatistics_16.jsonl"  # Original append blob
    new_blob_name = "bb-publicstatistics_16.jsonl"  # New block blob
    
    upload_small_block_blob(connection_string, container_name, blob_name, new_blob_name)
    

    Output:

    Uploaded new Block Blob 'bb-publicstatistics_16.jsonl' successfully.
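The code above covers Step 1 (creating the `bb-` block blob). For Step 2, the Put Blob REST documentation notes that a blob's type cannot be changed after creation unless the blob is deleted and re-created. Whether that is exactly why AzCopy's "Put Blob from URL" path fails only for small blobs is an assumption, but one possible approach is to delete the original append blob and then write the block blob under the original name. This sketch assumes an `azure.storage.blob.ContainerClient`, that the `bb-` copy has already been verified, and that the original append blob may be deleted; `replace_with_block_blob` is a hypothetical helper.

```python
def replace_with_block_blob(container_client, block_copy_name, original_name):
    """Re-create original_name as a block blob from the bytes of block_copy_name.

    Hypothetical helper. Assumes container_client is an
    azure.storage.blob.ContainerClient and block_copy_name
    (e.g. "bb-publicstatistics_16.jsonl") is a block-blob copy of original_name.
    Returns the number of bytes written.
    """
    data = container_client.get_blob_client(block_copy_name).download_blob().readall()
    original = container_client.get_blob_client(original_name)
    props = original.get_blob_properties()
    # Sanity check: refuse to delete anything if the copy's size differs.
    if props.size != len(data):
        raise ValueError(
            f"size mismatch: {original_name}={props.size}, {block_copy_name}={len(data)}"
        )
    # A blob's type cannot be changed in place, so delete the append blob first
    # (pass delete_snapshots="include" if the blob has snapshots).
    original.delete_blob()
    # Then write the same bytes back under the original name as a block blob.
    original.upload_blob(data, blob_type="BlockBlob")
    return len(data)
```

The size comparison is only a cheap guard; comparing content hashes before deleting would be stricter.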
    
