.net-core azure-blob-storage

Azure Storage Account: setting the Content-MD5 property for a blob when uploading via StageBlockAsync/CommitBlockListAsync


I'm failing to write the MD5 hash into the Content-MD5 blob property for a large file uploaded through a BlockBlobClient, calling StageBlockAsync for each block and then CommitBlockListAsync to commit the full list of blocks.

I'm currently getting a "The MD5 value specified in the request is invalid. MD5 value must be 128 bits and base64 encoded" error. The MD5 hash I'm sending is clearly base64 encoded as requested, so I'm starting to think that my MD5 hash itself is wrong, that this is what causes the failure, and that the error message is just misleading.
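The error message describes the header value on the wire: an MD5 digest is 128 bits, i.e. 16 raw bytes, which is 24 characters in base64 and 32 characters in the usual hex form. A minimal sketch comparing the three representations (the class and method names are hypothetical, chosen only for illustration):

```csharp
using System;
using System.Security.Cryptography;

public static class Md5Formats
{
    // Raw MD5 digest: always 16 bytes (128 bits).
    public static byte[] RawDigest(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(data);
        }
    }

    // Hex form, as printed by most tools: 32 lowercase characters.
    public static string ToHex(byte[] digest)
    {
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    }

    // Base64 form, which is what a Content-MD5 header carries: 24 characters.
    public static string ToBase64(byte[] digest)
    {
        return Convert.ToBase64String(digest);
    }
}
```

For example, the MD5 of the UTF-8 bytes of "hello" is 5d41402abc4b2a76b9719d911017c592 in hex and XUFAKrxLKna5cZ2REBfFkg== in base64; only the base64 form of the 16 raw bytes satisfies the "128 bits and base64 encoded" rule.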

Following is my current code; the exception is thrown by the CommitBlockListAsync call:

BlockBlobClient blobClient = GetBlobClient(containerName, virtualDirName, blobName);

string[] commitList = blockIds.ToArray();

BlobHttpHeaders headers = new BlobHttpHeaders
{
    ContentType = contentType,
    ContentHash = Encoding.ASCII.GetBytes(MetadataHelper.EncodeToBase64(md5))
};

Response<BlobContentInfo> info = await blobClient.CommitBlockListAsync(commitList, headers, metadata).ConfigureAwait(false);
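Since the file goes up block by block, the Content-MD5 value has to be the MD5 of the whole file, not of any single block. One way to get it without a second pass over the file is to feed every block into an IncrementalHash as it is staged. A minimal sketch, assuming the file is read from a Stream in fixed-size chunks; StageBlocksAndHashAsync and the stageBlockAsync callback are hypothetical, and the actual StageBlockAsync call (block ID scheme, client setup) is left to the callback because the question does not show it:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class BlockwiseMd5
{
    // Hypothetical helper: reads `source` in chunks of `blockSize` bytes, passes each chunk to
    // `stageBlockAsync` (where StageBlockAsync would be called), and returns the 16-byte MD5
    // of the whole stream, suitable for BlobHttpHeaders.ContentHash.
    public static async Task<byte[]> StageBlocksAndHashAsync(
        Stream source, int blockSize, Func<byte[], int, Task> stageBlockAsync)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        using (IncrementalHash md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            byte[] buffer = new byte[blockSize];
            int read;
            while ((read = await ReadFullAsync(source, buffer).ConfigureAwait(false)) > 0)
            {
                // Hash exactly the bytes that go into this block, in upload order.
                md5.AppendData(buffer, 0, read);

                // The buffer is reused for the next block, so the callback must be done with it
                // (e.g. upload new MemoryStream(buffer, 0, read)) before its Task completes.
                await stageBlockAsync(buffer, read).ConfigureAwait(false);
            }

            return md5.GetHashAndReset();
        }
    }

    // ReadAsync may return fewer bytes than requested; keep reading until the buffer
    // is full or the stream ends, so every block except the last is full-sized.
    private static async Task<int> ReadFullAsync(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = await source.ReadAsync(buffer, total, buffer.Length - total).ConfigureAwait(false);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

The returned array is already the raw 16-byte digest, so it can be assigned to ContentHash as-is, with no hex or base64 conversion in between.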

Solution

  • In the end I discovered that the error message is indeed partially misleading from the SDK's point of view. The header value does have to be base64 encoded, but you must not do that encoding yourself: BlobHttpHeaders.ContentHash takes the raw hash bytes, and the SDK base64-encodes them when it builds the request. The only change required to make my code work was the following:

    BlobHttpHeaders headers = new BlobHttpHeaders
    {
        ContentType = contentType,
        ContentHash = HexHelper.HexStringToByteArray(md5)  // must be a 16-byte array
    };
    

    where HexStringToByteArray converts each pair of hex characters into one byte:

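    // Converts a hex string such as "5d41...c592" into bytes, two hex characters per byte.
    public static byte[] HexStringToByteArray(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 2 != 0) throw new ArgumentException("Hex string must have an even length.", nameof(hex));

        byte[] byteArray = new byte[hex.Length / 2];
    
        for (int i = 0; i < hex.Length; i += 2)
        {
            // Parse the two-character substring as a base-16 number.
            byteArray[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        }
    
        return byteArray;
    }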
    

    In other words, ContentHash is expected to be a 16-byte array, whereas Encoding.ASCII.GetBytes() returns a 32-byte array when called on the same 32-character hex MD5 string, as it converts each character into one byte. Either way, the ASCII bytes of a text representation of the hash are not the hash itself.
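On .NET 5 and later there is a built-in equivalent of HexStringToByteArray, Convert.FromHexString, which also rejects odd-length or non-hex input. A small sketch that wraps it with a length check, assuming md5 holds the 32-character hex digest as in the question; ContentHashHelper.ToContentHash is a hypothetical name:

```csharp
using System;

public static class ContentHashHelper
{
    // Hypothetical helper: turns a 32-char hex MD5 string into the 16-byte array that
    // BlobHttpHeaders.ContentHash expects. Requires .NET 5+ for Convert.FromHexString.
    public static byte[] ToContentHash(string hexMd5)
    {
        // Throws FormatException on odd length or non-hex characters.
        byte[] hash = Convert.FromHexString(hexMd5);

        // Catch a truncated or wrong-algorithm hash before the service rejects it.
        if (hash.Length != 16)
            throw new ArgumentException($"An MD5 hash must be 16 bytes, got {hash.Length}.", nameof(hexMd5));

        return hash;
    }
}
```

With this helper the header setup becomes ContentHash = ContentHashHelper.ToContentHash(md5), and a malformed hash fails locally with a clear message instead of at CommitBlockListAsync.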