I've integrated the Azure SDK for C++ into my application, and there is a significant slowdown compared to the old Azure SDK. After increasing the azure-sdk-for-cpp upload parallelism, uploads work better, but download is still VERY SLOW.
It can be reproduced with a simple example, just by trying to download a 1 GB file from Azure Storage to the local file system.
The old SDK was built on C++ REST, which exposed concurrency::streams::istream m_stream; there is no equivalent in the new SDK, except for TransferOptions.Concurrency, which does almost nothing. Is there some way DownloadTo can be sped up, or should parallelism be implemented on top of the library?
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

#include <azure/storage/blobs.hpp>

#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

std::string GetConnectionString()
{
  const static std::string ConnectionString = "";

  if (!ConnectionString.empty())
  {
    return ConnectionString;
  }
  // std::getenv returns nullptr when the variable is not set; check before
  // constructing a std::string to avoid undefined behavior.
  const char* envConnectionString = std::getenv("AZURE_STORAGE_CONNECTION_STRING");
  if (envConnectionString != nullptr && envConnectionString[0] != '\0')
  {
    return envConnectionString;
  }
  throw std::runtime_error("Cannot find connection string.");
}

int main()
{
  using namespace Azure::Storage::Blobs;

  const std::string containerName = "sample-container";
  const std::string blobName = "sample-blob";
  const std::string blobContent = "Hello Azure!";

  auto containerClient
      = BlobContainerClient::CreateFromConnectionString(GetConnectionString(), containerName);
  containerClient.CreateIfNotExists();

  BlockBlobClient blobClient = containerClient.GetBlockBlobClient(blobName);

  std::vector<uint8_t> buffer(blobContent.begin(), blobContent.end());
  blobClient.UploadFrom(buffer.data(), buffer.size());

  Azure::Storage::Metadata blobMetadata = {{"key1", "value1"}, {"key2", "value2"}};
  blobClient.SetMetadata(blobMetadata);

  auto properties = blobClient.GetProperties().Value;
  for (const auto& metadata : properties.Metadata)
  {
    std::cout << metadata.first << ":" << metadata.second << std::endl;
  }
  // We know the blob size is small, so it's safe to cast here.
  buffer.resize(static_cast<size_t>(properties.BlobSize));
  blobClient.DownloadTo(buffer.data(), buffer.size());
  std::cout << std::string(buffer.begin(), buffer.end()) << std::endl;

  return 0;
}
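For reference, the knobs mentioned above live on DownloadBlobToOptions. Below is a minimal sketch of tuning them; the member names (InitialChunkSize, ChunkSize, Concurrency) are as I read them in the SDK headers, so verify against your azure-sdk-for-cpp version. As I understand the transfer path, Concurrency only helps once the blob is split into more than one chunk, so the chunk sizes matter as much as the concurrency value.

// Sketch: tuning the DownloadTo transfer options (member names are an
// assumption; check your SDK version's DownloadBlobToOptions definition).
#include <azure/storage/blobs.hpp>

#include <cstddef>
#include <cstdint>

void TunedDownload(
    Azure::Storage::Blobs::BlockBlobClient& blobClient, uint8_t* buffer, std::size_t bufferSize)
{
  Azure::Storage::Blobs::DownloadBlobToOptions options;
  // Shrink the first chunk so the download is split into ranges immediately
  // instead of fetching a large initial range in a single request.
  options.TransferOptions.InitialChunkSize = 4 * 1024 * 1024; // 4 MiB
  options.TransferOptions.ChunkSize = 4 * 1024 * 1024;        // 4 MiB per range
  options.TransferOptions.Concurrency = 16; // parallel range requests
  blobClient.DownloadTo(buffer, bufferSize, options);
}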
Long story short, CACHING was the solution. Our system is designed in such a way that the read function always reads only 32 KB at a time, so you can imagine the number of HTTP requests a 1 GB download generates. At first I tried downloading the whole 1 GB locally and serving every read from that copy; afterwards I reduced the cache unit all the way down to 4 MB, which showed great results. The speedup was insane.
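Here is a minimal sketch of that caching layer, assuming a fixed 4 MB chunk and using DownloadTo with an HttpRange to fetch one chunk per cache miss. ChunkCache and ReadAt are illustrative names, not SDK APIs.

#include <azure/storage/blobs.hpp>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of a single-chunk read cache: small read requests (e.g. 32 KB at a
// time) are served from a cached 4 MB chunk, so one HTTP range request covers
// many consecutive reads instead of one request per read.
class ChunkCache
{
public:
  explicit ChunkCache(Azure::Storage::Blobs::BlockBlobClient client)
      : m_client(std::move(client)), m_blobSize(m_client.GetProperties().Value.BlobSize)
  {
  }

  // Copies `size` bytes starting at `offset` into `dest`, refilling the
  // cached chunk from the service only when the read leaves it.
  size_t ReadAt(int64_t offset, uint8_t* dest, size_t size)
  {
    if (offset >= m_blobSize)
    {
      return 0;
    }
    size = static_cast<size_t>(std::min<int64_t>(size, m_blobSize - offset));
    size_t copied = 0;
    while (copied < size)
    {
      const int64_t pos = offset + static_cast<int64_t>(copied);
      const int64_t chunkStart = (pos / ChunkSize) * ChunkSize;
      if (chunkStart != m_chunkStart)
      {
        FetchChunk(chunkStart);
      }
      const size_t within = static_cast<size_t>(pos - chunkStart);
      const size_t n = std::min(size - copied, m_chunk.size() - within);
      std::memcpy(dest + copied, m_chunk.data() + within, n);
      copied += n;
    }
    return copied;
  }

private:
  static constexpr int64_t ChunkSize = 4 * 1024 * 1024; // 4 MB cache unit

  void FetchChunk(int64_t chunkStart)
  {
    const int64_t length = std::min<int64_t>(ChunkSize, m_blobSize - chunkStart);
    m_chunk.resize(static_cast<size_t>(length));
    Azure::Storage::Blobs::DownloadBlobToOptions options;
    options.Range = Azure::Core::Http::HttpRange{chunkStart, length};
    m_client.DownloadTo(m_chunk.data(), m_chunk.size(), options);
    m_chunkStart = chunkStart;
  }

  Azure::Storage::Blobs::BlockBlobClient m_client;
  int64_t m_blobSize;
  int64_t m_chunkStart = -1; // no chunk cached yet
  std::vector<uint8_t> m_chunk;
};

With 32 KB reads, each 4 MB fetch serves roughly 128 consecutive reads from memory, which is where the speedup comes from.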