Tags: azure, azure-virtual-machine, azure-data-explorer, azure-eventhub

What is considered an ingestion request in Azure Data Explorer?


Per the Azure docs:

VM and cluster size: Streaming ingestion performance and capacity scales with increased VM and cluster sizes. The number of concurrent ingestion requests is limited to six per core.

My current setup is as follows:

I have a fleet of 26 VMs generating metrics. Each of these VMs establishes a producer client connection to 3 event hubs, sending 1 event / minute.

Next, I have my ADX database with 3 tables, one per event hub. I have created a data connection between each event hub and its ADX table.

I'm seeing higher delays in data ingestion than I'd like with this setup, which is why I was looking into enabling streaming ingestion.

Since my ADX cluster uses the basic SKU (E2a_v4), it is configured to have 2 cores, giving me an upper limit of 12 concurrent ingestion requests.

However, I'm still not seeing the sub-2-second latency that streaming ingestion promises. After digging into the docs, I wonder if the problem is that I'm hitting or exceeding the concurrent ingestion request limit. I want to understand what exactly counts as an ingestion request so I can better investigate my setup.

Thanks


Solution

  • An ingestion request is any attempt to ingest data into the cluster, regardless of the source. These requests are processed using the cluster's compute resources, with a limit of six concurrent ingestion requests per CPU core. For example, your Basic SKU cluster (E2a_v4) has 2 cores, so it can handle up to 12 concurrent ingestion requests.

    In your specific scenario, you have 26 VMs generating metrics and sending events to 3 Event Hubs at a rate of 1 event per minute per VM, and each Event Hub has a data connection to one of the tables in your ADX database.

    The ingestion load may seem relatively light, but there are other factors to consider. For example, each Event Hub partition can send data in parallel, and each partition's ingestion counts as a separate request. If your Event Hubs have multiple partitions, the number of concurrent ingestion requests can be higher than expected, so your cluster's 12-request limit may be insufficient if ingestion concurrency exceeds it.

    I would suggest first enabling Azure Data Explorer monitoring to look for anomalies such as throttling or failed ingestion attempts. If possible, also try increasing the number of cores or upgrading to a higher SKU, which raises your concurrency limit and improves overall performance.
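As a rough illustration of the numbers above, here is a minimal Python sketch of the arithmetic. It assumes that each VM sends 1 event per minute to each of the 3 Event Hubs, and it follows the reasoning above that each partition's ingestion counts as one request. The partition counts and the larger core counts are hypothetical values chosen for illustration, since the question does not state them.

```python
# Back-of-the-envelope check of the figures discussed in this thread.
REQUESTS_PER_CORE = 6   # concurrent ingestion requests per core, as quoted from the docs
CORES = 2               # cores on the cluster described in the question
VMS = 26
EVENT_HUBS = 3
EVENTS_PER_MIN = 1      # assumption: per VM, per Event Hub


def concurrency_limit(cores: int, per_core: int = REQUESTS_PER_CORE) -> int:
    """Maximum number of concurrent ingestion requests for a given core count."""
    return cores * per_core


def events_per_second(vms: int, hubs: int, per_min: int) -> float:
    """Aggregate event rate across all VMs and Event Hubs."""
    return vms * hubs * per_min / 60


def worst_case_requests(hubs: int, partitions_per_hub: int) -> int:
    """Upper bound if every partition of every hub ingests at the same moment."""
    return hubs * partitions_per_hub


limit = concurrency_limit(CORES)                          # 2 cores * 6 = 12
rate = events_per_second(VMS, EVENT_HUBS, EVENTS_PER_MIN)
print(f"limit={limit} concurrent requests, load={rate:.2f} events/s")

# Hypothetical partition counts per Event Hub.
for partitions in (1, 4, 8):
    worst = worst_case_requests(EVENT_HUBS, partitions)
    status = "over the limit" if worst > limit else "within the limit"
    print(f"{partitions} partitions/hub: up to {worst} requests ({status})")

# Hypothetical larger clusters: the limit grows linearly with the core count.
for cores in (2, 4, 8):
    print(f"{cores} cores: limit {concurrency_limit(cores)}")
```

Under these assumptions the event rate itself is small, so whether the limit is reached depends mostly on how many partitions ingest concurrently. The monitoring metrics are the place to confirm whether throttling actually occurs.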

    Please check out these Microsoft docs for a better understanding: