azureazure-sql-databasegrafanaazure-iot-hubazure-stream-analytics

Azure Stream Analytics reports watermark delay


Azure Stream Analytics is used to distribute JSON data incoming from azure IoT hub into tables of SQL database.

Recently there were several large watermark delays which totally ruined the pipeline.

When it works normally it has a delay of 12 seconds which is more or less acceptable.

The overall architecture is as follows: the IoT edge devices send JSON messages when there are some to IoT hub, each message then follows to Stream Analytics as an input, there is a query running on Stream Analytics, the outputs are SQL db tables; the SQL db handles data to Grafana when requested, there are several queries from Grafana dashboards for monitoring purposes. Hence there are writes by SA and read from Grafana (sometimes from Azure Data studio).

Observations of metrics have shown that:

The following characteristics for the services used:

Grafana queries are very simple: read data for a 15 min range. The data sampling time is not frequent. Maximum every 5 seconds.

There are no data conversion or runtime errors in Stream Analytics.

There are dashboards using data from a different DB with 2 CPUs and with data frequency of 1 seconds. They work properly. Their Stream Analytics has no delay even the Grafana query time range may be 12 hours.

Also, the longest running queries analysis in insights of queries in SQL database panel on Azure portal were only for 5-15 minutes but taking too much time.

The Grafana dashboard has 3-6 panels while the working dashboard has >10 panels.

I tracked the following error among outputs of Stream Analytics (I definitely studied the link well and it didn't help):

Encountered error trying to write 1 event(s): Resource ID : 1. The request limit for the database is 400 and has been reached. See 'https://docs.microsoft.com/azure/azure-sql/database/resource-limits-logical-server' for assistance.

The strange thing is the same pipeline works with the same number of Grafana dashboards and number of messages from IoT hub with regular delay of 12 seconds (which also is expected to be less, but at least something). But sometimes I have very bad days with delay (20% of the time).

I saw people have similar problem in other forums but no decent solution. I also couldn't solve using ChatGPT.

There were several attempts reducing the delay by KILL-ing all running DB queries. Most of them from Grafana (previously we also had Grafana always available for other users who queried a lot, now other users are disabled -> seems like Grafana queries became so long).

How can I solve this problem without increasing the number of DTUs (paying more)? Should I increase the number of CPU?


Solution

  • Azure Stream Analytics reports watermark delay due to long-running queries requested by Grafana dashboards.

    Grafana dashboard is used to visualize data from Azure SQL database. Each panel on Grafana dashboard executes query reading data from SQL db.

    The speeding up queries by creating indices helped to make queries run faster. It helped to avoid watermark delays in Stream Analytics.