azure, csv, azure-blob-storage, azure-iot-hub, azure-stream-analytics

Azure IoT + Stream Analytics with blob data


We are currently evaluating whether we should port our business logic to Azure IoT Hub.

So far this looks promising, but I have a question about Stream Analytics.

Let's say we have IoT devices in the field that send their data as CSV files. Currently our back end struggles to go through this data, analyse it, and insert it into our database systems with decent performance.

I want to try Azure for that: I would use IoT Hub and send the data to the hub in this CSV format. Assume the CSV format is fixed, so I can't simply port it to the d2c communication format.

Can the Stream Analytics service work with this CSV format, and can it put the embedded data into specific tables in Table Storage?

This would be really important. Are there any examples out there that might clear things up for me?

I guess Azure has its own libraries for handling CSV files. What if we don't use CSV but another industry-standard format that Azure might not know about?

Hope you can help me here.


Solution

  • Azure Stream Analytics (ASA) does support CSV as input:

    Event serialization format: The serialization format (JSON, CSV, or Avro) of the incoming data stream.

    And yes, it also supports Azure Table Storage as output. See the docs.

    When you create an ASA job you can upload your CSV file to test the query, so you can easily try it out once you have a sample file.

    They have some example CSV data on GitHub.

    I suggest you create a small proof of concept based on your sample data.

    If, for some reason (for example, the data is in an unsupported format), ASA does not fit, you can always retrieve the IoT Hub data using a different technique, for example an EventProcessorHost. This way you have complete control over the data and can output it any way you want, and it will still be scalable (though of course this also depends on the data destination). See this post for a rough idea. It seems a bit outdated, but the concept is still valid today.

    The official docs on other options for reading data from the Event Hub can be found here.
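
To give a rough idea of what the "complete control" route could look like, here is a minimal sketch of the processing step only: it parses one CSV message body and groups the rows by the table they should end up in. It uses only the Python standard library. The column names (`deviceId`, `type`, `value`), the table names, and the function `route_csv_payload` are all made up for illustration, and the code that actually reads from the hub and writes to Table Storage is left out.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from the value of a "type" column to a destination table.
TABLE_FOR_TYPE = {
    "temperature": "TemperatureReadings",
    "humidity": "HumidityReadings",
}
# Rows with a type that is not in the mapping go here (also a made-up name).
DEFAULT_TABLE = "UnknownReadings"


def route_csv_payload(payload: bytes) -> Dict[str, List[Dict[str, str]]]:
    """Parse one CSV message body and group its rows by destination table.

    Assumes the payload is UTF-8 text whose first line is a header
    containing a "type" column.
    """
    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    rows_by_table = defaultdict(list)
    for row in reader:
        table = TABLE_FOR_TYPE.get(row["type"], DEFAULT_TABLE)
        rows_by_table[table].append(dict(row))
    return dict(rows_by_table)


if __name__ == "__main__":
    sample = (
        b"deviceId,type,value\n"
        b"dev1,temperature,21.5\n"
        b"dev1,humidity,40\n"
        b"dev2,pressure,1013\n"
    )
    for table, rows in route_csv_payload(sample).items():
        print(table, rows)
```

In a real consumer you would call something like this from the callback that receives each message, and then write each group to its table with whatever storage client you use. The same structure works for a non-CSV format: only the parsing part changes.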