Solution Background: We have devices sending telemetry data at one-minute intervals into Event Hub, where it is stored as Avro files. For the cold path, we plan to store the data (roughly 80 TB for three years of retention) in Azure Data Lake Storage Gen2. We need to run queries against this data store, with filters, time spans, etc., from our Web API, which serves data to an Angular web app in Azure.
We can query data from our Web API project in C# with SQL syntax using the query acceleration feature of Azure Data Lake, as long as the data is stored in JSON format. However, to minimize storage size and improve query performance, it is generally advised to store data in Azure Data Lake in the Parquet file format.
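For reference, a minimal sketch of the query acceleration approach over JSON. This assumes the `DataLakeFileClient.QueryAsync` overload in recent `Azure.Storage.Files.DataLake` versions; connection string, file system, path, and the `temperature` field are placeholders:

```csharp
using Azure;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

var fileClient = new DataLakeFileClient(
    connectionString: "<storage-connection-string>",   // placeholder
    fileSystemName: "telemetry",                       // placeholder
    filePath: "2023/01/01/devices.json");              // placeholder

var options = new DataLakeQueryOptions
{
    InputTextConfiguration = new DataLakeQueryJsonTextOptions(),
    OutputTextConfiguration = new DataLakeQueryJsonTextOptions()
};

// The filter runs server-side; only matching rows cross the wire.
Response<FileDownloadInfo> result = await fileClient.QueryAsync(
    "SELECT * FROM BlobStorage WHERE temperature > 75", options);

using var reader = new StreamReader(result.Value.Content);
string filteredJson = await reader.ReadToEndAsync();
```

The key benefit is that filtering happens in the storage service, so the Web API only downloads the rows it needs.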
Q1: The challenge is that the same .NET SDK (Azure.Storage.Files.DataLake) does not seem to support the Parquet file format when querying data, or does it?
I also checked ".NET for Apache Spark" for big data processing in .NET; however, it requires the JRE and other components to be installed, and the only examples I could find are console apps, not Web APIs that can be deployed to Azure.
Q2: Does anyone have any ideas about this?
Q3: A bit subjective, but is there any other way to store and fetch big data from Azure Data Lake using familiar SQL in a .NET Web API?
You might look at Parquet.Net as an option for reading and querying Parquet files in .NET.
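A minimal sketch of reading a Parquet file with Parquet.Net (NuGet package `Parquet.Net`). This assumes the async reader API of recent Parquet.Net versions; the file path is a placeholder, and the exact method names may differ between major versions:

```csharp
using Parquet;
using Parquet.Data;

// Open a Parquet file, e.g. one downloaded from ADLS Gen2 via
// DataLakeFileClient.ReadAsync (path is a placeholder).
using Stream stream = File.OpenRead("devices.parquet");
using ParquetReader reader = await ParquetReader.CreateAsync(stream);

DataField[] fields = reader.Schema.GetDataFields();
for (int g = 0; g < reader.RowGroupCount; g++)
{
    using ParquetRowGroupReader rowGroup = reader.OpenRowGroupReader(g);
    foreach (DataField field in fields)
    {
        DataColumn column = await rowGroup.ReadColumnAsync(field);
        // column.Data is a typed array; apply your filters in memory here.
    }
}
```

Note the trade-off versus query acceleration: with Parquet.Net the whole file (or row group) is downloaded and filtered inside the API process, so at 80 TB you would want a partitioned folder layout so each query touches only a few files.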
You might also evaluate Query Acceleration (JSON/CSV only), Azure Data Explorer, or Synapse Analytics serverless (on-demand) SQL pools (example syntax below).
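A sketch of the Synapse serverless SQL syntax for querying Parquet directly in ADLS Gen2 with `OPENROWSET`. The storage account, container, path, and column names (`deviceId`, `temperature`, `eventTime`) are placeholders; the Web API can run this through a standard SQL connection to the serverless endpoint:

```sql
-- Query Parquet files in ADLS Gen2 from a Synapse serverless SQL pool.
SELECT deviceId, AVG(temperature) AS avgTemp
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/telemetry/2023/*/*.parquet',
    FORMAT = 'PARQUET'
) AS rows
WHERE eventTime BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY deviceId;
```

This keeps the familiar SQL programming model in the Web API while the heavy lifting (columnar scan, predicate pushdown) happens in Synapse rather than in your process.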