azure-functionsazure-data-factoryazure-logic-appsazure-eventgrideventhub

Processing Event Hub XML file to SQL Server table - tools


We have client ERP solution which sends XML file data in Azure Event Hub. We need to process that XML file data & store it in a SQL Server table. The ERP system sends multiple files (1000's files with different schemas) per seconds into Event Hub.

We did a POC's with 2 approaches:

  1. Azure Function with custom Web API's along with stored procedures to do that but it has performance issue for failed cases

  2. Event HUB with Logic App - logging & error handling can be critical with this approach.

We don't have to use blob storage or Azure Data Lake.

Please suggest some best approach and links, or any tools which will help us.


Solution

  • When dealing with

    1. High-throughput scenarios (thousands of files per second)
    2. A varying schemas from Azure Event Hub
    3. Storing data in SQL Server

    Performance, flexibility & reliability are key factors.

    Recommended Approach: Azure Functions with Event Hub Trigger

    Why Azure Functions with Event Hub Trigger is Better:

    Example: (isolated worker model)

     private readonly ILogger<EventHubsFunction> _logger;
    
        public EventHubsFunction(ILogger<EventHubsFunction> logger)
        {
            _logger = logger;
        }
    
        [Function(nameof(EventHubFunction))]
        [FixedDelayRetry(5, "00:00:10")]
        [EventHubOutput("dest", Connection = "EventHubConnection")]
        public string EventHubFunction(
            [EventHubTrigger("src", Connection = "EventHubConnection")] string[] input,
            FunctionContext context)
        {
            _logger.LogInformation("First Event Hubs triggered message: {msg}", input[0]);
    
            var message = $"Output message created at {DateTime.Now}";
            return message;
        }
    

    See Azure Event Hubs trigger for Azure Functions

    Additionally, Azure Functions can be used in combination with Durable Functions to handle long-running processes and complex orchestrations. This approach can help with a custom retry and error-handling mechanism if required.

    Microsoft has guidlnes on resilient design for Event Hubs and Azure Functions: It covers key measures to ensure robust event streaming solutions. It discusses error handling, designing for idempotency, and managing retries to handle large data volumes effectively.