Hey folks, I am working on a use case where I am implementing incremental updates to Delta tables through Event Hubs on Azure. I came across Event Hubs and Delta Live Tables, which seem to be what I need. At the start I have an HVR agent that fetches continuous data from various data sources. Event Hubs will receive the data and land it into Delta Live Tables, and further into Delta tables that will act as the source for downstream pipelines.
Below are the scenarios that need to be covered.
Could you please help me resolve these scenarios?
Yes, Delta Live Tables (DLT) will fulfill those requirements. For streaming live tables, DLT uses Spark Structured Streaming under the hood, which guarantees exactly-once, fault-tolerant processing via checkpointing and write-ahead logs.
The 3rd requirement isn't very clear - is it about consuming data from the beginning of the topic? If so, yes, that's possible.
Please note that you can't use the EventHubs Spark connector directly, as DLT currently doesn't allow installation of external jars, but you can read from Event Hubs' Kafka-compatible endpoint using the built-in Kafka connector that is part of the DLT runtime. This answer shows how to do that.
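As a rough sketch of that approach: Event Hubs exposes a Kafka-compatible endpoint on port 9093 that you authenticate against with SASL PLAIN, using the literal string `$ConnectionString` as the username and the namespace connection string as the password. The namespace, hub, and table names below are placeholders, and the DLT usage is shown commented out since `dlt` is only available inside a pipeline:

```python
def eventhubs_kafka_options(namespace: str, eventhub: str, connection_string: str) -> dict:
    """Build Kafka reader options for an Azure Event Hubs namespace.

    Note: on Databricks the Kafka client classes are shaded, hence the
    'kafkashaded.' prefix on the login module class name.
    """
    return {
        # Event Hubs' Kafka endpoint listens on port 9093
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": eventhub,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": (
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
            f'required username="$ConnectionString" password="{connection_string}";'
        ),
        # Consume from the beginning of the topic (the 3rd requirement)
        "startingOffsets": "earliest",
    }


# Inside a DLT pipeline notebook this would be used roughly like
# (hypothetical table name, uncomment inside the pipeline):
#
# import dlt
#
# @dlt.table(name="raw_events")
# def raw_events():
#     opts = eventhubs_kafka_options("my-namespace", "my-hub", dbutils.secrets.get("scope", "eh-conn"))
#     return spark.readStream.format("kafka").options(**opts).load()
```

The `value` column of the resulting stream is binary, so a downstream table would typically cast it to a string and parse the JSON payload before merging into the target Delta table.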