How is streaming data processed in real-time projects when it was ingested through Azure Event Hubs?

Question

I have a few questions:

How is streaming data processed in real-time projects? Let's say my streaming data is stored in ADLS Gen2, where a new blob file is created every minute.
Should it be treated as batch data and processed incrementally? Or can I use Spark Streaming, or trigger a pipeline with an event-based trigger after every blob creation?

Accepted Answer

Hello Prakash, welcome to Microsoft Q&A.

To process streaming data stored in Azure Data Lake Storage Gen2, you have several options depending on your requirements for latency and processing complexity:

Batch Processing with Incremental Loads: This approach treats the data as batch data and processes it incrementally, so each scheduled run picks up only the files that arrived since the previous run. It's suitable if real-time processing isn't critical and some latency is acceptable.
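As a rough illustration of the incremental idea, here is a minimal sketch that tracks a watermark (the name of the last processed file) between runs. It uses a local folder as a stand-in for the ADLS Gen2 container, and it assumes file names sort chronologically; `process_new_files`, `load_watermark`, and the state file layout are hypothetical names for this example, not part of any Azure SDK.

```python
import json
from pathlib import Path


def load_watermark(state_file):
    """Return the name of the last processed file, or "" on the first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())["last_processed"]
    return ""


def process_new_files(landing_dir, state_file, handle):
    """Process only files whose names sort after the stored watermark.

    Assumption: file names are timestamp-based, so lexical order is arrival order.
    The state file should live outside landing_dir so it is not picked up as data.
    """
    watermark = load_watermark(state_file)
    new_files = sorted(
        p for p in landing_dir.iterdir() if p.is_file() and p.name > watermark
    )
    for path in new_files:
        handle(path)
        # Persist progress after each file so a failed run can resume.
        state_file.write_text(json.dumps({"last_processed": path.name}))
    return [p.name for p in new_files]
```

A scheduler (for example a pipeline on a time-based trigger) would call `process_new_files` on each run; a second call with no new files does nothing.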

Real-Time Processing with Spark Streaming: Spark Structured Streaming can treat the ADLS Gen2 folder as a streaming file source, picking up new files as they arrive and processing them in micro-batches. This allows for low-latency processing and a quick reaction to events.
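A minimal PySpark sketch of this option might look like the following. It assumes the files are JSON and that an existing `SparkSession` is passed in; the schema, the paths, and the function name `start_file_stream` are hypothetical, and storage authentication is left out.

```python
def start_file_stream(spark, input_path, checkpoint_path, output_path):
    """Start a Structured Streaming query over files landing in a folder."""
    # Streaming file sources need an explicit schema; this one is made up
    # for illustration and must be replaced with the real event layout.
    schema = "deviceId STRING, temperature DOUBLE, eventTime TIMESTAMP"

    events = (
        spark.readStream
        .format("json")                      # assumption: files are JSON
        .schema(schema)
        .option("maxFilesPerTrigger", 10)    # cap the files read per micro-batch
        .load(input_path)
    )

    return (
        events.writeStream
        .format("parquet")
        .option("checkpointLocation", checkpoint_path)  # tracks processed files
        .trigger(processingTime="1 minute")
        .start(output_path)
    )
```

The checkpoint location is what lets Spark remember which files it has already read, so restarting the query does not reprocess old blobs. With ADLS Gen2 the paths would typically be `abfss://` URIs.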

Event-Driven Processing with Azure Functions or Logic Apps: You can set up an event-based trigger that activates a pipeline or function every time a new blob is created. Azure Event Grid can trigger Azure Functions or Logic Apps to process the new data, which is useful for event-driven architectures.
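To make the event-driven option more concrete, here is a small sketch of the logic a function might run when it receives Event Grid storage events: keep only `Microsoft.Storage.BlobCreated` events and work out which blob to process from `data.url`. The helper name `blobs_to_process` and the sample account, container, and file names are hypothetical; the Azure Functions binding code itself is omitted.

```python
from urllib.parse import urlparse

BLOB_CREATED = "Microsoft.Storage.BlobCreated"


def blobs_to_process(events):
    """Return (container, blob_path) pairs for each BlobCreated event."""
    targets = []
    for event in events:
        if event.get("eventType") != BLOB_CREATED:
            continue  # ignore deletes and other storage events
        url = event["data"]["url"]
        # The URL path looks like /<container>/<folder>/<file>.
        container, _, blob_path = urlparse(url).path.lstrip("/").partition("/")
        targets.append((container, blob_path))
    return targets


# Hypothetical event payload, trimmed to the fields used above.
sample_events = [
    {
        "eventType": "Microsoft.Storage.BlobCreated",
        "subject": "/blobServices/default/containers/events/blobs/2024/01/01/0001.json",
        "data": {
            "url": "https://myaccount.blob.core.windows.net/events/2024/01/01/0001.json"
        },
    },
    {"eventType": "Microsoft.Storage.BlobDeleted", "data": {"url": "https://myaccount.blob.core.windows.net/events/old.json"}},
]
```

Each returned pair identifies one new file, so the downstream step processes exactly the blob that triggered the event instead of scanning the whole folder.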

Please let me know if you have any further questions.

Kindly accept the answer if it helps.

Thanks

Deepanshu
