Publishing to event hub via rest API is slow from ADF web activity

Himanshu Zinzuwadia 45 Reputation points
2024-11-16T04:06:04.1233333+00:00

We publish messages to Event Hubs through the REST API, authenticating with a managed identity from a dedicated integration runtime. The Event Hubs namespace is Premium with auto scale, and there are enough TUs. Messages are sent one at a time, and each call takes 3 to 15 seconds to publish a message through the REST API.

We could redesign this to use Function Apps and the SDK, where we don't have this problem. But we need to be able to publish to Event Hubs from ADF for some use cases. There is no dedicated activity or dataset destination for Event Hubs, so using the Web activity to send messages through the REST API is the only option.

Why is publishing to Event Hubs through the REST API so slow?


1 answer

  1. Sina Salam 14,551 Reputation points
    2024-11-17T02:32:09.0066667+00:00

    Hello Himanshu Zinzuwadia,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that publishing to Event Hubs through the REST API from an ADF Web activity is significantly slow for you.

    First, publishing messages to Azure Event Hubs through the REST API can be slow for several reasons. Each HTTP request carries connection setup and header overhead, which adds up when messages are sent one at a time. Serializing and deserializing each message also adds some delay. Authenticating with a managed identity requires obtaining a token from Azure Active Directory (now Microsoft Entra ID), which adds latency whenever a new token has to be requested. Network conditions, including congestion and the physical distance between services, can further affect performance. Even with sufficient Throughput Units (TUs), Event Hubs may throttle requests under heavy load. In addition, the Web activity in Azure Data Factory (ADF) is not optimized for high-frequency, low-latency operations, which adds to the overall delay. A related discussion: https://techcommunity.microsoft.com/discussions/azuredatafactory/azure-data-factory-web-api-call-to-azure-rest-api-slow/4287418
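
To illustrate the token point above: if you call the REST API from your own code (for example, inside a Function triggered from ADF), reusing a token until shortly before it expires avoids a token round trip on every send. The sketch below is a minimal illustration, not something the Web activity exposes. `CachedToken`, `fetch`, `skew_seconds`, and `clock` are hypothetical names introduced here, and recent versions of `azure-identity` may already cache tokens for you, so treat this as a way to see the idea rather than a required component.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Reuse a bearer token until shortly before it expires (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch() must return (token_string, expiry_as_epoch_seconds).
        self._fetch = fetch
        # Refresh this many seconds before the real expiry (assumed value).
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_on = 0.0

    def get(self) -> str:
        # Only call fetch() when there is no token yet or it is about to expire.
        if self._token is None or self._clock() >= self._expires_on - self._skew:
            self._token, self._expires_on = self._fetch()
        return self._token


# Possible wiring with azure-identity (an assumption, not run here):
# def fetch():
#     t = ManagedIdentityCredential().get_token("https://eventhubs.azure.net/.default")
#     return t.token, t.expires_on
```

With this in place, only the first send (and the occasional refresh) pays for token acquisition, which makes it easier to tell how much of the 3 to 15 seconds comes from authentication and how much from the HTTP call itself.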

    To improve performance, consider batching multiple messages into a single API call, which reduces the number of HTTP requests. For critical low-latency scenarios, Azure Functions with the Event Hubs SDK can help, and these functions can be triggered from ADF. Keeping ADF and Event Hubs in the same region minimizes network latency, and monitoring for network issues can help you identify and resolve delays. Azure Monitor can track performance, and you can adjust the number of TUs or scale Event Hubs based on what it shows. Retry logic in the ADF Web activity can handle transient failures and reduce the impact of occasional slow responses. For more, see the troubleshooting guide: https://learn.microsoft.com/en-us/azure/event-hubs/troubleshooting-guide
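
For the retry suggestion above, here is a minimal sketch of exponential backoff with jitter, for the case where the send happens in your own code rather than in the Web activity (which has its own retry settings in the activity policy). `post_with_retry`, the `send` callable, the list of retryable status codes, and the delay values are all assumptions made for illustration. Retrying does not make a slow call faster, it only keeps a transient failure from failing the whole run.

```python
import random
import time
from typing import Callable, Iterable


def post_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_statuses: Iterable[int] = (408, 429, 500, 502, 503, 504),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() performs one HTTP POST and returns the status code.
    Returns the last status code seen.
    """
    retryable = set(retry_statuses)
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if status not in retryable:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff (base_delay, 2x, 4x, ...) plus random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return status
```

A caller could pass something like `lambda: requests.post(url, headers=headers, data=payload, timeout=30).status_code` as `send`, where `url`, `headers`, and `payload` are whatever your request already uses.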

    The Python code below shows how to batch messages into a single API call. It is an optional alternative, not a requirement.

    import json

    import requests
    from azure.identity import ManagedIdentityCredential

    # Obtain a token using managed identity
    credential = ManagedIdentityCredential()
    token = credential.get_token("https://eventhubs.azure.net/.default")

    # Event Hub REST API endpoint
    event_hub_url = "https://<your-event-hub-namespace>.servicebus.windows.net/<your-event-hub>/messages"

    # Batch messages: a JSON array of objects, each with a "Body" key
    messages = [
        {"Body": "Message 1"},
        {"Body": "Message 2"},
        {"Body": "Message 3"},
    ]

    # Send the batch; batch sends use the Service Bus JSON content type
    headers = {
        "Authorization": f"Bearer {token.token}",
        "Content-Type": "application/vnd.microsoft.servicebus.json",
    }
    response = requests.post(event_hub_url, headers=headers, data=json.dumps(messages), timeout=30)

    # A successful send returns 201 Created
    if response.status_code == 201:
        print("Messages sent successfully")
    else:
        print(f"Failed to send messages: {response.status_code} - {response.text}")
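
If you have more messages than fit in one request, a batch has to be split. The sketch below groups message bodies into JSON array payloads that each stay under a byte limit, using the `Body` key that the Event Hubs batch REST format expects. `build_batches` is a hypothetical helper, and the default `max_bytes` of 1,000,000 is an assumption: the actual limit depends on your tier and also counts headers, so please check the quota for your namespace and leave some headroom.

```python
import json
from typing import Iterable, Iterator, List


def build_batches(bodies: Iterable[str], max_bytes: int = 1_000_000) -> Iterator[str]:
    """Yield JSON array payloads, each at most max_bytes when UTF-8 encoded."""
    current: List[str] = []  # serialized entries in the batch being built
    size = 2                 # bytes used so far, counting "[" and "]"
    for body in bodies:
        entry = json.dumps({"Body": body})
        entry_size = len(entry.encode("utf-8"))
        if entry_size + 2 > max_bytes:
            raise ValueError("a single event exceeds max_bytes")
        extra = entry_size + (1 if current else 0)  # +1 for the separating comma
        if size + extra > max_bytes:
            # Current batch is full: emit it and start a new one.
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
            extra = entry_size
        current.append(entry)
        size += extra
    if current:
        yield "[" + ",".join(current) + "]"
```

Each yielded string can be sent as the request body of one POST, so a thousand small messages become a handful of HTTP calls instead of a thousand.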
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.

