Monitoring provides insights into the behavior and health of your systems, and helps build a holistic view of the environment, historic trends, correlate diverse factors, and measure changes in performance, consumption, or error rate.
Azure Functions offers built-in integration with Application Insights. From Application Insights, you can get information such as the number of function app instances or request and dependency telemetry of a function. When working with Functions and Event Hubs, Application Insights can also track the outgoing dependency telemetries to the event hub, calculating the processing time and showing the end-to-end flow of the system connected through Event Hubs.
This section introduces useful features and insights that you can get from Application Insights for your Event Hubs plus Functions solution.
Application Map
Application Map shows how the components in a system are interacting with each other. Because of the dependency telemetry that Application Insights provides, it maps out the flow of events between Azure Functions and Event Hubs, including the average of each function execution and average duration of an event in Event Hubs, as well as showing transactions that contain failures marked in red.
After sending the expected load to your system, you can go to Application Insights in the Azure portal, and on the sidebar, choose Application Map. Here's a map that shows three functions, three event hubs, and apparent failures when writing to a downstream database:
End-to-end transaction details
End-to-end transaction details show how your system components interact with each other, in chronological order. This view also shows how long an event has spent in processing. You can also drill into the telemetry of each component from this view, which makes it easier to troubleshoot across components within the same request when an issue occurred.
Platform metrics and telemetry
Platform-generated metrics in Azure Monitor for Event Hubs and Azure Functions can be used for overall monitoring of the solution behavior and health:
Azure Event Hubs metrics in Azure Monitor are of interest to capture useful insights for Event Hubs (like aggregates of Incoming Requests, Outgoing Requests, Throttled Requests, Successful Requests, Incoming Messages, Outgoing Messages, Captured Messages, Incoming Bytes, Outgoing Bytes, Captured Bytes, User Errors).
Azure Functions metrics share many of the metrics from Azure App Service, with the addition of Function Execution Count and Function Execution Units that can be used for understanding utilization and cost of the Consumption plan. Other metrics of interest are Connections, Data In, Data Out, Average Memory Working Set, Thread Count, Requests, and Response Time.
Azure Functions integrates with Application Insights to provide advanced and detailed telemetry and insights into the Functions host and function executions. To learn more, see Analyze Azure Functions telemetry in Application Insights. When using Application Insights to monitor a topology, there are a variety of configurations available. To learn more, see How to configure monitoring for Azure Functions.
The following is an example of extra telemetry for Event Hubs triggered functions generated in the traces table:
Trigger Details: PartionId: 6, Offset: 3985758552064-3985758624640, EnqueueTimeUtc: 2022-10-31T12:51:58.1750000+00:00-2022-10-31T12:52:03.8160000+00:00, SequenceNumber: 3712266-3712275, Count: 10
This information requires that you use Event Hubs extension 4.2.0 or a later version. This data is very useful as it contains information about the message that triggered the function execution and can be used for querying and insights. It includes the following data for each time the function is triggered:
- The partition ID (6)
- The partition offset range (3985758552064-3985758624640)
- The Enqueue Time range in UTC (2022-10-31T12:51:58.1750000+00:00-2022-10-31T12:52:03.8160000+00:00)
- The sequence number range 3712266-3712275
- And the count of messages (10)
Refer to the Example Application Insights queries section for examples on how to use this telemetry.
Custom telemetry is also possible for different languages (C# class library, C# Isolated, C# Script, JavaScript, Java, PowerShell, and Python). This logging shows up in the traces table in Application Insights. You can create your own entries into Application Insights and add custom dimensions that can be used for querying data and creating custom dashboards.
Finally, when your function app connects to an event hub using an output binding, entries are also written to the Application Insights Dependencies table.
For Event Hubs, the correlation is injected into the event payload, and you see a Diagnostic-Id property in events:
This follows the W3C Trace Context format that's also used as Operation Id and Operation Links in telemetry created by Functions, which allows Application Insights to construct the correlation between event hub events and function executions, even when they're distributed.
Example Application Insights queries
Below is a list of helpful Application Insights queries when monitoring Event Hubs with Azure Functions. This query display detailed information for event hub triggered function using telemetry emitted by the Event Hubs extension 4.2.0 and greater.
When sampling is enabled in Application Insights, there can be gaps in the data.
Detailed event processing information
The data is only emitted in the correct format when batched dispatch is used. Batch dispatch means that the function accepts multiple events for each execution, which is recommended for performance. Keep in mind the following considerations:
- The
dispatchTimeMilliseconds
value approximates the length of time between when the event was written to the event hub and when it was picked up by the function app for processing. dispatchTimeMilliseconds
can be negative or otherwise inaccurate because of clock drift between the event hub server and the function app.- Event Hubs partitions are processed sequentially. A message won't be dispatched to function code for processing until all previous messages have been processed. Monitor the execution time of your functions as longer execution times will cause dispatch delays.
- The calculation uses the enqueueTime of the first message in the batch. Dispatch times might be lower for other messages in the batch.
dispatchTimeMilliseconds
is based on the point in time.- Sequence numbers are per-partition, and duplicate processing can occur because Event Hubs does not guarantee exactly-once message delivery.
traces
| where message startswith "Trigger Details: Parti"
| parse message with * "tionId: " partitionId:string ", Offset: "
offsetStart:string "-" offsetEnd:string", EnqueueTimeUtc: "
enqueueTimeStart:datetime "+00:00-" enqueueTimeEnd:datetime "+00:00, SequenceNumber: "
sequenceNumberStart:string "-" sequenceNumberEnd:string ", Count: "
messageCount:int
| extend dispatchTimeMilliseconds = (timestamp - enqueueTimeStart) / 1ms
| project timestamp, cloud_RoleInstance, operation_Name, processId =
customDimensions.ProcessId, partitionId, messageCount, sequenceNumberStart,
sequenceNumberEnd, enqueueTimeStart, enqueueTimeEnd, dispatchTimeMilliseconds
Dispatch latency visualization
This query visualizes the 50th and 90th percentile event dispatch latency for a given event hub triggered function. See the above query for more details and notes.
traces
| where operation_Name == "<ENTER THE NAME OF YOUR FUNCTION HERE>"
| where message startswith "Trigger Details: Parti"
| parse message with * "tionId: " partitionId:string ", Offset: "
offsetStart:string "-" offsetEnd:string", EnqueueTimeUtc: "
enqueueTimeStart:datetime "+00:00-" enqueueTimeEnd:datetime "+00:00, SequenceNumber: "
sequenceNumberStart:string "-" sequenceNumberEnd:string ", Count: "
messageCount:int
| extend dispatchTimeMilliseconds = (timestamp - enqueueTimeStart) / 1ms
| summarize percentiles(dispatchTimeMilliseconds, 50, 90) by bin(timestamp, 5m)
| render timechart
Dispatch latency summary
This query is similar to above but shows a summary view.
traces
| where message startswith "Trigger Details: Parti"
| parse message with * "tionId: " partitionId:string ", Offset: "
offsetStart:string "-" offsetEnd:string", EnqueueTimeUtc: "
enqueueTimeStart:datetime "+00:00-" enqueueTimeEnd:datetime "+00:00, SequenceNumber: "
sequenceNumberStart:string "-" sequenceNumberEnd:string ", Count: "
messageCount:int
| extend dispatchTimeMilliseconds = (timestamp - enqueueTimeStart) / 1ms
| summarize messageCount = sum(messageCount),
percentiles(dispatchTimeMilliseconds, 50, 90, 99, 99.9, 99.99) by operation_Name
Message distribution across partitions
This query shows how to visualize message distribution across partitions.
traces
| where message startswith "Trigger Details: Parti"
| parse message with * "tionId: " partitionId:string ", Offset: "
offsetStart:string "-" offsetEnd:string", EnqueueTimeUtc: "
enqueueTimeStart:datetime "+00:00-" enqueueTimeEnd:datetime "+00:00, SequenceNumber: "
sequenceNumberStart:string "-" sequenceNumberEnd:string ", Count: "
messageCount:int
| summarize messageCount = sum(messageCount) by cloud_RoleInstance,
bin(timestamp, 5m)
| render areachart kind=stacked
Message distribution across instances
This query shows how to visualize message distribution across instances.
traces
| where message startswith "Trigger Details: Parti"
| parse message with * "tionId: " partitionId:string ", Offset: "
offsetStart:string "-" offsetEnd:string", EnqueueTimeUtc: "
enqueueTimeStart:datetime "+00:00-" enqueueTimeEnd:datetime "+00:00, SequenceNumber: "
sequenceNumberStart:string "-" sequenceNumberEnd:string ", Count: "
messageCount:int
| summarize messageCount = sum(messageCount) by cloud_RoleInstance,
bin(timestamp, 5m)
| render areachart kind=stacked
Executing Instances and Allocated Instances
This query shows how to visualize the number of Azure Functions instances that are processing events from Event Hubs, and the total number of instances (processing and waiting for lease). Most of the time they should be the same.
traces
| where message startswith "Trigger Details: Parti"
| summarize type = "Executing Instances", Count = dcount(cloud_RoleInstance) by
bin(timestamp, 60s)
| union (
traces
| summarize type = "Allocated Instances", Count = dcount(cloud_RoleInstance) by
bin(timestamp, 60s)
)
| project timestamp, type, Count
| render timechart
All Telemetry for a specific Function Execution
The operation_Id field can be used across the different tables in Application Insights. For Event Hubs triggered Azure Functions the following query for example will result in the trigger information, telemetry from logs inside the Function code, and dependencies and exceptions:
union isfuzzy=true requests, exceptions, traces, dependencies
| where * has "<ENTER THE OPERATION_ID OF YOUR FUNCTION EXECUTION HERE>"
| order by timestamp asc
End-to-End Latency for an Event
As the enqueueTimeUtc property in the trigger detail trace shows the enqueue time of only the first event of each batch that the function processed, a more advanced query can be used to calculate the end-to-end latency of events between two functions with Event Hubs in between. This query will expand the operation links (if any) in the second function's request and map its end time to the same corresponding operation ID of the first function start time.
let start = view(){
requests
| where operation_Name == "FirstFunction"
| project start_t = timestamp, first_operation_Id = operation_Id
};
let link = view(){
requests
| where operation_Name == "SecondFunction"
| mv-expand ex = parse_json(tostring(customDimensions["_MS.links"]))
| extend parent = case(isnotempty(ex.operation_Id), ex.operation_Id, operation_Id )
| project first_operation_Id = parent, second_operation_Id = operation_Id
};
let finish = view(){
traces
| where customDimensions["EventName"] == "FunctionCompleted" and operation_Name
== "SecondFunction"
| project end_t = timestamp, second_operation_Id = operation_Id
};
start
| join kind=inner (
link
| join kind=inner finish on second_operation_Id
) on first_operation_Id
| project start_t, end_t, first_operation_Id, second_operation_Id
| summarize avg(datetime_diff('second', end_t, start_t))
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
- David Barkol | Principal Solution Specialist GBB
To see non-public LinkedIn profiles, sign in to LinkedIn.
Next steps
To learn more, consider reviewing these related articles:
- Analyze Azure Functions telemetry in Application Insights
- Configure monitoring for Azure Functions
- Analyze Azure Functions telemetry in Application Insights
- Metrics in Azure Monitor - Azure Event Hubs
- Kusto Query Language
- Serverless event processing is a reference architecture detailing a typical architecture of this type, with code samples and discussion of important considerations.