Recommended configurations for Apache Kafka clients
Here are the recommended configurations for using Azure Event Hubs from Apache Kafka client applications.
Java client configuration properties
Producer and consumer configurations
Property | Recommended values | Permitted range | Notes |
---|---|---|---|
metadata.max.age.ms |
180000 (approximate) | < 240000 | Can be lowered to pick up metadata changes sooner. |
connections.max.idle.ms |
180000 | < 240000 | Azure closes inbound Transmission Control Protocol (TCP) idle > 240,000 ms, which can result in sending on dead connections (shown as expired batches because of send timeout). |
Producer configurations only
Producer configs can be found here.
Property | Recommended Values | Permitted Range | Notes |
---|---|---|---|
max.request.size |
1000000 | < 1046528 | The service closes connections if requests larger than 1,046,528 bytes are sent. This value must be changed and causes issues in high-throughput produce scenarios. |
retries |
> 0 | Might require increasing delivery.timeout.ms value, see documentation. | |
request.timeout.ms |
30000 .. 60000 | > 20000 | Event Hubs internally defaults to a minimum of 20,000 ms. While requests with lower timeout values are accepted, client behavior isn't guaranteed. Make sure that your request.timeout.ms is at least the recommended value of 60000 and your session.timeout.ms is at least the recommended value of 30000. Having these settings too low could cause consumer timeouts, which then cause rebalances (which then cause more timeouts, which cause more rebalancing, and so on). |
metadata.max.idle.ms |
180000 | > 5000 | Controls how long the producer caches metadata for a topic that's idle. If the elapsed time since a topic was last produced exceeds the metadata idle duration, then the topic's metadata is forgotten and the next access to it will force a metadata fetch request. |
linger.ms |
> 0 | For high throughput scenarios, linger value should be equal to the highest tolerable value to take advantage of batching. | |
delivery.timeout.ms |
Set according to the formula (request.timeout.ms + linger.ms ) * retries . |
||
compression.type |
uncompressed, gzip |
Only gzip compression is currently supported. |
Consumer configurations only
Consumer configs can be found here.
Property | Recommended Values | Permitted Range | Notes |
---|---|---|---|
heartbeat.interval.ms |
3000 | 3000 is the default value and shouldn't be changed. | |
session.timeout.ms |
30000 | 6000 .. 300000 | Start with 30000, increase if seeing frequent rebalancing because of missed heartbeats. Make sure that your request.timeout.ms is at least the recommended value of 60000 and your session.timeout.ms is at least the recommended value of 30000. Having these settings too low could cause consumer timeouts, which then cause rebalances (which then cause more timeouts, which cause more rebalancing, and so on). |
max.poll.interval.ms |
300000 (default) | >session.timeout.ms | Used for rebalance timeout, so it shouldn't be set too low. Must be greater than session.timeout.ms. |
librdkafka configuration properties
The main librdkafka
configuration file (link) contains extended descriptions for the properties described in the following sections.
Producer and consumer configurations
Property | Recommended Values | Permitted Range | Notes |
---|---|---|---|
socket.keepalive.enable |
true | Necessary if connection is expected to idle. Azure closes inbound TCP idle > 240,000 ms. | |
metadata.max.age.ms |
~ 180000 | < 240000 | Can be lowered to pick up metadata changes sooner. |
Producer configurations only
Property | Recommended Values | Permitted Range | Notes |
---|---|---|---|
retries |
> 0 | Default is 2147483647. | |
request.timeout.ms |
30000 .. 60000 | > 20000 | Event Hubs internally defaults to a minimum of 20,000 ms. librdkafka default value is 5000, which can be problematic. While requests with lower timeout values are accepted, client behavior isn't guaranteed. |
partitioner |
consistent_random |
See librdkafka documentation | consistent_random is default and best. Empty and null keys are handled ideally for most cases. |
compression.codec |
none, gzip |
Only gzip compression is currently supported. |
Consumer configurations only
Property | Recommended Values | Permitted Range | Notes |
---|---|---|---|
heartbeat.interval.ms |
3000 | 3000 is the default value and shouldn't be changed. | |
session.timeout.ms |
30000 | 6000 .. 300000 | Start with 30000, increase if seeing frequent rebalancing because of missed heartbeats. |
max.poll.interval.ms |
300000 (default) | >session.timeout.ms | Used for rebalance timeout, so it shouldn't be set too low. Must be greater than session.timeout.ms. |
Further notes
Check the following table of common configuration-related error scenarios.
Symptoms | Problem | Solution |
---|---|---|
Offset commit failures because of rebalancing | Your consumer is waiting too long in between calls to poll() and the service is kicking the consumer out of the group. | You have several options:
|
Network exceptions at high produce throughput | If you're using Java client + default max.request.size, your requests might be too large. | See Java configs mentioned earlier. |
Next steps
See Azure subscription and service limits, quotas, and constraints for quotas and limits of all Azure services.