How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data

zmsoft 360 Reputation points
2025-03-10T03:52:50.48+00:00

Hi there,

How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data?


EH_NAME = "myeventhub"
TOPIC = "myeventhub"
KAFKA_BROKER = "{EH_NAMESPACE}.servicebus.windows.net:9093"
GROUP_ID = "group_dev"


raw_kafka_events = (spark.readStream
    .format("kafka")
    .option("subscribe", EH_NAME)
    .option("kafka.bootstrap.servers", KAFKA_BROKER)
    .option("kafka.group.id", GROUP_ID) # Set Kafka consumer group ID
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.jaas.config", f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"{EH_CONN_STR}\";")
    # .option("kafka.ssl.endpoint.identification.algorithm", "https")
    # .option("kafka.sasl.mechanism", "PLAIN")
    .option("failOnDataLoss", "false")
    .option("startingOffsets", "earliest")
    .load()
    )

parsed_data = raw_kafka_events.select(col("value").cast("string")).alias("event")
display(parsed_data)

Error log:

kafkashaded.org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient

Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka config
    at kafkashaded.org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:184)
    at kafkashaded.org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:192)
    at kafkashaded.org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:81)
    at kafkashaded.org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
    at kafkashaded.org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:522)

Thanks & Regards,

zmsoft


1 answer

  1. Ganesh Gurram 4,965 Reputation points Microsoft External Staff
    2025-03-10T07:12:08.9433333+00:00

    Hi @zmsoft

    The error "No serviceName defined in either JAAS or Kafka config" suggests that the kafka.sasl.service.name parameter is missing in your configuration. Below is the correct way to set up an Azure Databricks Delta Live Tables (DLT) pipeline to consume data from Azure Event Hubs.

    Here are the steps to resolve the issue:

    Enable the Kafka protocol on Azure Event Hubs - Make sure your Event Hubs namespace supports the Kafka endpoint (Standard tier or higher; the Basic tier does not). The Kafka bootstrap server should be: {EH_NAMESPACE}.servicebus.windows.net:9093
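
    For reference, a minimal sketch of how that broker address is built in Python (the namespace name below is a placeholder, not taken from the original post):

    EH_NAMESPACE = "my-eventhubs-namespace"  # placeholder: your Event Hubs namespace name
    KAFKA_BROKER = f"{EH_NAMESPACE}.servicebus.windows.net:9093"  # the Kafka endpoint always listens on TLS port 9093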

    Store the Event Hubs connection string securely - For security, store the connection string in Databricks secrets rather than hard-coding it. First create a secret scope:

    databricks secrets create-scope --scope eventhub-secrets

    then store the connection string:

    databricks secrets put --scope eventhub-secrets --key eh-connection-string
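
    To confirm from a notebook that the scope and key exist, you can list them (secret values themselves are redacted when displayed):

    # Quick sanity check: list scopes and the keys stored in our scope
    print(dbutils.secrets.listScopes())
    print(dbutils.secrets.list("eventhub-secrets"))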

    Use Delta Live Tables (DLT) to read from Event Hubs - Update your code to set the kafka.sasl.mechanism option (kafka.sasl.service.name is shown as well for completeness):

    import dlt
    from pyspark.sql.functions import col
    from pyspark.sql.types import StringType

    # Read the connection string from Databricks secrets
    EH_CONN_STR = dbutils.secrets.get(scope="eventhub-secrets", key="eh-connection-string")
    EH_NAMESPACE = "my-eventhubs-namespace"  # replace with your Event Hubs namespace
    KAFKA_BROKER = f"{EH_NAMESPACE}.servicebus.windows.net:9093"
    EH_NAME = "myeventhub"

    @dlt.table(
        comment="Streaming data from Azure Event Hubs into Delta Live Tables"
    )
    def eventhub_stream():
        return (
            spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", KAFKA_BROKER)
            .option("subscribe", EH_NAME)
            .option("kafka.security.protocol", "SASL_SSL")
            .option("kafka.sasl.mechanism", "PLAIN")  # Fix: without this, the client defaults to GSSAPI, which needs a serviceName
            .option("kafka.sasl.service.name", "kafka")  # Optional with PLAIN; required only for GSSAPI
            .option("kafka.sasl.jaas.config",
                    # Databricks ships a shaded Kafka client, so the login module class needs the kafkashaded prefix
                    f'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
                    f'username="$ConnectionString" password="{EH_CONN_STR}";')
            .option("failOnDataLoss", "false")
            .option("startingOffsets", "earliest")
            .load()
            .select(col("value").cast(StringType()).alias("event_data"))  # Extract message payload as a string
        )
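
    If the payload is JSON, a common next step is a second DLT table that parses it. A minimal sketch, assuming a hypothetical two-field schema (device_id, event_time) that you would adjust to your actual events:

    import dlt
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Hypothetical schema for illustration only - match it to your real payload
    event_schema = StructType([
        StructField("device_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    @dlt.table(comment="Parsed JSON events from the raw Event Hubs stream")
    def eventhub_parsed():
        return (
            dlt.read_stream("eventhub_stream")  # read from the table defined above
            .select(from_json(col("event_data"), event_schema).alias("payload"))
            .select("payload.*")
        )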
    

    Deploy the Delta Live Tables Pipeline - Go to Databricks Workspace → Workflows → Delta Live Tables. Click Create Pipeline and select the notebook where you defined eventhub_stream(). Set Pipeline Mode (Triggered or Continuous) and start the pipeline.

    Once the pipeline is running, verify the data using: SELECT * FROM LIVE.eventhub_stream;

    For more details, refer to: Use Azure Event Hubs as a DLT data source

    Hope this helps. Do let us know if you have any further queries.

