How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data

zmsoft 360 Reputation points
2025-03-10T03:52:50.48+00:00

Hi there,

How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data?


EH_NAME = "myeventhub"
TOPIC = "myeventhub"
KAFKA_BROKER = "{EH_NAMESPACE}.servicebus.windows.net:9093"
GROUP_ID = "group_dev"


raw_kafka_events = (spark.readStream
    .format("kafka")
    .option("subscribe", EH_NAME)
    .option("kafka.bootstrap.servers", KAFKA_BROKER)
    .option("kafka.group.id", GROUP_ID) # Set Kafka consumer group ID
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.jaas.config", f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"{EH_CONN_STR}\";")
    # .option("kafka.ssl.endpoint.identification.algorithm", "https")
    # .option("kafka.sasl.mechanism", "PLAIN")
    .option("failOnDataLoss", "false")
    .option("startingOffsets", "earliest")
    .load()
    )

parsed_data = raw_kafka_events.select(col("value").cast("string")).alias("event")
display(parsed_data)

Error log:

kafkashaded.org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient

Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka config
    at kafkashaded.org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:184)
    at kafkashaded.org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:192)
    at kafkashaded.org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:81)
    at kafkashaded.org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
    at kafkashaded.org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:522)

Thanks & Regards,

zmsoft


1 answer

  1. Ganesh Gurram 4,965 Reputation points Microsoft External Staff
    2025-03-10T07:12:08.9433333+00:00

    Hi @zmsoft

    The error "No serviceName defined in either JAAS or Kafka config" suggests that the kafka.sasl.service.name parameter is missing in your configuration. Below is the correct way to set up an Azure Databricks Delta Live Tables (DLT) pipeline to consume data from Azure Event Hubs.

    Here are the steps to resolve the issue:

    Enable the Kafka protocol on Azure Event Hubs - Make sure your Event Hubs namespace supports the Kafka endpoint (Standard tier or higher; the Basic tier does not). The Kafka bootstrap server should be: {EH_NAMESPACE}.servicebus.windows.net:9093
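
    For reference, a minimal sketch of how that broker address is built in Python (the namespace name below is a placeholder, not taken from the original post):

    EH_NAMESPACE = "my-eventhubs-namespace"  # placeholder: your Event Hubs namespace name
    KAFKA_BROKER = f"{EH_NAMESPACE}.servicebus.windows.net:9093"  # the Kafka endpoint always listens on TLS port 9093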

    Store the Event Hubs connection string securely - For security, store the connection string in Databricks secrets rather than hard-coding it. First create a secret scope:

    databricks secrets create-scope --scope eventhub-secrets

    then store the connection string:

    databricks secrets put --scope eventhub-secrets --key eh-connection-string
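
    To confirm from a notebook that the scope and key exist, you can list them (secret values themselves are redacted when displayed):

    # Quick sanity check: list scopes and the keys stored in our scope
    print(dbutils.secrets.listScopes())
    print(dbutils.secrets.list("eventhub-secrets"))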

    Use Delta Live Tables (DLT) to read from Event Hubs - Update your code to set the kafka.sasl.mechanism option (kafka.sasl.service.name is shown as well for completeness):

    import dlt
    from pyspark.sql.functions import col
    from pyspark.sql.types import StringType

    # Read the connection string from Databricks secrets
    EH_CONN_STR = dbutils.secrets.get(scope="eventhub-secrets", key="eh-connection-string")
    EH_NAMESPACE = "my-eventhubs-namespace"  # replace with your Event Hubs namespace
    KAFKA_BROKER = f"{EH_NAMESPACE}.servicebus.windows.net:9093"
    EH_NAME = "myeventhub"

    @dlt.table(
        comment="Streaming data from Azure Event Hubs into Delta Live Tables"
    )
    def eventhub_stream():
        return (
            spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", KAFKA_BROKER)
            .option("subscribe", EH_NAME)
            .option("kafka.security.protocol", "SASL_SSL")
            .option("kafka.sasl.mechanism", "PLAIN")  # Fix: without this, the client defaults to GSSAPI, which needs a serviceName
            .option("kafka.sasl.service.name", "kafka")  # Optional with PLAIN; required only for GSSAPI
            .option("kafka.sasl.jaas.config",
                    # Databricks ships a shaded Kafka client, so the login module class needs the kafkashaded prefix
                    f'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
                    f'username="$ConnectionString" password="{EH_CONN_STR}";')
            .option("failOnDataLoss", "false")
            .option("startingOffsets", "earliest")
            .load()
            .select(col("value").cast(StringType()).alias("event_data"))  # Extract message payload as a string
        )
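
    If the payload is JSON, a common next step is a second DLT table that parses it. A minimal sketch, assuming a hypothetical two-field schema (device_id, event_time) that you would adjust to your actual events:

    import dlt
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Hypothetical schema for illustration only - match it to your real payload
    event_schema = StructType([
        StructField("device_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    @dlt.table(comment="Parsed JSON events from the raw Event Hubs stream")
    def eventhub_parsed():
        return (
            dlt.read_stream("eventhub_stream")  # read from the table defined above
            .select(from_json(col("event_data"), event_schema).alias("payload"))
            .select("payload.*")
        )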
    

    Deploy the Delta Live Tables Pipeline - Go to Databricks Workspace → Workflows → Delta Live Tables. Click Create Pipeline and select the notebook where you defined eventhub_stream(). Set Pipeline Mode (Triggered or Continuous) and start the pipeline.

    Once the pipeline is running, verify the data using: SELECT * FROM LIVE.eventhub_stream;

    For more details, refer to: Use Azure Event Hubs as a DLT data source

    Hope this helps. Do let us know if you have any further queries.

