Stream data from Kafka into Azure Stream Analytics
Kafka is a distributed streaming platform used to publish and subscribe to streams of records. Kafka is designed to allow your apps to process records as they occur. It's an open-source system developed by the Apache Software Foundation and written in Java and Scala.
The following are the major use cases:
- Messaging
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
Azure Stream Analytics lets you connect directly to Kafka clusters to ingest data. The solution is low code and entirely managed by the Azure Stream Analytics team at Microsoft, allowing it to meet business compliance standards. The Kafka input is backward compatible: it uses the latest client release and supports Kafka versions 0.10 and later. You can connect to Kafka clusters inside a virtual network or to Kafka clusters with a public endpoint, depending on your configuration. The configuration relies on existing Kafka configuration conventions. Supported compression types are None, Gzip, Snappy, LZ4, and Zstd.
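If you also control the producers writing to the topic, you can check that their compression setting is one the input can read. The following is a minimal sketch. It assumes the producer settings are held in a Python dict keyed by Apache Kafka's standard producer property names, where `compression.type` defaults to `none`. `check_producer_compression` is a hypothetical helper, not part of any Azure or Kafka SDK.

```python
# Codecs listed as supported by the Stream Analytics Kafka input, spelled the way
# Apache Kafka's producer `compression.type` setting expects them.
SUPPORTED_COMPRESSION = {"none", "gzip", "snappy", "lz4", "zstd"}


def check_producer_compression(producer_config: dict) -> str:
    """Return the producer's codec, or raise ValueError if the input can't decode it.

    Hypothetical helper: Kafka producers use "none" when `compression.type`
    is not set, so a missing key is treated the same way.
    """
    codec = str(producer_config.get("compression.type", "none")).lower()
    if codec not in SUPPORTED_COMPRESSION:
        raise ValueError(
            f"compression.type={codec!r} is not one of {sorted(SUPPORTED_COMPRESSION)}"
        )
    return codec


if __name__ == "__main__":
    # Hypothetical producer settings.
    print(check_producer_compression({"bootstrap.servers": "broker1:9092", "compression.type": "zstd"}))
```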
Steps
This article shows how to set up Kafka as an input source for Azure Stream Analytics. There are six steps:
- Create an Azure Stream Analytics job.
- Configure your Azure Stream Analytics job to use managed identity if you're using mTLS or SASL_SSL security protocols.
- Configure Azure Key Vault if you're using mTLS or SASL_SSL security protocols.
- Upload certificates as secrets into Azure Key Vault.
- Grant Azure Stream Analytics permissions to access the uploaded certificate.
- Configure Kafka input in your Azure Stream Analytics job.
Note
Depending on how your Kafka cluster is configured and the type of Kafka cluster you're using, some of the above steps may not apply to you. For example, if you're using Confluent Cloud Kafka, you don't need to upload a certificate to use the Kafka connector. If your Kafka cluster is inside a virtual network (VNET) or behind a firewall, you may have to configure your Azure Stream Analytics job to access your Kafka topic using a private link or a dedicated networking configuration.
Configuration
The following table lists the property names and their descriptions for creating a Kafka input:
Important
To configure your Kafka cluster as an input, the timestamp type of the input topic must be LogAppendTime, because it's the only timestamp type Azure Stream Analytics supports. Azure Stream Analytics supports only numerical decimal format.
Property name | Description |
---|---|
Input/Output Alias | A friendly name used in queries to reference your input or output |
Bootstrap server addresses | A list of host/port pairs to establish the connection to the Kafka cluster. |
Kafka topic | A named, ordered, and partitioned stream of data that allows for the publish-subscribe and event-driven processing of messages. |
Security Protocol | How you want to connect to your Kafka cluster. Azure Stream Analytics supports mTLS, SASL_SSL, SASL_PLAINTEXT, or None. |
Consumer Group ID | The name of the Kafka consumer group that the input should be a part of. It's automatically assigned if not provided. |
Event Serialization format | The serialization format (JSON, CSV, Avro, Parquet, Protobuf) of the incoming data stream. |
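The Important note above requires the input topic to use LogAppendTime. On a self-managed cluster running a recent Kafka release, one way to set this is Kafka's `kafka-configs.sh` tool with the topic-level `message.timestamp.type` setting. The Python sketch below only assembles that command and prints it for you to review and run. Several details are assumptions you should adapt:

- the script name, which is `kafka-configs` on some distributions;
- the broker address;
- the topic name;
- the optional client properties file.

Managed Kafka services may expose this setting differently.

```python
import shlex
from typing import List, Optional


def log_append_time_command(
    bootstrap_server: str, topic: str, command_config: Optional[str] = None
) -> List[str]:
    """Build a kafka-configs.sh call that sets a topic's timestamp type to LogAppendTime."""
    cmd = [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", "message.timestamp.type=LogAppendTime",
    ]
    if command_config:
        # Client properties file (security.protocol, SASL settings, ...) for secured clusters.
        cmd += ["--command-config", command_config]
    return cmd


if __name__ == "__main__":
    # Hypothetical broker and topic names; prints a shell-quoted command to run yourself.
    print(shlex.join(log_append_time_command("broker1:9092", "telemetry")))
```

To check the result, run the same tool with `--describe` instead of `--alter` and `--add-config`. This lists the topic's current configuration overrides.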
Authentication and encryption
You can use four types of security protocols to connect to your Kafka clusters:
Security protocol | Description |
---|---|
mTLS | Encryption and authentication. Supports PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512 security mechanisms. |
SASL_SSL | Combines two security mechanisms, SASL (Simple Authentication and Security Layer) for authentication and SSL (Secure Sockets Layer) for encryption, so that data transmission is both authenticated and encrypted. Supports PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512 security mechanisms. |
SASL_PLAINTEXT | Standard authentication with a username and password, without encryption. |
None | No authentication or encryption. |
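Before you configure the input, it can help to confirm the same settings from a standard Kafka client. The Python sketch below maps the protocol names in the table to Apache Kafka's `security.protocol` values. In Kafka client terms, mTLS is usually `SSL` with a client certificate. The sketch then renders a Java-client properties file.

Keep these limits in mind:

- `client_properties` and `to_properties_file` are hypothetical helpers.
- The broker names and credentials are placeholders.
- The SSL truststore and keystore settings that mTLS and SASL_SSL typically also need are deliberately left out.

```python
from typing import Dict, Iterable, Optional

# Stream Analytics protocol name -> Apache Kafka `security.protocol` value.
SECURITY_PROTOCOLS = {
    "mTLS": "SSL",
    "SASL_SSL": "SASL_SSL",
    "SASL_PLAINTEXT": "SASL_PLAINTEXT",
    "None": "PLAINTEXT",
}

# JAAS login module the Java Kafka client uses for each SASL mechanism.
LOGIN_MODULES = {
    "PLAIN": "org.apache.kafka.common.security.plain.PlainLoginModule",
    "SCRAM-SHA-256": "org.apache.kafka.common.security.scram.ScramLoginModule",
    "SCRAM-SHA-512": "org.apache.kafka.common.security.scram.ScramLoginModule",
}


def client_properties(
    bootstrap_servers: Iterable[str],
    protocol: str,
    mechanism: Optional[str] = None,
    username: str = "",
    password: str = "",
) -> Dict[str, str]:
    """Build Kafka client properties for one of the protocols in the table."""
    props = {
        "bootstrap.servers": ",".join(bootstrap_servers),  # host:port pairs
        "security.protocol": SECURITY_PROTOCOLS[protocol],
    }
    if protocol.startswith("SASL_"):
        if mechanism not in LOGIN_MODULES:
            raise ValueError(f"unsupported SASL mechanism: {mechanism!r}")
        props["sasl.mechanism"] = mechanism
        # Credentials containing double quotes would need escaping; not handled here.
        props["sasl.jaas.config"] = (
            f"{LOGIN_MODULES[mechanism]} required "
            f'username="{username}" password="{password}";'
        )
    return props


def to_properties_file(props: Dict[str, str]) -> str:
    """Render the properties in key=value form, one per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical brokers and credentials.
    props = client_properties(
        ["broker1:9093", "broker2:9093"], "SASL_SSL", "SCRAM-SHA-512", "asa-user", "secret"
    )
    print(to_properties_file(props), end="")
```

Save the output as, for example, `client.properties`. Then pass it to Kafka's console consumer with `--consumer.config client.properties` to check that the credentials and protocol work before you enter them in the Stream Analytics input.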
Important
Confluent Cloud supports authentication using API keys, OAuth, or SAML single sign-on (SSO). Azure Stream Analytics doesn't support OAuth or SAML single sign-on (SSO) authentication. You can connect to Confluent Cloud using an API key with topic-level access via the SASL_SSL security protocol.
For a step-by-step tutorial on connecting to Confluent Cloud Kafka, see the following articles:
- Confluent Cloud Kafka input: Stream data from confluent cloud Kafka with Azure Stream Analytics
- Confluent Cloud Kafka output: Stream data from Azure Stream Analytics into confluent cloud
Key Vault integration
Note
When using trust store certificates with mTLS or SASL_SSL security protocols, you must have Azure Key Vault and managed identity configured for your Azure Stream Analytics (ASA) job. Check your key vault's network settings to ensure Allow public access from all networks is selected. If your key vault is in a VNET or allows access only from specific networks, you must do one of the following:
- Inject your ASA job into the VNET that contains the key vault.
- Inject your ASA job into a VNET, and then connect your key vault to the VNET containing the job by using service endpoints.
Azure Stream Analytics integrates with Azure Key Vault to access the stored secrets needed for authentication and encryption when using mTLS or SASL_SSL security protocols. Your Azure Stream Analytics job connects to your key vault using managed identity to ensure a secure connection and avoid the exfiltration of secrets. Certificates are stored as secrets in the key vault and must be in PEM format.
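Because the secrets must be PEM files, a quick local check before upload can catch a binary DER or JKS file that was picked up by mistake. The following is a hypothetical, standard-library-only helper, offered as a sketch. It confirms that the text contains base64 `CERTIFICATE` blocks; it doesn't parse or validate the certificates themselves.

```python
import base64
import binascii
import re
import sys

# One PEM certificate block; a chain file contains several of these back to back.
_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certificates(text: str) -> int:
    """Return the number of PEM certificate blocks in `text`.

    Raises ValueError if there are none, or if a block body isn't valid base64.
    """
    bodies = _PEM_BLOCK.findall(text)
    if not bodies:
        raise ValueError("no '-----BEGIN CERTIFICATE-----' block found; is the file PEM?")
    for body in bodies:
        try:
            # Drop line breaks, then decode strictly so stray characters are rejected.
            base64.b64decode("".join(body.split()), validate=True)
        except binascii.Error as exc:
            raise ValueError(f"certificate block is not valid base64: {exc}") from exc
    return len(bodies)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pem.py path/to/certificatefile.pem
    with open(sys.argv[1], encoding="ascii", errors="replace") as handle:
        print(count_pem_certificates(handle.read()), "certificate(s) found")
```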
Configure Key Vault with permissions
You can create a key vault resource by following Quickstart: Create a key vault using the Azure portal. To upload certificates, you must have "Key Vault Administrator" access to your key vault. To grant admin access, follow these steps:
Note
You must have "Owner" permissions to grant other key vault permissions.
- In your key vault, select Access control (IAM).
- Select Add > Add role assignment to open the Add role assignment page.
- Assign the role using the following configuration:
Setting | Value |
---|---|
Role | Key Vault Administrator |
Assign access to | User, group, or service principal |
Members | <Your account information or email> |
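If you prefer Azure CLI to the portal, you can make the same assignment with `az role assignment create`, scoped to the vault's resource ID. The Python sketch below works as follows:

- It looks up the vault ID with `az keyvault show`.
- It looks up your own object ID with `az ad signed-in-user show`.
- It then creates the assignment.

The vault name comes from the command line and is yours to supply. These Key Vault roles only take effect on a key vault that uses the Azure RBAC permission model.

```python
import subprocess
import sys
from typing import List


def az_tsv(args: List[str]) -> str:
    """Run an az command (without the leading `az`) that prints a single value."""
    result = subprocess.run(
        ["az", *args, "--output", "tsv"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def admin_assignment_args(assignee_object_id: str, vault_id: str) -> List[str]:
    """az arguments that grant Key Vault Administrator on a single vault."""
    return [
        "az", "role", "assignment", "create",
        "--role", "Key Vault Administrator",
        "--assignee-object-id", assignee_object_id,
        "--assignee-principal-type", "User",
        "--scope", vault_id,  # limit the grant to this vault only
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python grant_admin.py <key vault name>
    vault_id = az_tsv(["keyvault", "show", "--name", sys.argv[1], "--query", "id"])
    my_object_id = az_tsv(["ad", "signed-in-user", "show", "--query", "id"])
    subprocess.run(admin_assignment_args(my_object_id, vault_id), check=True)
```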
Upload certificate to Key Vault via Azure CLI
Important
You must have "Key Vault Administrator" access to your key vault for this command to work. You must upload the certificate as a secret, and you must use Azure CLI to upload certificates as secrets to your key vault. Your Azure Stream Analytics job fails when the certificate used for authentication expires. To resolve this, update or replace the certificate in your key vault and restart your Azure Stream Analytics job.
Make sure you have Azure CLI configured locally with PowerShell. For guidance on setting up Azure CLI, see Get started with Azure CLI.
Log in to Azure CLI:
az login
Set the active subscription to the one that contains your key vault:
az account set --subscription <subscription name>
The following command uploads the certificate as a secret to your key vault, where:
- <your key vault> is the name of the key vault you want to upload the certificate to.
- <name of the secret> is any name you want to give your secret; it's the name that shows up in the key vault.
- <file path to certificate> is the path to your certificate file. You can right-click the file and copy its path.
az keyvault secret set --vault-name <your key vault> --name <name of the secret> --file <file path to certificate>
For example:
az keyvault secret set --vault-name mykeyvault --name kafkasecret --file C:\Users\Downloads\certificatefile.pem
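The Important note above says the job fails once the uploaded certificate expires. It can therefore be worth checking the expiry date before you upload, and periodically afterwards. The Python sketch below runs `openssl x509 -enddate -noout`, which prints a line such as `notAfter=Jan  5 09:34:43 2018 GMT`. It then parses that date with Python's `ssl.cert_time_to_seconds`.

It assumes the following:

- OpenSSL is on your PATH.
- Only the first certificate in the file matters.
- A 30-day warning threshold suits you; this value is arbitrary.

```python
import ssl
import subprocess
import sys
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Turn `openssl x509 -enddate -noout` output into days remaining (negative if expired)."""
    prefix = "notAfter="
    if not enddate_line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {enddate_line!r}")
    # cert_time_to_seconds expects the "%b %d %H:%M:%S %Y GMT" format openssl prints.
    expires_at = ssl.cert_time_to_seconds(enddate_line[len(prefix):].strip())
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def read_enddate(cert_path: str) -> str:
    """Ask openssl for the notAfter line of the first certificate in the file."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python cert_expiry.py C:\Users\Downloads\certificatefile.pem
    days = days_until_expiry(read_enddate(sys.argv[1]))
    print(f"{days:.0f} days until the certificate expires")
    if days < 30:  # arbitrary warning threshold
        print("Replace the secret in the key vault and restart the job soon.")
```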
Configure managed identity
Azure Stream Analytics requires a managed identity to access your key vault. You can configure your ASA job to use managed identity from the Managed Identity tab on the left side under Configure:
- Select the Managed Identity tab under Configure.
- Select Switch Identity and select the identity to use with the job: system-assigned identity or user-assigned identity.
- For user-assigned identity, select the subscription where your user-assigned identity is located and select the name of your identity.
- Review and save.
Grant the Stream Analytics job permissions to access the certificate in the key vault
For your Azure Stream Analytics job to read the secret in your key vault, the job must have permission to access the key vault. Use the following steps to grant your Stream Analytics job that permission:
- In your key vault, select Access control (IAM).
- Select Add > Add role assignment to open the Add role assignment page.
- Assign the role using the following configuration:
Setting | Value |
---|---|
Role | Key Vault Secrets User |
Managed identity | Stream Analytics job for System-assigned managed identity or User-assigned managed identity |
Members | <Name of your Stream Analytics job> or <name of user-assigned identity> |
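You can also script the same assignment with Azure CLI. The Python sketch below finds the principal ID of the job's system-assigned identity through the generic `az resource show` command, using the resource type `Microsoft.StreamAnalytics/streamingjobs`. It then assigns Key Vault Secrets User on the vault. The resource group, job, and vault names come from the command line and are yours to supply.

If you use a user-assigned identity, `az identity show --name <identity> --resource-group <group> --query principalId` returns the ID to use instead.

```python
import subprocess
import sys
from typing import List


def az_tsv(args: List[str]) -> str:
    """Run an az command (without the leading `az`) that prints a single value."""
    result = subprocess.run(
        ["az", *args, "--output", "tsv"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def job_principal_id_args(resource_group: str, job_name: str) -> List[str]:
    """az arguments (without `az`) that print the job's system-assigned principal ID."""
    return [
        "resource", "show",
        "--resource-group", resource_group,
        "--name", job_name,
        "--resource-type", "Microsoft.StreamAnalytics/streamingjobs",
        "--query", "identity.principalId",
    ]


def secrets_user_assignment_args(principal_id: str, vault_id: str) -> List[str]:
    """az arguments that let a managed identity read secrets in one vault."""
    return [
        "az", "role", "assignment", "create",
        "--role", "Key Vault Secrets User",
        "--assignee-object-id", principal_id,
        # Stating the type avoids errors from Microsoft Graph replication delay.
        "--assignee-principal-type", "ServicePrincipal",
        "--scope", vault_id,
    ]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python grant_asa.py <resource group> <job name> <key vault name>
    group, job, vault = sys.argv[1:]
    principal_id = az_tsv(job_principal_id_args(group, job))
    vault_id = az_tsv(["keyvault", "show", "--name", vault, "--query", "id"])
    subprocess.run(secrets_user_assignment_args(principal_id, vault_id), check=True)
```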
Virtual network integration
If your Kafka cluster is inside a virtual network or behind a firewall, configure your Azure Stream Analytics job to access your Kafka topic using a private link or a dedicated networking configuration. For more information, see Run your Azure Stream Analytics job in an Azure Virtual Network.
Limitations
- When configuring your Azure Stream Analytics jobs to use Virtual Network/SWIFT, your job must be configured with at least six (6) streaming units or one (1) V2 streaming unit.
- When using mTLS or SASL_SSL with Azure Key Vault, you must convert your Java KeyStore (JKS) to PEM format.
- The minimum Kafka version that Azure Stream Analytics can connect to is 0.10.
- Azure Stream Analytics doesn't support authentication to Confluent Cloud using OAuth or SAML single sign-on (SSO). You must use an API key via the SASL_SSL protocol.
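For the JKS-to-PEM conversion listed in the limitations above, one option for a truststore is the JDK's `keytool -exportcert -rfc`. It writes a trusted certificate in PEM (Base64) form. The Python sketch below exports the aliases you name and concatenates them into one bundle.

Adapt these assumptions to your setup:

- The file names, aliases, and password handling are placeholders.
- `keytool -list -keystore <file> -storepass <password>` shows the aliases in a keystore.
- A keystore that holds a private key, as mTLS client authentication needs, requires a different route. For example, convert it to PKCS12 first.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def export_cert_args(keystore: str, alias: str, storepass: str, pem_out: str) -> List[str]:
    """keytool arguments that write one trusted certificate from a JKS file as PEM."""
    return [
        "keytool", "-exportcert", "-rfc",  # -rfc selects PEM (Base64) output
        "-alias", alias,
        "-keystore", keystore,
        "-storepass", storepass,  # visible in the process list; acceptable for a local sketch
        "-file", pem_out,
    ]


def export_truststore(keystore: str, aliases: List[str], storepass: str, bundle: str) -> None:
    """Export each alias and concatenate the PEM blocks into one bundle file."""
    parts = []
    with tempfile.TemporaryDirectory() as tmp:
        for index, alias in enumerate(aliases):
            part = Path(tmp) / f"{index}.pem"
            subprocess.run(export_cert_args(keystore, alias, storepass, str(part)), check=True)
            text = part.read_text()
            parts.append(text if text.endswith("\n") else text + "\n")
    Path(bundle).write_text("".join(parts))


if __name__ == "__main__" and len(sys.argv) >= 5:
    # Usage: python jks_to_pem.py kafka.truststore.jks <storepass> bundle.pem <alias> [<alias> ...]
    keystore, storepass, bundle, *aliases = sys.argv[1:]
    export_truststore(keystore, aliases, storepass, bundle)
```

You can then upload the resulting bundle with the `az keyvault secret set` command shown earlier.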
Note
For direct help with using the Azure Stream Analytics Kafka input, please reach out to askasa@microsoft.com.