

Configure streaming ingestion on your Azure Data Explorer cluster

Streaming ingestion is useful for loading data when you need low latency between ingestion and query. Consider using streaming ingestion in the following scenarios:

  • When latency of less than a second is required.
  • When you need to optimize operational processing of many tables where the stream of data into each table is relatively small (a few records per second), but the overall data ingestion volume is high (thousands of records per second).

If the stream of data into each table is high (over 4 GB per hour), consider using batch ingestion.
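The rules of thumb above can be expressed as a small decision helper. The following C# sketch is illustrative only: `ChooseIngestion` and its three inputs are hypothetical, the 4 GB per hour threshold comes from this section, and defaulting to batch ingestion when neither streaming scenario applies is an assumption of the sketch, not guidance from the service.

```csharp
using System;

// Hypothetical helper (not part of any Azure SDK) that encodes this section's
// rules of thumb. Real decisions depend on more factors than these three inputs.
static string ChooseIngestion(double perTableGbPerHour, bool needsSubSecondLatency, bool manySmallTableStreams)
{
    // High-volume streams into a single table (over about 4 GB per hour) favor batch ingestion.
    if (perTableGbPerHour > 4.0)
        return "batch";

    // Sub-second latency, or many low-volume table streams, favor streaming ingestion.
    if (needsSubSecondLatency || manySmallTableStreams)
        return "streaming";

    // Neither scenario applies; this sketch assumes batch ingestion as the default.
    return "batch";
}

Console.WriteLine(ChooseIngestion(0.01, needsSubSecondLatency: true, manySmallTableStreams: false)); // streaming
Console.WriteLine(ChooseIngestion(6.0, needsSubSecondLatency: true, manySmallTableStreams: false));  // batch
```

The volume check runs first because a high per-table stream is the case where this article recommends considering batch ingestion even when the other criteria point to streaming.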

To learn more about different ingestion methods, see data ingestion overview.

Choose the appropriate streaming ingestion type

Two streaming ingestion types are supported:

  • Data connection: Event Hub, IoT Hub, and Event Grid data connections can use streaming ingestion, provided it's enabled at the cluster level. Whether streaming ingestion is actually used is determined by the streaming ingestion policy configured on the target table. For information on managing data connections, see Event Hub, IoT Hub, and Event Grid.
  • Custom ingestion: Custom ingestion requires you to write an application that uses one of the Azure Data Explorer client libraries. Use the information in this topic to configure custom ingestion. You may also find the C# streaming ingestion sample application helpful.

Use the following comparison to help you choose the ingestion type that's appropriate for your environment:

  • Data delay between ingestion initiation and the data being available for query: longer with a data connection; shorter with custom ingestion.
  • Development overhead: a data connection offers fast and easy setup with no development overhead; custom ingestion has high development overhead, because you must create an application to ingest the data, handle errors, and ensure data consistency.

Note

You can enable and disable streaming ingestion on your cluster using the Azure portal or programmatically in C#. If you're using C# for your custom application, you may find the programmatic approach more convenient.

Prerequisites

Performance and operational considerations

The main factors that can affect streaming ingestion are:

  • VM and cluster size: Streaming ingestion performance and capacity scale with VM and cluster size. The number of concurrent ingestion requests is limited to six per core. For example, for 16-core SKUs, such as D14 and L16, the maximum supported load is 96 concurrent ingestion requests. For two-core SKUs, such as D11, the maximum supported load is 12 concurrent ingestion requests.
  • Data size limit: The data size limit for a streaming ingestion request is 4 MB. This includes any data created for update policies during the ingestion.
  • Schema updates: Schema updates, such as the creation and modification of tables and ingestion mappings, may take up to five minutes to take effect for the streaming ingestion service. For more information, see Streaming ingestion and schema changes.
  • SSD capacity: Enabling streaming ingestion on a cluster, even when data isn't ingested via streaming, uses part of the local SSD disk of the cluster machines for streaming ingestion data and reduces the storage available for hot cache.
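The concurrency limit is simple arithmetic: six requests multiplied by the core count. The following C# sketch reproduces the two examples above. The method name `MaxConcurrentStreamingRequests` is hypothetical and not part of any Azure SDK; the sketch only applies the per-core figure stated in this section.

```csharp
using System;

// Returns the maximum number of concurrent streaming ingestion requests for a
// SKU with the given core count, using the six-requests-per-core limit stated above.
static int MaxConcurrentStreamingRequests(int cores)
{
    const int RequestsPerCore = 6;
    if (cores <= 0)
        throw new ArgumentOutOfRangeException(nameof(cores), "Core count must be positive.");
    return RequestsPerCore * cores;
}

Console.WriteLine(MaxConcurrentStreamingRequests(16)); // 16-core SKU (for example D14 or L16): 96
Console.WriteLine(MaxConcurrentStreamingRequests(2));  // 2-core SKU (for example D11): 12
```

A client application could use a value like this to cap its own parallelism, for example with a `SemaphoreSlim`, so that it doesn't exceed the supported load.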

Enable streaming ingestion on your cluster

Before you can use streaming ingestion, you must enable the capability on your cluster and define a streaming ingestion policy. You can enable the capability when creating the cluster, or add it to an existing cluster.

Warning

Review the limitations prior to enabling streaming ingestion.

Enable streaming ingestion while creating a new cluster

You can enable streaming ingestion while creating a new cluster using the Azure portal or programmatically in C#.

When you create a cluster by following the steps in Create an Azure Data Explorer cluster and database, select Streaming ingestion > On in the Configurations tab.


Enable streaming ingestion on an existing cluster

If you have an existing cluster, you can enable streaming ingestion using the Azure portal or programmatically in C#.

  1. In the Azure portal, go to your Azure Data Explorer cluster.

  2. In Settings, select Configurations.

  3. In the Configurations pane, select On to enable Streaming ingestion.

  4. Select Save.


Create a target table and define the policy

Create a table to receive the streaming ingestion data and define its related policy using the Azure portal or programmatically in C#.

  1. In the Azure portal, navigate to your cluster.

  2. Select Query.


  3. To create the table that will receive the data via streaming ingestion, copy the following command into the Query pane and select Run.

    .create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source:string)
    


  4. Copy one of the following commands into the Query pane and select Run. This defines the streaming ingestion policy on the table you created or on the database that contains the table.

    Tip

    A policy that is defined at the database level applies to all existing and future tables in the database. When you enable the policy at the database level, there is no need to enable it per table.

    • To define the policy on the table you created, use:

      .alter table TestTable policy streamingingestion enable
      
    • To define the policy on the database containing the table you created, use:

      .alter database StreamingTestDb policy streamingingestion enable
      

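If you prefer to generate these commands rather than type them, the following C# sketch builds the same `.create table` and `.alter ... policy streamingingestion enable` strings used in the steps above. The helper names are hypothetical, and the sketch assumes table, database, and column names are simple identifiers that need no escaping.

```csharp
using System;
using System.Linq;

// Hypothetical helpers that build the management commands used in the steps above.
// Assumes names are simple identifiers that need no escaping or bracketing.
static string CreateTableCommand(string table, params (string Name, string Type)[] columns)
{
    var schema = string.Join(", ", columns.Select(c => $"{c.Name}: {c.Type}"));
    return $".create table {table} ({schema})";
}

static string EnableTablePolicyCommand(string table) =>
    $".alter table {table} policy streamingingestion enable";

static string EnableDatabasePolicyCommand(string database) =>
    $".alter database {database} policy streamingingestion enable";

Console.WriteLine(CreateTableCommand("TestTable",
    ("TimeStamp", "datetime"), ("Name", "string"), ("Metric", "int"), ("Source", "string")));
Console.WriteLine(EnableTablePolicyCommand("TestTable"));
Console.WriteLine(EnableDatabasePolicyCommand("StreamingTestDb"));
```

The resulting strings can be pasted into the Query pane, or, for example, passed to the admin client returned by `KustoClientFactory.CreateCslAdminProvider` in the C# client library.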

Create a streaming ingestion application to ingest data to your cluster

Create an application that ingests data into your cluster using the client library for your preferred language. The following example uses the C# client library.

using System.IO;
using System.Threading.Tasks;
using Kusto.Data; // Requires Package Microsoft.Azure.Kusto.Data
using Kusto.Data.Common;
using Kusto.Ingest; // Requires Package Microsoft.Azure.Kusto.Ingest

namespace StreamingIngestion;
class Program
{
    static async Task Main(string[] args)
    {
        var clusterPath = "https://<clusterName>.<region>.kusto.windows.net";
        var appId = "<appId>";
        var appKey = "<appKey>";
        var appTenant = "<appTenant>";
        // Create Kusto connection string with App Authentication
        var connectionStringBuilder = new KustoConnectionStringBuilder(clusterPath)
            .WithAadApplicationKeyAuthentication(
                applicationClientId: appId,
                applicationKey: appKey,
                authority: appTenant
            );
        // Create a disposable client that will execute the ingestion
        using var client = KustoIngestFactory.CreateStreamingIngestClient(connectionStringBuilder);
        // Ingest from a compressed file; 'using' disposes the stream when Main exits
        using var fileStream = File.Open("MyFile.gz", FileMode.Open);
        // Initialize client properties
        var ingestionProperties = new KustoIngestionProperties(databaseName: "<databaseName>", tableName: "<tableName>");
        // Create source options
        var sourceOptions = new StreamSourceOptions { CompressionType = DataSourceCompressionType.GZip, };
        // Ingest from stream
        await client.IngestFromStreamAsync(fileStream, ingestionProperties, sourceOptions);
    }
}
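Because each streaming ingestion request is limited to 4 MB, an application that streams larger inputs needs to keep each request under that size. The following C# sketch groups newline-delimited records (for example, CSV rows) into batches that stay within a byte budget. `SplitIntoBatches` is a hypothetical helper; the sketch measures uncompressed UTF-8 bytes and interprets 4 MB as 4 × 1024 × 1024 bytes, both of which are assumptions about how the limit is counted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups newline-delimited records into batches whose
// UTF-8 size (including the newline after each record) stays within maxBytes.
static List<string> SplitIntoBatches(IEnumerable<string> records, int maxBytes)
{
    var batches = new List<string>();
    var current = new StringBuilder();
    var currentBytes = 0;

    foreach (var record in records)
    {
        var recordBytes = Encoding.UTF8.GetByteCount(record) + 1; // +1 for '\n'
        if (recordBytes > maxBytes)
            throw new ArgumentException("A single record exceeds the batch size limit.");

        // Close the current batch if adding this record would exceed the limit.
        if (currentBytes + recordBytes > maxBytes)
        {
            batches.Add(current.ToString());
            current.Clear();
            currentBytes = 0;
        }

        current.Append(record).Append('\n');
        currentBytes += recordBytes;
    }

    if (currentBytes > 0)
        batches.Add(current.ToString());

    return batches;
}

// 4 MB request limit, assumed here to mean 4 * 1024 * 1024 bytes.
const int StreamingLimitBytes = 4 * 1024 * 1024;

var rows = new[] { "2024-01-01T00:00:00Z,sensor1,42,lab", "2024-01-01T00:00:01Z,sensor2,17,lab" };
foreach (var batch in SplitIntoBatches(rows, StreamingLimitBytes))
    Console.WriteLine($"Batch of {Encoding.UTF8.GetByteCount(batch)} bytes");
```

Each batch could then be wrapped in a `MemoryStream` and passed to `IngestFromStreamAsync`. Because the 4 MB limit also counts data created by update policies, a smaller budget than the full limit leaves headroom for that.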

Disable streaming ingestion on your cluster

Warning

Disabling streaming ingestion may take a few hours.

Before disabling streaming ingestion on your Azure Data Explorer cluster, drop the streaming ingestion policy from all relevant tables and databases. The removal of the streaming ingestion policy triggers data rearrangement inside your Azure Data Explorer cluster. The streaming ingestion data is moved from the initial storage to permanent storage in the column store (extents or shards). This process can take from a few seconds to a few hours, depending on the amount of data in the initial storage.

Drop the streaming ingestion policy

You can drop the streaming ingestion policy using the Azure portal or programmatically in C#.

  1. In the Azure portal, go to your Azure Data Explorer cluster and select Query.

  2. To drop the streaming ingestion policy from the table, copy the following command into the Query pane and select Run.

    .delete table TestTable policy streamingingestion
    


  3. In Settings, select Configurations.

  4. In the Configurations pane, select Off to disable Streaming ingestion.

  5. Select Save.

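When the policy is set on several tables or databases, it can help to generate every drop command at once so none is missed before you turn the cluster setting off. The following C# sketch does that. The helper name is hypothetical, and the database-level form `.delete database ... policy streamingingestion` is assumed to mirror the `.alter database` command used earlier; check it against the streaming ingestion policy command reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lists the commands that drop the streaming ingestion
// policy from every given table and database. Run all of them before turning
// off streaming ingestion in the cluster configuration.
static List<string> DropPolicyCommands(IEnumerable<string> tables, IEnumerable<string> databases) =>
    tables.Select(t => $".delete table {t} policy streamingingestion")
          .Concat(databases.Select(d => $".delete database {d} policy streamingingestion"))
          .ToList();

foreach (var command in DropPolicyCommands(new[] { "TestTable" }, new[] { "StreamingTestDb" }))
    Console.WriteLine(command);
```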

Limitations

  • Data mappings must be pre-created for use in streaming ingestion. Individual streaming ingestion requests don't accommodate inline data mappings.
  • Extent tags can't be set on the streaming ingestion data.
  • Update policy: The update policy can reference only the newly ingested data in the source table, not any other data or tables in the database.
  • If streaming ingestion is enabled on a cluster used as a leader for follower databases, streaming ingestion must also be enabled on the follower clusters to follow streaming ingestion data. The same applies if the cluster data is shared via Data Share.
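Because inline mappings aren't accepted, a mapping must exist on the table before streaming requests reference it. The following C# sketch builds a command that pre-creates a CSV mapping for a table. The helper name, the mapping name `TestMapping`, the CSV format, and the JSON mapping shape (a `column` name plus an `Ordinal` property) are assumptions for illustration; verify the syntax against the ingestion mappings reference before use.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: builds a command that pre-creates a CSV ingestion mapping,
// mapping each column, in order, to the CSV field at the same ordinal.
// Assumes simple column and mapping names that need no escaping.
static string CreateCsvMappingCommand(string table, string mappingName, params string[] columns)
{
    var entries = columns.Select((column, i) => new
    {
        column,
        Properties = new { Ordinal = i.ToString() }
    });
    var json = JsonSerializer.Serialize(entries);
    return $".create table {table} ingestion csv mapping \"{mappingName}\" '{json}'";
}

Console.WriteLine(CreateCsvMappingCommand("TestTable", "TestMapping", "TimeStamp", "Name", "Metric", "Source"));
```

Streaming requests would then refer to the mapping by name rather than inline, for example through the ingestion mapping settings on `KustoIngestionProperties` in the C# client (check the client library reference for the exact property).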

Next steps