Ingest data from Event Hubs into Azure Synapse Data Explorer

Azure Synapse Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. Azure Synapse Data Explorer offers ingestion (data loading) from Event Hubs, IoT Hubs, and blobs written to blob containers.

Event Hubs is a big data streaming platform and event ingestion service that can process millions of events per second in near real time. In this article, you create an event hub, connect to it from Azure Synapse Data Explorer, and see data flow through the system.

Prerequisites

  • An Azure subscription. Create a free Azure account.

  • Create a Data Explorer pool using Synapse Studio or the Azure portal.

  • Create a Data Explorer database.

    1. In Synapse Studio, on the left-side pane, select Data.

    2. Select + (Add new resource) > Data Explorer pool, and use the following information:

      Setting | Suggested value | Description
      Pool name | contosodataexplorer | The name of the Data Explorer pool to use.
      Name | TestDatabase | The database name must be unique within the cluster.
      Default retention period | 365 | The time span (in days) for which it's guaranteed that the data is kept available to query. The time span is measured from the time that data is ingested.
      Default cache period | 31 | The time span (in days) for which to keep frequently queried data available in SSD storage or RAM, rather than in longer-term storage.
    3. Select Create to create the database. Creation typically takes less than a minute.

  • Create a target table to which Event Hubs will send data

    1. In Synapse Studio, on the left-side pane, select Develop.

    2. Under KQL scripts, select + (Add new resource) > KQL script. On the right-side pane, you can name your script.

    3. In the Connect to menu, select contosodataexplorer.

    4. In the Use database menu, select TestDatabase.

    5. Paste in the following command, and select Run to create the table.

      .create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source:string)
      

      Tip

      Verify that the table was successfully created. On the left-side pane, select Data, select the contosodataexplorer more menu, and then select Refresh. Under contosodataexplorer, expand Tables and make sure that the TestTable table appears in the list.

    6. Copy the following command into the window and select Run to map the incoming JSON data to the column names and data types of the table (TestTable).

      .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp", "Properties": {"Path": "$.timeStamp"}},{"column":"Name", "Properties": {"Path":"$.name"}} ,{"column":"Metric", "Properties": {"Path":"$.metric"}}, {"column":"Source", "Properties": {"Path":"$.source"}}]'
      
  • (Optional) A user-assigned or system-assigned managed identity. We recommend using one for the data connection.

  • A sample app that generates data and sends it to an event hub. Download the sample app to your system.

  • Visual Studio 2019 to run the sample app.
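
Optionally, before you continue, you can confirm that the database, table, and mapping from the preceding steps are in place. The following commands are a minimal check, assuming the names TestDatabase, TestTable, and TestMapping used in this article. Run them one at a time in the KQL script window.

```kusto
// Show the table's columns and types.
// Expect TimeStamp, Name, Metric, and Source.
.show table TestTable cslschema

// List the JSON ingestion mappings defined on the table.
// Expect one mapping named TestMapping.
.show table TestTable ingestion json mappings

// Show the retention and cache policies that apply to the database.
.show database TestDatabase policy retention

.show database TestDatabase policy caching
```

If the mapping is missing or a column type differs from what you expect, fix it now. Events that arrive later are ingested with whatever schema and mapping exist at that time.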

Sign in to the Azure portal

Sign in to the Azure portal.

Create an event hub

Create an event hub by using an Azure Resource Manager template in the Azure portal.

  1. To create an event hub, use the following button to start the deployment. Right-click and select Open in new window, so you can follow the rest of the steps in this article.

    Button to deploy the Resource Manager template to Azure.

    The Deploy to Azure button takes you to the Azure portal.

  2. Select the subscription where you want to create the event hub, and create a resource group named test-hub-rg.

    Create a resource group

  3. Fill out the form with the following information.

    Use defaults for any settings not listed in the following table.

    Setting | Suggested value | Field description
    Subscription | Your subscription | Select the Azure subscription that you want to use for your Event Hubs.
    Resource group | test-hub-rg | Create a new resource group.
    Location | West US | Select West US for this article. For a production system, select the region that best meets your needs. Create the Event Hubs namespace in the same Location as the Azure Synapse Data Explorer cluster for best performance (most important for Event Hubs namespaces with high throughput).
    Namespace name | A unique namespace name | Choose a unique name that identifies your namespace. For example, mytestnamespace. The domain name servicebus.windows.net is appended to the name you provide. The name can contain only letters, numbers, and hyphens. The name must start with a letter, and it must end with a letter or number. The value must be between 6 and 50 characters long.
    Event Hubs name | test-hub | The event hub sits under the namespace, which provides a unique scoping container. The event hub name must be unique within the namespace.
    Consumer group name | test-group | Consumer groups enable multiple consuming applications to each have a separate view of the event stream.
  4. Select Review + create.

  5. Review the Summary of resources created. Select Create, which acknowledges that you're creating resources in your subscription.

    Screen shot of Azure portal for reviewing and creating Event Hubs namespace, Event Hubs, and consumer group.

  6. Select Notifications on the toolbar to monitor the provisioning process. It might take several minutes for the deployment to succeed, but you can move on to the next step now.

    Notifications icon

Authentication considerations

Depending on the type of identity you use to authenticate with Event Hubs, you might need some additional configuration.

  • If you're authenticating with Event Hubs using a user-assigned managed identity, go to your Event Hubs > Networking. Under Allow access from, select All networks, and then save the changes.

    Screenshot of the Event Hubs networking page, showing the selection of allowing access to all networks.

  • If you're authenticating with Event Hubs using a system-assigned managed identity, go to your Event Hubs > Networking. Either allow access from all networks or, under Allow access from, select Selected networks and then select Allow trusted Microsoft services to bypass this firewall. Save the changes.

    Screenshot of the Event Hubs networking page, showing the selection of allowing access to trusted services.

Connect to the Event Hubs

Now you connect to the Event Hubs from your Data Explorer pool. When this connection is in place, data that flows into the Event Hubs streams to the test table you created earlier in this article.

  1. Select Notifications on the toolbar to verify that the Event Hubs deployment was successful.

  2. Under the Data Explorer pool you created, select Databases > TestDatabase.

    Screenshot of the test database pool, showing select test database.

  3. Select Data connections and Add data connection.

    Select data ingestion and Add data connection.

Create a data connection (Preview)

Fill out the form with the following information, and then select Create.

Screenshot of the data connection pane in Event Hubs.

Setting | Suggested value | Field description
Data connection name | test-hub-connection | The name of the connection you want to create in Azure Synapse Data Explorer.
Subscription | | The subscription ID where the Event Hubs resource is located. This field is autopopulated.
Event Hubs namespace | A unique namespace name | The name you chose earlier that identifies your namespace.
Event Hubs | test-hub | The Event Hubs you created.
Consumer group | test-group | The consumer group defined in the Event Hubs you created.
Event system properties | Select relevant properties | The Event Hubs system properties. If there are multiple records per event message, the system properties will be added to the first record. When adding system properties, create or update table schema and mapping to include the selected properties.
Compression | None | The compression type of the Event Hubs messages payload. Supported compression types: None, Gzip.
Managed Identity | System-assigned | The managed identity used by the Data Explorer cluster for access to read from the Event Hubs.

Note

When the data connection is created:

  • System-assigned identities are automatically created if they don't exist.
  • The managed identity is automatically assigned the Azure Event Hubs Data Receiver role and is added to your Data Explorer cluster. We recommend verifying that the role was assigned and that the identity was added to the cluster.

Target table

There are two options for routing the ingested data: static and dynamic. For this article, you use static routing, where you specify the table name, data format, and mapping as default values. If the Event Hubs message includes data routing information, this routing information will override the default settings.

  1. Fill out the following routing settings:

    Default routing settings for ingesting data to Event Hubs - Azure Synapse Data Explorer.

    Setting | Suggested value | Field description
    Table name | TestTable | The table you created in TestDatabase.
    Data format | JSON | Supported formats are Avro, CSV, JSON, MULTILINE JSON, ORC, PARQUET, PSV, SCSV, SOHSV, TSV, TXT, TSVE, APACHEAVRO, and W3CLOG.
    Mapping | TestMapping | The mapping you created in TestDatabase, which maps incoming data to the column names and data types of TestTable. Required for JSON, MULTILINE JSON, and AVRO, and optional for other formats.

    Note

    • You don't have to specify all Default routing settings. Partial settings are also accepted.
    • Only events enqueued after you create the data connection are ingested.
  2. Select Create.

Event system properties mapping

Note

  • System properties are supported for json and tabular formats (csv, tsv etc.) and aren't supported on compressed data. When using a non-supported format, the data will still be ingested, but the properties will be ignored.
  • For tabular data, system properties are supported only for single-record event messages.
  • For JSON data, system properties are also supported for multiple-record event messages. In such cases, the system properties are added only to the first record of the event message.
  • For csv mapping, properties are added at the beginning of the record in the order listed in the System properties table.
  • For json mapping, properties are added according to property names in the System properties table.

If you selected Event system properties in the Data Source section of the table, you must include system properties in the table schema and mapping.
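
As an illustration of that requirement, the following commands sketch how the table and mapping from this article could be extended. The sketch assumes you selected the x-opt-enqueued-time and x-opt-offset system properties in the data connection. The column names EventHubEnqueuedTime and EventHubOffset are hypothetical, so choose names that suit your schema. Run each command separately.

```kusto
// Add one column per selected system property.
// Existing columns and data are kept.
.alter-merge table TestTable (EventHubEnqueuedTime: datetime, EventHubOffset: string)

// Replace the TestMapping mapping so that each system property is routed
// to its new column. The first four entries match the original mapping.
.create-or-alter table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp", "Properties": {"Path": "$.timeStamp"}},{"column":"Name", "Properties": {"Path":"$.name"}},{"column":"Metric", "Properties": {"Path":"$.metric"}},{"column":"Source", "Properties": {"Path":"$.source"}},{"column":"EventHubEnqueuedTime", "Properties": {"Path":"$.x-opt-enqueued-time"}},{"column":"EventHubOffset", "Properties": {"Path":"$.x-opt-offset"}}]'
```

If you didn't select any system properties, skip this step. The original table and mapping are sufficient for the rest of the article.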

Copy the connection string

When you run the sample app listed in Prerequisites, you need the connection string for the Event Hubs namespace.

  1. Under the Event Hubs namespace you created, select Shared access policies, then RootManageSharedAccessKey.

    Shared access policies.

  2. Copy Connection string - primary key. You paste it in the next section.

    Connection string.

Generate sample data

Use the sample app you downloaded to generate data.

Warning

This sample uses connection string authentication to connect to Event Hubs for simplicity of the example. However, hard-coding a connection string into your script requires a very high degree of trust in the application, and carries security risks.

For long-term, secure solutions, avoid hard-coding the connection string in your application code.

  1. Open the sample app solution in Visual Studio.

  2. In the program.cs file, update the eventHubName constant to the name of your Event Hubs and update the connectionString constant to the connection string you copied from the Event Hubs namespace.

    const string eventHubName = "test-hub";
    // Copy the connection string ("Connection string-primary key") from your Event Hub namespace.
    const string connectionString = @"<YourConnectionString>";
    
  3. Build and run the app. The app sends messages to the Event Hubs, and prints its status every 10 seconds.

  4. After the app has sent a few messages, move on to the next step: reviewing the flow of data into your Event Hubs and test table.

Review the data flow

With the app generating data, you can now see the flow of that data from the Event Hubs to the table in your cluster.

  1. In the Azure portal, under your Event Hubs, you see a spike in activity while the app is running.

    Event Hub graph.

  2. To check how many messages have made it to the database so far, run the following query in your test database.

    TestTable
    | count
    
  3. To see the content of the messages, run the following query:

    TestTable
    

    The result set should look like the following image:

    Message result set.

    Note

    • Azure Synapse Data Explorer has an aggregation (batching) policy for data ingestion, designed to optimize the ingestion process. By default, a batch is sealed as soon as one of the following conditions is met: a maximum delay time of 5 minutes, a total size of 1 GB, or 1,000 blobs. As a result, data might take several minutes to appear in the table. For more information, see batching policy.
    • Event Hubs ingestion includes an Event Hubs response time of 10 seconds or 1 MB.
    • To reduce response time lag, configure your table to support streaming. See streaming policy.
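
The following statements are a hedged sketch of how you might inspect the data flow and the latency described in the preceding note. They assume the TestTable schema from this article, and that TimeStamp holds the time at which the sample app generated each event, which you should confirm for your data. Run each statement separately.

```kusto
// Count ingested events per source and show the newest event time.
TestTable
| summarize Events = count(), Latest = max(TimeStamp) by Source

// Estimate the delay between event generation and ingestion.
// ingestion_time() returns null if the table's IngestionTime policy is disabled.
TestTable
| extend Latency = ingestion_time() - TimeStamp
| summarize AvgLatency = avg(Latency), MaxLatency = max(Latency)

// If no rows arrive, look for recent ingestion failures on the table.
.show ingestion failures
| where Table == "TestTable"
| order by FailedOn desc

// Inspect the batching policy currently set on the table.
.show table TestTable policy ingestionbatching

// Enable streaming ingestion on the table to reduce latency.
// Assumption: streaming ingestion is also enabled on the Data Explorer pool.
.alter table TestTable policy streamingingestion enable
```

An empty result from the batching policy command typically means that no table-level policy is set and the defaults described in the note apply.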

Clean up resources

If you don't plan to use your Event Hubs again, delete the test-hub-rg resource group to avoid incurring costs.

  1. In the Azure portal, select Resource groups on the far left, and then select the resource group you created.

    If the left menu is collapsed, select the Expand button to expand it.

    Select resource group to delete.

  2. Under test-hub-rg, select Delete resource group.

  3. In the new window, type the name of the resource group to delete (test-hub-rg), and then select Delete.
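
Deleting the resource group doesn't remove the test table from your Data Explorer pool. If you keep the pool but no longer need the ingested test data, a command like the following removes it. This step isn't part of the cleanup steps in this article, and it assumes the table name TestTable used here.

```kusto
// Drop the test table and its data.
// ifexists prevents an error if the table was already removed.
.drop table TestTable ifexists
```

Dropping a table is irreversible, so run the command only against the test database.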

Next steps