Process and route data with dataflows

Important

Azure IoT Operations Preview – enabled by Azure Arc is currently in preview. You shouldn't use this preview software in production environments.

You'll need to deploy a new Azure IoT Operations installation when a generally available release ships. You won't be able to upgrade a preview installation.

See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Dataflows allow you to connect various data sources and perform data operations, simplifying the setup of data paths to move, transform, and enrich data. The dataflow component is part of Azure IoT Operations, which is deployed as an Azure Arc extension. You configure a dataflow by using Kubernetes custom resource definitions (CRDs).

You can write configurations for various use cases, such as:

  • Transform data and send it back to MQTT
  • Transform data and send it to the cloud
  • Send data to the cloud or edge without transformation

Dataflows aren't limited to the region where the IoT Operations instance is deployed. You can use dataflows to send data to cloud endpoints in different regions.

Key features

Here are the key features of dataflows.

Data processing and routing

Dataflows enable the ingestion, processing, and routing of messages to specified sinks. You can specify:

  • Sources: Where messages are ingested from
  • Destinations: Where messages are sent to
  • Transformations (optional): Configuration for data processing operations
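As a sketch, a dataflow custom resource ties these three parts together: a source, an optional transformation, and a destination. The `apiVersion`, `kind`, and field names below are illustrative assumptions rather than the exact CRD schema; consult the Azure IoT Operations reference for the real resource definition:

```yaml
# Illustrative sketch of a dataflow custom resource.
# Field names are assumptions, not the verified schema.
apiVersion: connectivity.iotoperations.azure.com/v1beta1
kind: Dataflow
metadata:
  name: temperature-to-cloud
  namespace: azure-iot-operations
spec:
  operations:
    # Source: where messages are ingested from
    - operationType: Source
      sourceSettings:
        endpointRef: mq-local            # assumed reference to a local MQTT broker endpoint
        dataSources:
          - sensors/temperature
    # Transformation (optional): data processing operations
    - operationType: BuiltInTransformation
      builtInTransformationSettings:
        map:
          - inputs:
              - temperatureF
            expression: (temperatureF - 32) * 5 / 9   # convert Fahrenheit to Celsius
            output: temperatureC
    # Destination: where messages are sent to
    - operationType: Destination
      destinationSettings:
        endpointRef: eventhub-cloud      # assumed cloud endpoint, possibly in another region
        dataDestination: telemetry
```

You'd apply a resource like this the same way as any other Kubernetes manifest (for example, with `kubectl apply -f`), after which the dataflow operator takes over.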

Transformation capabilities

Transformations can be applied to data during the processing stage. These operations can include:

  • Compute new properties: Based on existing properties in the message
  • Rename properties: To standardize or clarify data
  • Convert units: Convert values to different units of measurement
  • Standardize values: Scale property values to a user-defined range
  • Contextualize data: Add reference data to messages for enrichment and driving insights
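As a hedged sketch, several of the operations above might appear together in a dataflow's transformation configuration. The field names here (`map`, `datasets`, `expression`, and so on) are illustrative assumptions, as are the property names:

```yaml
# Illustrative transformation configuration (names are assumptions,
# not the verified schema).
builtInTransformationSettings:
  datasets:
    # Contextualize data: enrich messages with reference data,
    # joined on a hypothetical assetId property
    - key: assetMetadata
      inputs:
        - $source.assetId
        - $context.assetId
      expression: $1 == $2
  map:
    # Rename properties: standardize a property name
    - inputs:
        - temp
      output: temperature
    # Convert units: pressure from psi to kPa
    - inputs:
        - pressurePsi
      expression: pressurePsi * 6.89476
      output: pressureKPa
    # Compute new properties: flag readings above a threshold
    - inputs:
        - temperature
      expression: temperature > 100
      output: isOverTemp
```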

Configuration and deployment

The configuration is specified by using Kubernetes CRDs. Based on this configuration, the dataflow operator creates dataflow instances to ensure high availability and reliability.
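To illustrate the high-availability aspect, a profile-style resource might tell the operator how many dataflow instances to run. `DataflowProfile` and `instanceCount` are assumed names for this sketch:

```yaml
# Illustrative sketch: a profile resource the dataflow operator could use
# to scale instances for high availability (names are assumptions).
apiVersion: connectivity.iotoperations.azure.com/v1beta1
kind: DataflowProfile
metadata:
  name: default
  namespace: azure-iot-operations
spec:
  instanceCount: 2   # operator runs multiple instances for reliability
```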

Benefits

  • Simplified setup: Easily connect data sources and destinations.
  • Flexible transformations: Perform a wide range of data operations.
  • Scalable configuration: Use Kubernetes CRDs for scalable and manageable configurations.
  • High availability: Kubernetes-native resources ensure reliability.

By using dataflows, you can efficiently manage your data paths, ensuring that data is sent, transformed, and enriched accurately to meet your operational needs.