Automotive messaging, data, and analytics

Microsoft Fabric
Azure Data Explorer
Azure Event Grid
Azure Event Hubs
Azure Functions

This example architecture explains how automotive original equipment manufacturers (OEMs) and mobility providers can develop advanced connected vehicle applications and digital services. It provides reliable messaging, data, and analytics infrastructure. This infrastructure includes message and command processing, state storage, and managed API integration. The architecture also provides a scalable, enhanced-security data solution for digital engineering, fleet operations, and sharing within the wider mobility ecosystem.

Architecture

Diagram of the high-level architecture.

Download a PowerPoint file that contains this architecture diagram.

The preceding high-level architecture diagram shows the main logical blocks and services of an automotive messaging, data, and analytics solution. This article doesn't discuss the shaded diagram elements. The following list briefly explains the other diagram elements, and later sections provide further details.

  • Vehicle: Each vehicle contains a collection of devices. Some of these devices are software-defined and can run software workloads managed from the cloud. The vehicle collects and processes a wide variety of data, such as sensor information from electro-mechanical devices, interactions, video, and software log files.

  • Mobile devices: Mobile devices provide digital experiences to the driver or user and can receive messages from and send messages to the vehicles by using companion apps.

  • Mobility infrastructure: Mobility infrastructure, such as battery charging stations, receives messages from and sends messages to the vehicles.

  • Messaging services: Messaging services manage the communication to and from the vehicle, infrastructure, and mobile devices. They process messages, use workflows to carry out commands, and implement the management backend. They also track certificate registration and provisioning for all participants.

  • Vehicle and device management backend: OEM systems manage the vehicle and device lifecycle from factory to after-sales support.

  • Data and analytics services: Data and analytics services provide data storage, processing, and analytics capabilities for all users. These services transform data into insights that drive better business decisions.

  • Digital services: The vehicle manufacturer provides digital services that add value for the customer. These services include companion apps for repair and maintenance tasks.

  • Business integration: Several digital services require business integration to backend systems such as dealer management system (DMS), customer relationship management (CRM), or enterprise resource planning (ERP) systems.

  • Consent management: The consent management backend is part of customer management and tracks user authorization for data collection according to applicable legislation.

  • Digital engineering: Digital engineering systems use vehicle data to continuously improve hardware and software through analytics and machine learning.

  • Smart mobility ecosystem: The smart mobility ecosystem consists of partner companies that provide other products and services, such as connected insurance based on user consent. They can subscribe to and consume events and aggregated insights.

  • IT and operations: IT operators use these services to maintain the availability and performance of both vehicles and backend systems.

  • Vehicle security operations center (VSOC): IT operators and engineers use the VSOC to protect vehicles from threats.

Microsoft is a member of the Eclipse Software Defined Vehicle Working Group, which serves as a forum for open collaboration on vehicle software platforms that use open source.

Dataflow

The architecture uses the Publisher-Subscriber messaging pattern to decouple vehicles from services. It uses Azure Event Grid to enable messaging between vehicles and services and to route Message Queuing Telemetry Transport (MQTT) messages to Azure services.

Vehicle-to-cloud messages

The vehicle-to-cloud dataflow processes telemetry data from the vehicle. Telemetry data, such as vehicle state and sensor data, can be sent periodically. It can also be sent based on events, such as triggers on error conditions, in reaction to user actions, or in response to remote requests.

Diagram of the messaging dataflow.

  1. API Management provides secure access to the vehicle, device, and user consent management service. The vehicle is configured for a customer based on their purchase options. The managed APIs provide access to:

    1. Provisioning information for vehicles and devices.

    2. Initial vehicle data collection configuration based on market and business considerations.

    3. Storage of initial user consent settings based on vehicle options and user acceptance defined in the consent management backend.

  2. The vehicle publishes telemetry and event messages through an MQTT client with defined topics to the Event Grid MQTT broker feature in the vehicle messaging services.

  3. Event Grid routes messages to different subscribers based on topic, message attributes, or payload. For more information, see Filtering of MQTT-routed messages.

    1. An Azure Event Hubs instance buffers high-volume, low-priority messages that don’t require immediate processing, like those only used for analytics. Then it routes the messages directly to storage. For performance reasons, don't use payload filtering for these messages.

    2. An Event Hubs instance buffers high-priority messages that require immediate processing, like status changes in a user-facing application with low-latency expectations. Then it routes them to an Azure function.

  4. The system stores low-priority messages directly in a lakehouse by using event capture. To optimize costs, these messages can use batch decoding and processing.

  5. An Azure function processes high-priority messages. The function reads the vehicle, device, and user consent settings from the device registry and performs the following steps:

    1. Verifies that the vehicle and device are registered and active.

    2. Verifies that the user gave consent for the message topic.

    3. Decodes and enriches the payload.

    4. Adds more routing information.

  6. The live telemetry Eventstream in the data and analytics solution receives the decoded messages. Eventhouse processes and stores messages as they come in.

  7. The digital services layer receives the decoded messages. Azure Service Bus notifies applications about important changes to the state of the vehicle and about related events. Eventhouse provides the last known state of the vehicle and the short-term history.
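
As an illustration of step 5, the following Python sketch shows how a function might validate and enrich a high-priority message. It's a minimal sketch under stated assumptions, not the reference implementation: the in-memory `REGISTRY`, the `RegistryEntry` fields, the topic names, the JSON payload format, and the `decoded/` routing prefix are all hypothetical stand-ins for the device registry and message schema of a real solution.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical device registry record (vehicle, device, and consent settings)."""
    vehicle_id: str
    active: bool
    consented_topics: set = field(default_factory=set)
    model: str = "unknown"


# In-memory stand-in for the device registry. A real solution would query a database.
REGISTRY = {
    "device-001": RegistryEntry("vin-123", True, {"telemetry/battery"}, "ev-sedan"),
    "device-002": RegistryEntry("vin-456", False, {"telemetry/battery"}),
}


def process_message(device_id: str, topic: str, payload: bytes):
    """Return an enriched message, or None if the message must be dropped."""
    entry = REGISTRY.get(device_id)
    # Step 1: the vehicle and device must be registered and active.
    if entry is None or not entry.active:
        return None
    # Step 2: the user must have given consent for this message topic.
    if topic not in entry.consented_topics:
        return None
    # Step 3: decode the payload (JSON is assumed here) and enrich it.
    body = json.loads(payload.decode("utf-8"))
    body["vehicle_id"] = entry.vehicle_id
    body["model"] = entry.model
    # Step 4: add routing information for downstream consumers.
    return {"route": f"decoded/{topic}", "body": body}
```

In this sketch, a message from an unregistered or inactive device, or for a topic without consent, is dropped by returning `None`. A production pipeline would more likely log the rejection or route it to a dead-letter destination.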

Cloud-to-vehicle messages

Broadcast dataflow

Digital services use the broadcast dataflow to provide notifications or messages to multiple vehicles about a common topic. Typical examples include traffic and weather services.

Diagram of the broadcast dataflow.

  1. The notification service is an MQTT client that runs in the cloud. It's registered and authorized to publish messages to specific topics in Event Grid. The authorization can be done through Microsoft Entra JSON Web Token authentication.

  2. The notification service publishes a message, for example, a weather warning to the topic /weather/warning/.

  3. Event Grid verifies if the service is authorized to publish to the provided topic.

  4. The vehicle messaging module is subscribed to the weather alerts and receives the notification.

  5. The messaging module notifies a vehicle workload. For example, it notifies the infotainment system to display the content of the weather alert.
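
Whether a vehicle receives a broadcast depends on how its subscription's topic filter matches the published topic. The following Python sketch illustrates the standard MQTT wildcard rules (`+` for a single level, `#` for all remaining levels). It's a simplified illustration, not Event Grid code: the function name `topic_matches` is hypothetical, the sketch doesn't validate malformed filters, and it omits the special handling of topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter.

    '+' matches exactly one level. '#' matches the parent level and all
    remaining levels, and is only honored as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last filter level.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    # Without '#', the filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

For example, a vehicle that subscribes with the filter `/weather/warning/#` would receive a message published to `/weather/warning/`, and a filter such as `weather/+/berlin` would match `weather/warning/berlin` but not `weather/warning/eu/berlin`.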

Command and control dataflow

The command and control dataflow performs remote commands in the vehicle from a digital service such as a companion app or communication with mobility infrastructure. These commands include use cases such as locking or unlocking the doors, setting climate control for the cabin, charging the battery, and making configuration changes. The success of these commands depends on the state of the vehicle. They might require some time to complete.

Vehicle commands often require user consent because they control vehicle functionality. These commands use the vehicle state to store intermediate results and evaluate successful execution. The messaging solution must have command workflow logic that checks user consent, tracks the command execution state, and notifies the digital service when the command is complete.

The following dataflow uses commands issued from a companion app digital service as an example. As in the previous example, the companion app is an authenticated service that can publish messages to Event Grid.

Diagram of the command and control dataflow.

  1. API Management provides access to the vehicle, device, and consent management backend. The vehicle owner or user grants consent to perform the command and control functions through a digital service, such as a companion app. Consent is usually granted when the user downloads or activates the app and the OEM activates their account. Granting consent triggers a configuration change on the vehicle to subscribe to the associated command topic in the MQTT broker.

  2. The companion app uses the command and control managed API to request execution of a remote command. The request might include additional parameters that configure options such as timeout and store-and-forward behavior. The workflow logic processes the API call.

  3. The workflow logic decides how to process the command based on the topic and other properties. It creates a state to track the status of the process. The command workflow logic checks against user consent information to determine if the message can be processed.

  4. The command workflow logic publishes a message to Event Grid with the command and the parameter values.

  5. Event Grid uses managed identities to authenticate the workflow logic. It then checks if the workflow logic is authorized to send messages to the provided topics.

  6. The messaging module in the vehicle is subscribed to the command topic and receives the notification. It routes the command to the right workload.

  7. The messaging module monitors the workload for completion or error. The workload is in charge of the physical execution of the command.

  8. The messaging module publishes command status reports to Event Grid. The vehicle uses an X.509 certificate to authenticate to Event Grid.

  9. The workflow logic is subscribed to command status updates and updates the internal state of command execution.

  10. After the command execution is complete, the companion app receives the execution result over the command and control API.

The command and control workflow logic can fail if the vehicle loses connectivity. The Event Grid MQTT broker feature supports Last Will and Testament messages. If the device disconnects abruptly, the MQTT broker distributes a will message to all subscribers. The workflow logic subscribes to the will message so that it can handle the disconnect, interrupt the processing, and notify the client with a suitable error code.
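
The state tracking described in this dataflow can be sketched as a small state machine. The following Python example is a hedged illustration only: the class `CommandTracker`, the state names, and the error codes are assumptions made for this sketch and aren't defined by the architecture. It shows how the workflow logic might reject a command without consent, record completion from a status report, and fail an open command when a will message arrives.

```python
from enum import Enum


class CommandState(Enum):
    REQUESTED = "requested"
    SENT = "sent"
    COMPLETED = "completed"
    FAILED = "failed"


class CommandTracker:
    """Tracks the execution state of one remote command (illustrative names)."""

    def __init__(self, command_id: str, vehicle_id: str, consent_granted: bool):
        self.command_id = command_id
        self.vehicle_id = vehicle_id
        self.error = None
        if consent_granted:
            self.state = CommandState.REQUESTED
        else:
            # Without user consent, the command is never published.
            self._fail("consent_missing")

    def _fail(self, error: str) -> None:
        self.state = CommandState.FAILED
        self.error = error

    def mark_sent(self) -> None:
        """Call after the command message is published to the broker."""
        if self.state is CommandState.REQUESTED:
            self.state = CommandState.SENT

    def on_status(self, succeeded: bool) -> None:
        """Handle a command status report published by the vehicle."""
        if self.state is CommandState.SENT:
            if succeeded:
                self.state = CommandState.COMPLETED
            else:
                self._fail("execution_failed")

    def on_will_message(self, vehicle_id: str) -> None:
        """Handle a will message: fail the command only if it's still open."""
        is_open = self.state in (CommandState.REQUESTED, CommandState.SENT)
        if vehicle_id == self.vehicle_id and is_open:
            self._fail("vehicle_disconnected")
```

A command that already completed ignores a later will message, so a disconnect after successful execution doesn't overwrite the result that the digital service receives.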

Vehicle and device provisioning

This dataflow describes the process to register and provision vehicles and devices to vehicle messaging services. The process is typically initiated as part of vehicle manufacturing. In the automotive industry, vehicle devices are commonly authenticated by using X.509 certificates. Event Grid requires a root or intermediate X.509 certificate to authenticate client devices. For more information, see Client authentication.

Diagram of the provisioning dataflow.

  1. The factory system commissions the vehicle device to the desired construction state. This step can include the initial installation and configuration of firmware and software. As part of this process, the factory system writes the device X.509 certificate, issued by a public key infrastructure certificate authority (CA), into storage designed specifically for that purpose, such as a Trusted Platform Module.

  2. The factory system registers the vehicle and device by using the Vehicle and Device Provisioning API.

  3. The factory system triggers the device provisioning client to connect to the device registration and provision the device. The device retrieves connection information to the MQTT broker.

  4. The device registration application creates the device identity with the MQTT broker.

  5. The factory system triggers the device to establish a connection to the MQTT broker for the first time.

    1. The MQTT broker authenticates the device by using the CA root certificate and extracts the client information.

  6. The MQTT broker manages authorization for allowed topics by using the local registry.

  7. For part replacement, the OEM dealer system can trigger the registration of a new device.

Note

Factory systems are usually on-premises and have no direct connection to the cloud.

Data analytics

This dataflow covers analytics for vehicle data. You can use other data sources, such as factory information, fault data, repair reports, software logs, audio, or video, to enrich and provide context to vehicle data.

Diagram of the data analytics.

  1. The vehicle messaging services layer provides telemetry, events, commands, and configuration messages from the bidirectional communication to the vehicle.

  2. The IT and operations layer provides information about the software that runs on the vehicle and the associated cloud digital services.

  3. Data engineers use Notebooks and Kusto Query Language (KQL) query sets to analyze the data, create data products, and configure pipelines. Microsoft Copilot in Fabric supports the development process.

  4. Pipelines process messages into a more refined state. Pipelines enrich and deduplicate the messages, create key performance indicators, and prepare training data sets for machine learning.

  5. Engineers and business users visualize the data by using Power BI or real-time dashboards.

  6. Data engineers use Reflex to analyze enriched vehicle data in near real time to create events such as predictive maintenance requests.

  7. Data engineers configure business integration of events and insights with Azure Logic Apps. The workflows update systems of record, such as Dynamics 365 and the Dataverse.

  8. Azure Machine Learning Studio consumes generated training data to create or update machine learning models.
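
To make the pipeline work in step 4 more concrete, the following Python sketch deduplicates messages and derives a simple key performance indicator. It's an illustration under stated assumptions rather than a Fabric pipeline: the message fields `vehicle_id`, `message_id`, and `speed_kmh` and both function names are hypothetical, and a real pipeline would typically express the same logic in a notebook or a KQL query.

```python
from collections import defaultdict


def deduplicate(messages):
    """Keep the first occurrence of each (vehicle_id, message_id) pair."""
    seen = set()
    unique = []
    for message in messages:
        key = (message["vehicle_id"], message["message_id"])
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique


def average_speed_by_vehicle(messages):
    """Example KPI: mean reported speed per vehicle, in km/h."""
    totals = defaultdict(lambda: [0.0, 0])  # vehicle_id -> [sum, count]
    for message in messages:
        entry = totals[message["vehicle_id"]]
        entry[0] += message["speed_kmh"]
        entry[1] += 1
    return {vehicle: total / count for vehicle, (total, count) in totals.items()}
```

Deduplicating before aggregation matters because MQTT delivery with at-least-once semantics can produce repeated messages, which would otherwise skew the KPI.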

Scalability

Deployment Stamps pattern

A connected vehicle and data solution can scale to millions of vehicles and thousands of services. Use the Deployment Stamps pattern to achieve scalability and elasticity.

Diagram of the scalability concept.

Each vehicle messaging scale unit is designed to support a specific vehicle population. Factors such as geographical region or model year can define this population. The application scale unit scales the services that require sending or receiving messages to the vehicles. The common service is accessible from any scale unit and provides vehicle and device management and subscription services for applications and devices.

  1. The application scale unit subscribes applications to messages of interest. The common service handles subscription to the vehicle messaging scale unit components.

  2. The vehicle uses the device management service to discover its assignment to a vehicle messaging scale unit.

  3. If necessary, the vehicle is provisioned by using the vehicle and device provisioning workflow into a vehicle messaging scale unit.

  4. The vehicle can now publish messages to, and subscribe to topics on, the MQTT broker. Event Grid uses the subscription information to route the message.
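
The discovery in step 2 can be pictured as a lookup from vehicle attributes to a scale unit. The following Python sketch is a simplified illustration that assumes the population is defined by region and model year. The `ScaleUnit` fields, the stamp names, and the `example.com` host names are hypothetical, and a real device management service would serve this mapping from its own data store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUnit:
    """Hypothetical description of one vehicle messaging scale unit (stamp)."""
    name: str
    region: str
    first_model_year: int
    last_model_year: int
    mqtt_hostname: str


# Illustrative assignment table. The names and host names are placeholders.
SCALE_UNITS = [
    ScaleUnit("stamp-eu-1", "eu", 2020, 2023, "eu1.mqtt.example.com"),
    ScaleUnit("stamp-eu-2", "eu", 2024, 2027, "eu2.mqtt.example.com"),
    ScaleUnit("stamp-na-1", "na", 2020, 2027, "na1.mqtt.example.com"),
]


def discover_scale_unit(region: str, model_year: int) -> ScaleUnit:
    """Return the scale unit that serves a vehicle's region and model year."""
    for unit in SCALE_UNITS:
        in_range = unit.first_model_year <= model_year <= unit.last_model_year
        if unit.region == region and in_range:
            return unit
    raise LookupError(f"No scale unit for region={region!r}, model_year={model_year}")
```

For example, `discover_scale_unit("eu", 2025)` returns the `stamp-eu-2` entry, whose host name the vehicle would then use to connect to the MQTT broker.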

The following previously used messaging examples illustrate the communication between the scale units:

(A) Basic telemetry without intermediate processing

  1. Messages that don't require processing or a claims check are routed to an ingress hub on the corresponding application scale unit.

  2. Applications consume messages from their app ingress Event Hubs instance.

(B) Command and control

  1. Applications publish commands to the vehicle through an Event Hubs instance. These commands require processing, workflow control, and authorization by using the relevant workflow logic.

  2. Status messages that require processing are routed to the workflow logic.

  3. When the command is complete, the workflow logic forwards the notification to the corresponding event hub in the application scale unit for the application to consume.

  4. The application consumes events from the associated event hub.

Event Grid custom domain names

You can assign custom domain names to your Event Grid namespace’s MQTT and HTTP host names along with the default host names. Custom domain configurations eliminate the need to modify client devices that are already linked to your domain. They also help you meet your security and compliance requirements. To simplify device configuration and migration scenarios, use custom domain names.

Components

This example architecture includes the following Azure components.

Connectivity

  • Event Grid lets you easily build applications with event-based architectures. In this solution, Event Grid manages device onboarding, authentication, and authorization. It also supports publish-subscribe messaging using MQTT.

  • Event Hubs is a scalable event processing service designed to process and ingest massive amounts of telemetry data. In this solution, Event Hubs buffers messages and delivers them for further processing or storage.

  • Azure Functions is a serverless compute service that runs event-triggered code. In this solution, Functions processes vehicle messages. You can also use Functions to implement management APIs that require short-term operation.

  • Azure Kubernetes Service (AKS) deploys complex workloads and services as containerized applications. In this solution, AKS hosts command and control workflow logic and implements the management APIs.

  • Azure Cosmos DB is a globally distributed, multi-model database service. In this solution, it stores the vehicle, device, and user consent settings.

  • Azure API Management ensures secure and efficient handling of APIs. In this solution, API Management provides a managed API gateway to existing backend services such as vehicle lifecycle management, including over-the-air updates, and user consent management.

  • Azure Batch is a platform service that provides job scheduling and virtual machine management capabilities. In this solution, Batch runs applications in parallel at scale. It also efficiently handles large compute-intensive tasks, such as vehicle communication trace ingestion.

Data and analytics

  • Microsoft Fabric is a unified platform for data analytics that includes data movement, processing, ingestion, transformation, event routing, and report building. It provides data analytics for all collected vehicle and business operation data.

Backend integration

  • Logic Apps is a platform for creating and running automated workflows. In this solution, it runs workflows for business integration based on vehicle data.

  • Azure App Service is a fully managed platform for building, deploying, and scaling web apps. In this solution, it provides user-facing web apps and mobile back ends, such as the companion app.

  • Azure Cache for Redis provides high-performance data caching to accelerate applications. In this solution, it provides in-memory caching of data often used by user-facing applications such as the companion app.

  • Service Bus is a messaging service that ensures reliable communication, with enhanced security, between distributed applications and services. In this solution, it decouples vehicle connectivity from digital services and business integration.

  • Microsoft Dynamics 365 is a suite of intelligent business applications across sales, service, finance, and operations. In this solution, it provides a connected customer experience and seamless business processes, which ensures better dealership and OEM operations.

  • Microsoft Dataverse stores and manages business applications data with enhanced security. In this architecture, it stores information about the customer and vehicle.

Alternatives

Choosing the right compute for message processing and managed APIs depends on several factors. For more information, see Choose an Azure compute service.

We recommend that you use:

  • Functions for event-driven, short-lived processes such as telemetry ingestion.

  • Batch for high-performance computing tasks such as decoding large CAN trace and video files.

  • AKS for managed, fully fledged orchestration of containerized complex logic such as command and control workflow management.

As an alternative to event-based data sharing, you can use Azure Data Share if the objective is to perform batch synchronization at the data lake level.

For data analytics, you can use:

  • Azure Databricks to provide a set of tools to maintain enterprise-grade data solutions at scale. Databricks is required for long-running operations on large amounts of vehicle data.

  • Azure Data Explorer to provide exploration, curation, and analytics of time-series based vehicle telemetry data.

Scenario details

Diagram of the high-level view.

Automotive OEMs are undergoing a significant transformation as they shift from producing fixed products to providing connected and software-defined vehicles (SDVs). Vehicles provide a range of features, such as over-the-air updates, remote diagnostics, and personalized user experiences. This transition enables OEMs to continuously improve their products based on real-time data and insights while also expanding their business models to include new services and revenue streams.

This example architecture describes how automotive manufacturers and mobility providers can:

  • Use feedback data as part of the digital engineering process to drive continuous product improvement, proactively address root causes of problems, and create new customer value.

  • Provide new digital products and services and digitalize operations with business integration with backend systems like ERP and CRM.

  • Share data with enhanced security and address country or region-specific requirements for user consent by using the broader smart mobility ecosystems.

  • Integrate with backend systems for vehicle lifecycle management and consent management to simplify and accelerate the deployment and management of connected vehicle solutions using an SDV DevOps toolchain.

  • Store data and provide compute at scale for vehicle data and analytics.

  • Manage vehicle connectivity to millions of devices in a cost-effective way.

Potential use cases

OEM automotive use cases are about enhancing vehicle performance, safety, and user experience.

  • Continuous product improvement enhances vehicle performance by analyzing real-time data and applying updates remotely. For more information about how to develop software for the vehicle, see SDV DevOps toolchain.

  • Engineering test fleet validation ensures vehicle safety and reliability by collecting and analyzing data from test fleets. For more information, see Data analytics for automotive test fleets.

  • Companion app and user portal enable remote vehicle access and control through a personalized app and web portal.

  • Proactive repair and maintenance predicts and schedules vehicle maintenance based on data-driven insights.

Broader ecosystem use cases enhance connected vehicle applications. These improvements benefit fleet operations, insurance, marketing, and roadside assistance across the entire transportation landscape.

  • Connected commercial fleet operations optimize fleet management through real-time monitoring and data-driven decision making. For more information, see Automotive connected fleets.

  • Digital vehicle insurance customizes insurance premiums based on driving behavior and provides immediate accident reporting.

  • Location-based marketing delivers targeted marketing campaigns to drivers based on their location and preferences.

  • Road assistance uses vehicle location and diagnostic data to provide real-time support to drivers in need.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Azure Well-Architected Framework.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the Reliability pillar.

  • Increase reliability with horizontal scaling. For more information about scaling your message processing pipeline, see Functions hosting options. For more information about scaling workflow execution logic and digital services, see Scaling options for applications in AKS.

  • Manage compute resources by dynamically scaling based on demand through autoscaling.

  • Use scale units to reduce the load on individual components and provide a bulkhead between vehicles. An outage on one stamp doesn't affect the others.

  • Use scale units to isolate geographical regions that have different regulations.

  • Replicate data across multiple geographic locations for fault tolerance and disaster recovery by using geo redundancy.

Vehicle connection reliability is critical for automotive messaging. For more information, see Reliability in Event Grid and Event Grid namespace.

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the Security pillar.

  • Use X.509 certificates to help ensure secure communication between vehicles and Azure. For more information, see Certificate management.

  • Establish a VSOC to detect threats, prevent cyber attacks, and comply with regulatory measures.

  • Collect and merge information from multiple data sources. Establish processes for risk mitigation, data forensics, incident response, and attack mitigation.

  • Create anomaly detection and early warning for networks, digital services, and electronic control units.

Cost Optimization

Cost Optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the Cost Optimization pillar.

  • Consider the cost per vehicle. The communication costs should vary based on the number of digital services provided. Calculate the return on investment for each digital service in relation to the operational costs.

  • Establish practices for cost analysis based on message traffic. Connected vehicle traffic can increase over time as more services are added. Examples include increased data collection for telematics insurance products, generative AI powered in-vehicle digital assistants, and car sharing applications.

  • Consider networking and mobile costs.

    • Use MQTT topic aliases to reduce the length of your topic names. This approach helps reduce traffic volume.

    • Use an efficient method, such as Protobuf or gzipped JSON, to encode and compress payload messages.

  • Manage traffic actively.

    • Vehicles tend to have recurring usage patterns that create daily and weekly demand peaks.

    • Prioritize messages by using MQTT user properties in your routing configuration. You can use this approach to defer the processing of noncritical or analytic messages to smooth the load and optimize resource usage.

    • Consider context-specific processing based on operational requirements. For example, send more brake telemetry only during severe braking conditions.

    • Adjust capacity based on demand.

  • Consider how long the data should be stored in hot, warm, or cold storage.

  • Optimize costs by using reserved instances.
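
As a rough illustration of the payload encoding recommendation, the following Python sketch compresses a JSON telemetry batch with gzip by using only the standard library. The message shape and the function names `encode_payload` and `decode_payload` are assumptions made for this example. Actual savings depend on the payload, so measure them with representative vehicle data.

```python
import gzip
import json


def encode_payload(message: dict) -> bytes:
    """Serialize to compact JSON (no extra whitespace), then gzip-compress."""
    compact = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return gzip.compress(compact)


def decode_payload(data: bytes) -> dict:
    """Reverse encode_payload: decompress, then parse the JSON."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


# Hypothetical telemetry batch with repetitive structure, which compresses well.
sample = {
    "vehicle_id": "vin-123",
    "samples": [
        {"t": i, "speed_kmh": 50 + i % 5, "brake": False} for i in range(200)
    ],
}
raw_size = len(json.dumps(sample).encode("utf-8"))
encoded = encode_payload(sample)
print(f"raw JSON: {raw_size} bytes, gzipped: {len(encoded)} bytes")
```

Batching matters here. gzip adds a fixed header and trailer, so compressing very small individual messages can make them larger instead of smaller.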

Operational Excellence

Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Overview of the Operational Excellence pillar.

To enhance unified IT operations, consider monitoring the vehicle software, including its logs, metrics, and traces, together with the messaging services, the data and analytics services, and related backend services.

Performance Efficiency

Performance Efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Overview of the Performance Efficiency pillar.

  • Consider using the scale unit concept for solutions that scale above 50,000 devices, especially if multiple geographical regions are required.

  • Consider the Azure subscription and service limits, quotas, and constraints when you design your scale units.

  • Consider the best way to ingest data, whether it's through messaging, streaming, or batched methods. For example, handle high-priority messages like user requests immediately. Route analytics messages, such as vehicle performance data, directly to storage without processing. Design your system to minimize the number of high-priority messages that need immediate processing.

  • Consider the best way to analyze data based on the use case, either through batched or near real-time processing. Near real-time analysis provides immediate notifications to users, such as alerting them to an imminent vehicle problem. Batched analytics run periodically and provide nonurgent notifications, like predicting upcoming maintenance.

Deploy this scenario

The tutorial for the Connected Fleet reference architecture contains a sample implementation of the message processing pipeline.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Other contributors:


Next steps

The following articles describe interactions between components in the architecture:

The following articles cover some of the patterns used in the architecture: