Cloud-scale analytics data management landing zone overview
The data management landing zone is a management function and is central to cloud-scale analytics. It's responsible for the governance of your analytics platform.
Your data management landing zone is a separate subscription that has the same standard Azure landing zone services. It enables governance of your data through crawlers, which connect to the data lakes and polyglot storage in your data landing zones. Virtual network peering connects your data management landing zone to your data landing zones and connectivity subscription.
Use this architecture as a starting point. Download the Visio file and modify it to fit your specific business and technical requirements when planning your data management landing zone implementation.
Note
Polyglot persistence is a storage term that describes your choice between different data store technologies to support your various data types and their storage needs. Essentially, it's the concept that an application can use more than one core database or storage technology.
Important
Your data management landing zone must be deployed as a separate subscription under a management group with the appropriate governance. You can then control governance across your organization. The Azure landing zone accelerator illustrates how you should approach Azure landing zones.
Data catalog
Resource group: governance-rg
The data catalog registers and maintains the data information in a centralized place and makes it available for the organization. It ensures that enterprises avoid duplicate data products caused by redundant data ingestion by different project teams.
We recommend you create a data catalog service to define the metadata of the data products stored across the data landing zones.
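As an illustration of how a central catalog can prevent redundant ingestion, the following minimal sketch keys each registration on its upstream source and returns the existing entry when a second team tries to register the same data. The `DataProductEntry` and `Catalog` classes, their fields, and the sample values are hypothetical. They aren't part of Microsoft Purview or any other product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProductEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    landing_zone: str
    source_system: str   # upstream system the data was ingested from
    source_entity: str   # table, file set, or topic within that system
    owner: str


@dataclass
class Catalog:
    """In-memory stand-in for a central catalog service."""
    _by_source: dict = field(default_factory=dict)

    def register(self, entry: DataProductEntry) -> DataProductEntry:
        """Register an entry, or return the existing one for the same source."""
        key = (entry.source_system.lower(), entry.source_entity.lower())
        existing = self._by_source.get(key)
        if existing is not None:
            # The same upstream data is already ingested; point the caller at it.
            return existing
        self._by_source[key] = entry
        return entry


catalog = Catalog()
sales = catalog.register(
    DataProductEntry("sales-orders", "dlz-retail", "SAP", "VBAK", "retail-team")
)
duplicate = catalog.register(
    DataProductEntry("orders-copy", "dlz-finance", "sap", "vbak", "finance-team")
)
print(duplicate.name)  # prints "sales-orders"
```

A real catalog would also record classifications and lineage. The sketch shows only the lookup that lets a second team discover an existing data product instead of ingesting the source again.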
Cloud-scale analytics depends on Microsoft Purview to register enterprise data sources, classify them, ensure data quality, and offer secure, self-service access.
Microsoft Purview is a tenant-based service and can communicate with each data landing zone by creating a Managed Virtual Network deployed to the region of your data landing zones. You can deploy Azure Managed Virtual Network Integration Runtimes (IR) within Microsoft Purview Managed Virtual Networks in any available Microsoft Purview region. From there, the Managed Virtual Network IR can use private endpoints to securely connect to and scan the supported data sources. For more information, see Use Managed virtual network with your Microsoft Purview account. Creating a Managed Virtual Network IR within a Managed Virtual Network ensures that the data integration process is isolated and secure.
Note
Although this documentation focuses primarily on using Microsoft Purview for governance, enterprises might have invested in other products, such as Alation, Okera, or Collibra. These solutions are subscription based, and we recommend deploying them to the data management landing zone. Be aware that some custom integration might be required.
For more information, see Data catalog and Microsoft Purview deployment best practices for cloud-scale analytics.
Data quality management
Resource group: governance-rg2
Continue with your current solution.
You should manage data quality as close to your data source as possible so you avoid quality issues replicating across your analytics and AI estate. Moving quality metrics and validation to your data integration aligns the quality process with the teams that are closest to your data. These teams have the deepest understanding of your data asset.
Data lineage also provides data quality confidence, and you should provide it for all data products.
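To make the idea of validating close to the source concrete, here's a minimal sketch of a check that could run inside an ingestion step. It splits a batch into valid rows and a small quality report. The rule names, the `QualityReport` class, and the sample rows are assumptions for illustration, not part of any Azure service.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total_rows: int
    valid_rows: int
    failures: dict  # rule name -> number of rows that failed it

    @property
    def pass_rate(self) -> float:
        # An empty batch is treated as fully valid.
        return self.valid_rows / self.total_rows if self.total_rows else 1.0


# Hypothetical rules, owned by the team closest to the source system.
RULES = {
    "order_id_present": lambda row: bool(row.get("order_id")),
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float))
    and row["amount"] >= 0,
}


def validate_batch(rows, rules=RULES):
    """Split a batch into valid rows and a report of rule failures."""
    failures = {name: 0 for name in rules}
    valid = []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        for name in failed:
            failures[name] += 1
        if not failed:
            valid.append(row)
    return valid, QualityReport(len(rows), len(valid), failures)


batch = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "", "amount": 5.0},
    {"order_id": "A3", "amount": -2},
]
valid, report = validate_batch(batch)
print(report.pass_rate, report.failures)
```

Publishing the report alongside the data lets downstream consumers see quality metrics without rerunning the checks.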
For more information on data quality management, see Data quality.
Data modeling repository
Resource group: governance-rg2
You should capture and store entity relationship models in a central location within your data management landing zone. This provides data consumers a single place to find conceptual diagrams.
Many customers use ER Studio and iServer to model their data products before ingestion.
Master data management
Resource group: governance-rg2
Master data management control resides in the data management landing zone. For considerations that are specific to data mesh, see Master data management in data mesh.
Many master data management solutions fully integrate with Microsoft Entra ID. This integration allows you to secure your data and provide different views for different user groups.
For more information, see Master data management system.
API catalog
Resource group: governance-rg2
Your data application teams will likely create various APIs for their data applications. These APIs can be difficult to discover across your organization. Placing an API catalog in your data management landing zone can solve this problem.
An API catalog can help standardize your documentation and offers a place for internal collaboration on APIs. It also can drive consumption, publishing, and governance controls across your organization.
Data sharing and contracts
Resource group: governance-rg2
Cloud-scale analytics uses Microsoft Entra entitlement management or Microsoft Purview policies to control access to data sharing. Even so, you might still require a sharing and contract repository. This repository is an organizational function and should reside in your data management landing zone.
Your contracts should provide information on data validation, models, and security policies.
For more information, see Data contracts.
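One way to picture a contract repository entry is as a record that carries the three kinds of information listed above: the data model, validation rules, and security policy. The following sketch is illustrative only. The `DataContract` and `ColumnSpec` classes, their fields, and the sample values are assumptions, not a published contract standard.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    name: str
    dtype: str            # logical type, for example "string" or "decimal"
    nullable: bool = False


@dataclass
class DataContract:
    """Hypothetical contract record; the fields are illustrative."""
    data_product: str
    provider: str
    consumer: str
    schema: list                                            # data model: list of ColumnSpec
    validation_rules: list = field(default_factory=list)    # human-readable rules
    allowed_groups: list = field(default_factory=list)      # security policy

    def check_record(self, record: dict) -> list:
        """Return the schema violations found in one record."""
        problems = []
        for col in self.schema:
            if col.name not in record:
                problems.append(f"missing column: {col.name}")
            elif record[col.name] is None and not col.nullable:
                problems.append(f"null not allowed: {col.name}")
        return problems


contract = DataContract(
    data_product="customer-profile",
    provider="crm-team",
    consumer="marketing-analytics",
    schema=[ColumnSpec("customer_id", "string"), ColumnSpec("email", "string", nullable=True)],
    validation_rules=["customer_id is unique"],
    allowed_groups=["sg-marketing-readers"],
)
print(contract.check_record({"customer_id": None}))
# prints ['null not allowed: customer_id', 'missing column: email']
```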
Azure Container Registry
Resource group: containers-rg
Your data management landing zone hosts an Azure Container Registry. The Azure Container Registry allows your data platform operations to deploy standard containers for use in data science projects that your data application teams consume.
Azure Synapse Private Link hubs
Resource group: synapse-link-rg
Azure Synapse Analytics Private Link hubs are Azure resources that connect your secured network and the Azure Synapse Studio web experience. Cloud-scale analytics securely connects your Azure Virtual Network to Azure Synapse Studio using private links from these hubs.
There are two steps to connect to Azure Synapse Studio using private links:
- Create a Private Link hub resource.
- Create a private endpoint from your Azure Virtual Network to that Private Link hub.
You can then use private endpoints to securely communicate with Azure Synapse Studio. Integrate these private endpoints with your DNS solution, either your on-premises solution or Azure Private DNS.
For more information, see Connect to Azure Synapse studio using private links.
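The two steps can be sketched as ARM-style resource definitions built in Python. This is a rough illustration under stated assumptions, not a deployable template. The resource names, resource IDs, and subnet are placeholders, and you should verify the API versions and the `web` group ID against the current Azure resource reference before you use them.

```python
import json


def private_link_hub(name: str, location: str) -> dict:
    """Step 1: define the Private Link hub resource."""
    return {
        "type": "Microsoft.Synapse/privateLinkHubs",
        "apiVersion": "2021-06-01",   # assumption: verify the current version
        "name": name,
        "location": location,
    }


def hub_private_endpoint(name: str, location: str, subnet_id: str, hub_id: str) -> dict:
    """Step 2: define a private endpoint from the virtual network to the hub."""
    return {
        "type": "Microsoft.Network/privateEndpoints",
        "apiVersion": "2023-04-01",   # assumption: verify the current version
        "name": name,
        "location": location,
        "properties": {
            "subnet": {"id": subnet_id},
            "privateLinkServiceConnections": [
                {
                    "name": f"{name}-connection",
                    "properties": {
                        "privateLinkServiceId": hub_id,
                        "groupIds": ["web"],  # assumption: Synapse Studio sub-resource
                    },
                }
            ],
        },
    }


# Placeholder IDs for illustration only.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
hub_id = f"{SUB}/resourceGroups/synapse-link-rg/providers/Microsoft.Synapse/privateLinkHubs/examplehub"
subnet_id = f"{SUB}/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/example-vnet/subnets/services"

hub = private_link_hub("examplehub", "westeurope")
endpoint = hub_private_endpoint("examplehub-pe", "westeurope", subnet_id, hub_id)
print(json.dumps({"resources": [hub, endpoint]}, indent=2))
```

The private endpoint still needs a DNS record in your on-premises DNS or Azure Private DNS so that clients resolve Synapse Studio to the private IP address.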
Automation interfaces (optional)
Your organization might decide to create many automation services to augment cloud-scale analytics capabilities. These automation services drive conformity and onboarding solutions for your analytics estate.
If you decide to build these automation services, you should have a user interface that acts as both a data marketplace and an operations console. This interface should rely on an underlying metadata store, such as the one described in Metadata standards.
Your data marketplace or operations console calls a middle tier of microservices to facilitate onboarding, metadata registration, security provisioning, data lifecycle, and observability.
You can provision the automationdb-rg resource group to host your metadata store.
Important
None of these automation services are products, and they do not illustrate any roadmap item. They are listed to help you consider which items you might want to automate.
Services
Service | Service Scope |
---|---|
Data landing zone provisioning | This service creates a new data landing zone. It's unlikely to see high usage but is included for end-to-end onboarding solution completeness. For more information, see Provision the cloud-scale analytics |
Data product onboarding | This service creates and amends resource groups that pertain to an onboarded tenant. It also contains capabilities to upgrade and degrade SKUs and to activate and deactivate resource groups for any onboarded tenant or service. It creates a new data landing zone DevOps. For more information, see Provision the cloud-scale analytics |
Access provisioning | This service creates access packages, access policies, and asset access approval process (manual or automatic) using SPN/UPN. It can also expose an API to provide a list of subscription requests (assets) that users have submitted in the past 90 days. For more information, see Data access management |
Data agnostic ingestion | This microservice creates new data sources for ingestion into your data landing zones. It does this by communicating with an Azure Data Factory SQL Database metastore in each data landing zone. For more information, see How automated ingestion frameworks support cloud-scale analytics in Azure |
Metadata | This service exposes and creates metadata for the platform. For more information, see Metadata standards |
Data lifecycle | This service is responsible for maintaining your data lifecycle based on metadata. This maintenance can include moving data to cold storage and deleting records that no longer need to be retained. For more information, see Data lifecycle management |
Data domain onboarding | ONLY APPLICABLE TO DATA MESH. This service captures metadata pertaining to new domains and onboards the new domains as needed. It can also create, update, activate, and deactivate any domain or service line you might build into a microservice. For more information, see Provision the cloud-scale analytics |
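As an example of what one of these services might do, the following sketch shows a metadata-driven decision for the data lifecycle service. It decides whether a dataset should be kept, moved to cold storage, or deleted. The `DatasetMetadata` fields, thresholds, and action names are hypothetical. A real service would read them from your metadata store and then call the appropriate storage APIs.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative."""
    name: str
    created: date
    last_accessed: date
    cool_after_days: int   # move to cold storage after this long without access
    retention_days: int    # delete once the data is older than this


def lifecycle_action(meta: DatasetMetadata, today: date) -> str:
    """Return "delete", "move_to_cold", or "keep" for one dataset."""
    # Retention takes priority: expired data is deleted, not tiered.
    if today - meta.created > timedelta(days=meta.retention_days):
        return "delete"
    if today - meta.last_accessed > timedelta(days=meta.cool_after_days):
        return "move_to_cold"
    return "keep"


today = date(2024, 6, 1)
datasets = [
    DatasetMetadata("audit-2020", date(2020, 1, 1), date(2020, 6, 1), 90, 365),
    DatasetMetadata("orders-q1", date(2024, 1, 1), date(2024, 1, 15), 90, 730),
    DatasetMetadata("orders-live", date(2024, 5, 1), date(2024, 5, 30), 90, 730),
]
actions = {d.name: lifecycle_action(d, today) for d in datasets}
print(actions)
# prints {'audit-2020': 'delete', 'orders-q1': 'move_to_cold', 'orders-live': 'keep'}
```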
Data standardization
Although it isn't a specific feature or product of your data management landing zone, you should call out data standardization across all services. Data standardization defines the format in which your data should land and be stored.
Tip
Use the Delta Lake format wherever possible as the de facto standard across all services and storage.
For more information, see Data standardization.