Data catalog

A data catalog registers and maintains metadata about your data in a central place so that it's available across your organization. It reduces the chance that different project teams ingest the same data, which helps prevent duplicate data products. We recommend that you create a data catalog service to define the metadata of the data products that you store across data landing zones.

Cloud-scale analytics relies on Microsoft Purview to register enterprise data sources, classify them, ensure data quality, and provide highly secure, self-service access.

Microsoft Purview is a tenant-based service that can communicate with each data landing zone. It creates a managed virtual network and deploys it to your data landing zone region. You can deploy Azure managed virtual network integration runtimes (IR) within these managed virtual networks in any available Microsoft Purview region. The managed virtual network IR can then use private endpoints to securely connect to and scan the supported data sources. This approach helps isolate and secure the data integration process. For more information, see Use managed virtual networks with your Microsoft Purview account.

If you use Azure Databricks, we recommend using Azure Databricks Unity Catalog in addition to Microsoft Purview. Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces. For more information, see Unity Catalog best practices.

Note

This article focuses on using Microsoft Purview for governance, but your enterprise might have investments in other products, such as Alation, Okera, or Collibra. These solutions are subscription-based. We recommend that you deploy them to the data management landing zone. They might require custom integration.

Data discovery

Data discovery reflects the state of all the data that the enterprise owns. This data is known as the data estate. During data discovery, the data estate is scanned and classified. The data scanning process connects directly to the data source according to a set schedule.

As you add a new data landing zone to the environment, the associated data lakes and polyglot persistence sources must be registered as sources for the data catalog crawlers to scan.

With automated discovery of your data estate to populate the catalog, you can:

  • Crawl metadata from Azure and on-premises data sources
  • Scan your data lakes, blobs, and other supported targets
  • Extract schema from your data targets for XML, TSV, CSV, PSV, SSV, JSON, Parquet, Avro, and ORC file types
  • Allow automated catalog updates through configurable scheduling of scans and scan rule sets
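As a rough illustration of the file types listed above, the following Python sketch sorts file paths into those whose type appears in the schema-extraction list and those whose type doesn't. It's a minimal sketch that checks only the file extension. The function names and sample paths are hypothetical, and the code doesn't call Microsoft Purview or reproduce how the scanner detects file types.

```python
from pathlib import PurePosixPath

# File types named in this article as supporting schema extraction.
SCHEMA_EXTRACTION_TYPES = frozenset(
    {"xml", "tsv", "csv", "psv", "ssv", "json", "parquet", "avro", "orc"}
)


def supports_schema_extraction(path: str) -> bool:
    """Return True if the file extension is in the list above.

    Assumption: the extension reflects the real file format.
    """
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SCHEMA_EXTRACTION_TYPES


def partition_by_schema_support(paths):
    """Split paths into (listed, not_listed), preserving input order."""
    listed, not_listed = [], []
    for path in paths:
        (listed if supports_schema_extraction(path) else not_listed).append(path)
    return listed, not_listed


# Hypothetical paths in a data lake container.
sample_paths = [
    "raw/sales/2024/orders.parquet",
    "raw/sales/2024/README.md",
    "raw/hr/EMPLOYEES.CSV",
]
listed, not_listed = partition_by_schema_support(sample_paths)
```

A check like this can help a data landing zone team estimate, before the first scan, which files are likely to appear in the catalog with a schema.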

Important

When you add a new data landing zone to the environment, register the associated data lakes and polyglot storage through Azure DevOps as a source for the data catalog crawlers to scan, govern, and manage data integrity.
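One way to keep registration from being forgotten is a pipeline step that compares the sources deployed in a landing zone with the sources already registered in the catalog. The following Python sketch shows only that comparison. The DataSource type, its fields, and the sample names are hypothetical, and the two input lists are assumed to come from your own deployment inventory and catalog export rather than from any specific Microsoft Purview or Azure DevOps API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of a source deployed in a data landing zone."""
    landing_zone: str
    name: str
    kind: str  # for example "data_lake" or "sql_database" (illustrative values)


def find_unregistered(deployed, registered_names):
    """Return deployed sources whose names aren't in the catalog yet.

    Names are compared case-insensitively (an assumption of this sketch).
    """
    registered = {name.lower() for name in registered_names}
    return [source for source in deployed if source.name.lower() not in registered]


# Hypothetical inventory for a newly added landing zone.
deployed = [
    DataSource("lz-sales", "salesrawlake", "data_lake"),
    DataSource("lz-sales", "salescuratedlake", "data_lake"),
    DataSource("lz-sales", "salesorders", "sql_database"),
]
# Hypothetical list of source names already registered in the catalog.
registered_names = ["SalesRawLake"]

missing = find_unregistered(deployed, registered_names)
```

A pipeline could fail the deployment, or open a work item, when the returned list isn't empty.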

Data classification

Microsoft Purview allows you to apply system or custom data classifications on file, table, or column assets.

Data classifications are like subject tags. During scanning, Microsoft Purview uses them to mark and identify specific types of data found within your data estate. You use sensitivity labels to identify the categories of classification types within your organizational data and to group the policies that you want to apply to each category. Microsoft Purview uses the same sensitive information types as Microsoft 365, so you can extend your existing security policies and protections across your entire content and data estate.

Microsoft Purview can scan and automatically classify documents. For example, if you have a file named multiple.docx and its content includes a national ID number, Microsoft Purview adds a classification such as EU National Identification Number on the asset details page.
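To make the idea of pattern-based classification concrete, the following Python sketch tags text with every classification whose pattern matches. It's a conceptual illustration only. The rule names and regular expressions are hypothetical, and they aren't the patterns that Microsoft Purview's system classifications use.

```python
import re

# Hypothetical classification rules: name -> compiled pattern.
# The ID format (two uppercase letters and eight digits) is invented for this example.
RULES = {
    "Example National ID (hypothetical)": re.compile(r"\b[A-Z]{2}\d{8}\b"),
    "Email Address (hypothetical)": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text, rules=RULES):
    """Return the sorted names of all rules whose pattern appears in the text."""
    return sorted(name for name, pattern in rules.items() if pattern.search(text))


# Hypothetical document content extracted during a scan.
document_text = "Employee ID: AB12345678, contact: a.b@example.com"
labels = classify(document_text)
```

In this sketch, a document receives one tag per matching rule, which mirrors how a scanned asset can carry more than one classification.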

Microsoft Defender for SQL is a feature available for Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. It includes functionality for discovering and classifying sensitive data, surfacing and mitigating potential database vulnerabilities, and detecting anomalous activities that could indicate a threat to your database. Microsoft Defender for SQL provides a single go-to location for enabling and managing these capabilities.

Next steps