Sample setup for data governance
Microsoft Purview data governance, featuring Microsoft Purview Unified Catalog and Microsoft Purview Data Map, delivers comprehensive visibility, data confidence, and responsible innovation to help organizations achieve greater business value in the era of AI. Using an example of managing health data, follow the steps in this article to help you understand how to set up Unified Catalog and use its functionality to build a sound data governance practice for your organization.
Step 1: Set up your governance domains in Unified Catalog
Governance domains are the key to establishing accountability for your data and help federate governance of that data across the company. When you create governance domains, starting with the proper owner ensures you can effectively identify and collaborate with experts for all of the data in the data estate. Governance domains come in different types, so each one can align to the data boundary of the team that governs that data. For example: functional domains (finance, HR, sales) or data domains (product, customer, health).
Prerequisites
Grant permissions and build the first governance domain
Open the Microsoft Purview portal.
Sign in to the Microsoft Purview portal using credentials for an admin account that is assigned the Role management role (for example, a Purview administrator). Go to Settings > Roles and scopes to view and manage roles.
Select Role groups.
On the Role groups for Microsoft Purview solutions page, select the Data Governance role group.
On the Edit member of the role group page, select Choose users or Choose groups.
Select the check box for all users or groups you want to add to the role group.
Select Select.
In Unified Catalog, select Catalog management, then select Governance domains.
On the Governance domains page, you can set up the rest of your catalog to enable others to federate the ownership of data, empower teams to build out their knowledge, and establish business value of your data.
- Start by selecting New governance domain.
- You can choose any name for your governance domain, but for this tutorial name it '(Tutorial) Personal Health' and give it a description of 'Personal health data refers to any information related to an individual's physical or mental health that is collected and used within the healthcare sector. This can include a wide range of data types, such as medical records, treatment histories, diagnostic images, and laboratory test results. It's often protected under various laws and regulations to ensure privacy and confidentiality.'
- Select the type as a 'data domain'.
- Leave the parent blank (if this is the first governance domain in the catalog it will not have anything to select here)
- Select Create
- Now create two more domains on your own. These will be key points of federation for collaboration and governance in your own organization, so think about who might be the owners of your domains when you implement Microsoft Purview Unified Catalog.
- You can follow these examples:
- A Corporate functional domain represents the highly controlled assets and terms that an entire company uses.
- Sales is a functional domain that most organizations will have that is a child domain of Corporate.
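The parent and child relationship between the example domains can be pictured with a small data model. This is a minimal sketch for illustration only, not a Purview API: the `GovernanceDomain` class and the `domain_path` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GovernanceDomain:
    """Hypothetical in-memory model of a governance domain (not a Purview type)."""
    name: str
    domain_type: str  # for example "data domain" or "functional domain"
    parent: Optional["GovernanceDomain"] = None


def domain_path(domain: GovernanceDomain) -> str:
    """Return the chain of domain names from the top-level parent down to `domain`."""
    names = []
    current: Optional[GovernanceDomain] = domain
    while current is not None:
        names.append(current.name)
        current = current.parent
    return " > ".join(reversed(names))


personal_health = GovernanceDomain("(Tutorial) Personal Health", "data domain")
corporate = GovernanceDomain("Corporate", "functional domain")
sales = GovernanceDomain("Sales", "functional domain", parent=corporate)

print(domain_path(sales))
print(domain_path(personal_health))
```

Sketching the hierarchy before you create the domains can help you decide who owns each level.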
Select the governance domain created.
Select the Roles tab of the governance domain.
By default, when you create the governance domain you're added to all roles in the governance domain. As a governance domain owner, you add the data stewards (business experts in your domain) and the data product owners (who know which data assets are the best for others to consume).
Switch back to the Details tab.
Select the Manage policies button to apply a domain level policy. This policy is applied to all data products in the domain. Applying a policy automatically means the data experts don't have to be policy experts as well.
In the Manage access policies tab, select the checkbox next to Permit data copies. Selecting this policy option automatically applies an attestation that requires all users who request access to your data products to attest that they understand the data copy policy for your data.
Select Save changes to confirm the policy is set by the governance domain.
Select Publish on the governance domain. The Publish button publishes all other concepts within the domain.
Create glossary terms
Adding glossary terms to your governance domain enables others to better understand how the business uses and understands the data. Glossary terms also ensure insights use common terms and help spread your knowledge across your governance domain.
On the page for your governance domain, find the Glossary terms card and select View all.
On the Glossary terms page, select New term.
Enter details:
- Name: 'Outbreak'
- Description: A disease that has affected or has the potential to affect a large portion of the population.
- You can leave the rest blank for now, but there are fields to collect: the term owner responsible for defining the term for your company, acronyms that capture common "also known as" names for the term, and links to resources that have more information about the term.
Select Create
Select the Manage Policies button. Similar to the domain level policies, you can create term level policies that will be applied wherever the term is in use.
Check the box next to Manager approval required. This enforces a secondary approval from the user's listed manager in Microsoft Entra ID when access is requested to the data products.
Select Publish for the Outbreak term created. Published terms are filterable in Unified Catalog, and when others use the term to describe their data product, that description is visible in Unified Catalog while browsing the data product.
Now create two more terms. This time, select the 'Outbreak' term as the parent term for the terms you create. Try building relationships between these child terms in the related tab on either term to help build out the network of how these terms work together to explain the entirety of a topic.
- Pandemic: A global outbreak of a disease that affects a large number of people across multiple countries or continents.
- Epidemic: A countrywide or regional outbreak of a disease that is highly contagious and affects a large portion of the population.
Try creating a couple of other terms in any other domains you created earlier. If you aren't sure what to add, try the Get suggested terms button to have GenAI propose a few based on the description and name of the domain you already provided.
Add an OKR
Now add an OKR (objective and key result) for your Personal Health domain to help others understand the business value of your data. This will build a direct connection between your data and the business value it provides.
Select the OKR box from the governance domain page.
Select New OKR.
Enter the details of the objective first:
- Objective: Reduce pandemic risk by enabling effective patient vaccine uptake.
- Owner: Enter your name
- Target date: '2024-12-31'
Select Create
Adding key results to your objective ensures the goals are measurable and that progress towards the goal is monitored. Select + Add key result.
Enter the Key result details:
- Key result: Ensure 80% of the older age groups (>65 years) that are most likely to be affected by the pandemic receive full vaccination by the end of calendar year 2024.
- Progress status: On track
- Progress Amount: 70
- Goal amount: 80
- Maximum amount: 100
Select Create.
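To see how the progress, goal, and maximum amounts relate, here's a minimal sketch that summarizes the key result entered above. The `key_result_progress` function and its formulas are illustrative assumptions; Unified Catalog might present progress differently.

```python
def key_result_progress(progress: float, goal: float, maximum: float) -> dict:
    """Summarize a key result; the formulas here are illustrative assumptions."""
    if not 0 <= progress <= maximum or not 0 < goal <= maximum:
        raise ValueError("expected 0 <= progress <= maximum and 0 < goal <= maximum")
    return {
        "percent_of_goal": round(100 * progress / goal, 1),
        "remaining_to_goal": max(goal - progress, 0),
        "goal_met": progress >= goal,
    }


# Values from the tutorial: progress 70, goal 80, maximum 100.
summary = key_result_progress(progress=70, goal=80, maximum=100)
print(summary)
```

With the tutorial values, the key result sits at 87.5% of its goal with 10 points remaining.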
Select Publish.
Create critical data elements
Lastly, create a critical data element (CDE) in Personal Health to ensure the most important columns of data have a consistent definition and understanding, and that they always meet business expectations for how that data is formed and stored.
- From the governance domains page with the Personal Health domain selected, select the Critical data elements box.
- Select New critical data element.
- Enter the basic CDE metadata:
- Name: Age groups
- Description: Common grouping of person ages used to ensure analytical reports follow a reference that others can depend on, and to remove individual ages to improve anonymity of the data. The age group is divided into 8 groups: <2 years, 2-4 years, 5-11 years, 12-17 years, 18-24 years, 25-49 years, 50-64 years, 65+ years.
- Owner: enter your name
- Expected Data Type: Text
- Select Create
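The banding described in the CDE can be expressed as a small lookup. This is a minimal sketch, assuming the eight bands listed in the description; the `age_group` function is a hypothetical helper, and the labels in your actual data column might be formatted differently.

```python
import bisect

# Lower bound (in years) of every band after the first, taken from the CDE description.
_LOWER_BOUNDS = [2, 5, 12, 18, 25, 50, 65]
_LABELS = [
    "<2 years", "2-4 years", "5-11 years", "12-17 years",
    "18-24 years", "25-49 years", "50-64 years", "65+ years",
]


def age_group(age_years: float) -> str:
    """Map an age in years to the band label used by the 'Age groups' CDE."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age, which is the band index.
    return _LABELS[bisect.bisect_right(_LOWER_BOUNDS, age_years)]


for age in (1, 4, 17, 64, 65):
    print(age, "->", age_group(age))
```

Publishing only the band, not the individual age, is what improves the anonymity of the data.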
The real power of the CDE is that it maps directly to the physical data columns where this data is stored. This connection ensures common understanding and enables the evaluation of Data Quality rules and policies at scale.
From the CDE you just created select + Add column.
Search for the Covid 19 Vaccine and Case Trends data asset from the gold container of the data lake
Select the box, not the name, of the Covid 19 Vaccine and Case Trends asset.
Tip
If you select the blue name of the asset it will open a new window in Microsoft Purview showing you the asset details.
Select the radio button next to the AgeGroupVacc column.
Select Add.
Select the Data quality tab at the top of the CDE you just created to apply data quality rules to the CDE. It's similar to how you added policies for glossary terms and governance domains.
Select New rule
Select Data type match
Enter Rule name: Confirm Age group formatting
Select Create
Select Publish on the CDE
This CDE will now automatically apply a data quality rule to every data product that uses the Covid 19 Vaccine and Case Trends asset, which we'll walk through in the next section.
- Try creating a couple of other CDEs in your other domains. Here are some ideas:
- Sales: Revenue and Seller Name
- Corporate: Product ID
Step 2: Set up and register your data in Data Map
If you don't have data sources available for scanning, then you can follow along with these steps to fully deploy an Azure Data Lake Storage (ADLS Gen2) example.
Tip
If you already have a data source in the same tenant as your Microsoft Purview account, move ahead to the next part of this section to scan your assets.
In a real data estate you find many different systems in use for different data applications. There are reporting environments like Fabric and Snowflake where teams use copies of data to build analytical solutions and power their reports and dashboards. There are operational data systems that power the applications teams or customers use to complete business processes that collect or add data based on decisions made during the process.
To create a more realistic data estate, show many sources of data in the catalog, covering the breadth of data uses any company might have. The types of data required to power a use case can be vastly different: business users need reports and dashboards, analysts need conformed dimensions and facts to build reports, and data scientists or data engineers need raw source data that came directly from the system that collects it. All of these and more let different users see the importance of finding, understanding, and accessing data in the same place.
For some other tutorials to add data to your estate, you can follow these guides:
- Fabric Lakehouse Tutorial – provides the base of a reporting environment
- Azure SQL Database (Sample) – provides a well structured example of an operational data store
Prerequisites
- Subscription in Azure: Create Your Azure Free Account Today
- Microsoft Entra ID for your tenant: Microsoft Entra ID Governance
- A Microsoft Purview Account
- Admin access to the Microsoft Purview account (this is the default if you created the Microsoft Purview account; see Permissions in new Microsoft Purview portal preview)
- All resources (Microsoft Purview, your data source, and Microsoft Entra ID) must be in the same cloud tenant.
Set up your data estate
A. Create and populate a storage account
- Follow along with this guide to create a storage account: Create a storage account for Azure Data Lake Storage Gen2
- Create containers for your new data lake:
- Navigate to the Overview page of your storage account.
- Select the Containers tab under the Data storage section.
- Select the + Container button
- Name as 'bronze' and select the Create button
- Repeat these steps to create a 'gold' container
- Download some example CSV data from data.gov: Covid-19 Vaccination And Case Trends by Age Group, United States
- Upload the CSV to the container named 'bronze' in the storage account you created.
- Select the container named 'bronze' and select the Upload button.
- Browse the location where you saved the CSV and select the Covid-19_Vaccination_Case _Trends file.
- Select Upload.
B. Create an Azure Data Factory
This step demonstrates how data moves between layers of a medallion data lake and ensures the data is in a standardized format that consumers would expect to use. It's a prerequisite step for running Data Quality.
Follow this guide to create an Azure Data Factory: Create an Azure Data Factory
Copy the data from the CSV in the 'bronze' container to the 'gold' container as a Delta format table using this Azure Data Factory guide: Transform data using a mapping data flow
Open the Azure Data Factory (ADF) experience from the Azure portal by selecting the Launch studio button on the Overview tab of the ADF resource created.
Select the Author tab in ADF studio.
Select the + button and pick Data flow from the drop-down menu.
Name the dataflow 'CSVtoDeltaC19VaxTrends'.
Select Add Source in the empty box.
Set Source settings to:
- Output stream name: 'C19csv'
- Description: leave blank
- Source type: Inline
- Inline dataset type: Delimited Text
- Linked Service: Select the data lake where you stored the csv
Set Source options to:
- File mode: File
- File path: /bronze/ Covid-19_Vaccination_Case _Trends
- Allow no files found: leave unchecked
- Change data capture: leave unchecked
- Compression type: None
- Encoding: Default (UTF-8)
- Column delimiter: Comma (,)
- Row delimiter: Default (\r, \n, or \r\n)
- Quote character: Double quote (")
- Escape character: Backslash (\)
- First row as header: CHECKED
- Leave the rest as defaults
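If you want to check locally how a file parses with the delimiter, quote, and escape options listed above, Python's standard `csv` module accepts the same kind of settings. This is a minimal sketch on a made-up sample: the column names are placeholders, not the real file's header, and `doublequote=False` is a choice made here so that only the backslash acts as the escape.

```python
import csv
import io

# Small made-up sample; the column names are placeholders, not the real file's header.
sample = 'col_a,col_b\r\n"value, with comma",2\r\n"say \\"hi\\"",3\r\n'

reader = csv.DictReader(
    io.StringIO(sample, newline=""),
    delimiter=",",      # Column delimiter: Comma (,)
    quotechar='"',      # Quote character: Double quote (")
    escapechar="\\",    # Escape character: Backslash (\)
    doublequote=False,  # Assumption: quotes are escaped with the backslash only
)
# DictReader treats the first row as the header, like "First row as header".
rows = list(reader)

for row in rows:
    print(row)
```

To inspect the downloaded file instead of the sample, pass a file opened with `newline=""` and UTF-8 encoding to `csv.DictReader`.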
Select the small + next to the source created and select Sink.
Create the sink, which defines the format and location where the data is stored, to move the data from a CSV in 'bronze' to a Delta table in 'gold'.
- Set the Sink values (leave all settings as default unless specified)
- Sink type: Inline
- Inline dataset type: Delta
- Linked service: the same data lake as used in the source; the data is just stored in a different container.
Set the Setting values (leave all settings as default unless specified)
- Folder path: gold/Covid19 Vaccine and Case Trends
You need to type this value in, because this folder name is how we want the data to be stored and it doesn't exist yet to select.
Select Validate. This checks your data flow and provides instructions to fix any errors.
Select Publish all.
Select the + button and select pipeline from the drop-down menu
Name your pipeline 'CSV to Delta C19 Vax Trends'
Select the dataflow created in the previous steps ('CSVtoDeltaC19VaxTrends') and drag and drop it on the open pipeline tab.
Select Validate
Select Publish
Select Debug (use activity runtime) to run the pipeline.
Tip
If you hit errors for spaces or inappropriate characters for delta format: open the downloaded CSV and make corrections. Then re-upload and overwrite the CSV in the bronze zone. Then rerun your pipeline.
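Instead of editing the CSV by hand, you could clean the header row with a short script. This is a minimal sketch: the set of rejected characters is an assumption based on the usual Delta column-name restrictions (space, comma, semicolon, braces, parentheses, newline, tab, and equals sign), and `clean_header` and `clean_csv_header` are hypothetical helper names.

```python
import csv
import io
import re

# Characters assumed to be rejected in Delta column names: space , ; { } ( ) newline tab =
_INVALID = re.compile(r"[ ,;{}()\n\t=]+")


def clean_header(name: str) -> str:
    """Replace each run of invalid characters with one underscore and trim the ends."""
    return _INVALID.sub("_", name.strip()).strip("_")


def clean_csv_header(text: str) -> str:
    """Return CSV text whose first row has cleaned column names; data rows are written back as parsed."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return text
    rows[0] = [clean_header(col) for col in rows[0]]
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)
    return out.getvalue()


# Made-up header for illustration only.
print(clean_header("Age Group (Vacc)"))
```

After cleaning, re-upload the file to the bronze container and rerun the pipeline as described in the tip.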
Navigate to your gold container in the data lake and you should now see the new Delta table created during the pipeline.
Scan your assets
If you haven't scanned data assets into your Microsoft Purview Data Map, then you can follow these steps to populate your data map.
Scanning sources in your data estate will automatically collect the metadata of the data assets (tables, files, folders, reports, etc.) in those sources. By registering a data source and creating the scan, you establish the technical ownership over the sources and assets that are displayed in the catalog and ensure that you have control over who can access which metadata in Microsoft Purview. When you register and store sources and assets at the domain level, they're stored at the highest level of the access hierarchy. Typically it's best to create some collections where you'll scan the asset metadata and establish the correct access hierarchy for that data.
- Provide reader access for Microsoft Purview Managed Identity (MSI) to your data lake or other data store.
Tip
The MSI is the account name of the Microsoft Purview instance.
If you've chosen to use Microsoft Fabric or SQL, you can use these guides to provide access:
Register your data lake and scan your assets
In Microsoft Purview Data Map under domains tab, select the Role assignments for the domain (it will be the name of Microsoft Purview account):
- Add yourself as the data source admin and the data curator to the domain.
- Select the person icon next to the role Data source admin.
- Search for your name as it is in Microsoft Entra ID (it could require you to enter your full name spelled exactly as it is in Microsoft Entra ID).
- Select OK.
- Repeat these steps for data curator.
Register the data lake:
- Select the Data sources tab.
- Select Register.
- Select the Azure Data Lake Storage Gen2 storage type.
Provide the details to connect:
- Subscription (optional)
- Data Source Name (this will be the name of the ADLS Gen2 source)
- Collection where asset metadata should be stored (optional)
- Select Register
Once registration of the data source is complete, you can configure the scan. Registration signifies that Microsoft Purview is connected to the data source and has placed it in the correct collection for ownership. Scanning will then read the metadata from the source and populate the assets in the data map.
Select the source you registered in the data sources tab.
Select new scan and provide details:
- Use the default integration runtime for this scan
- Credential should be Microsoft Purview MSI (system)
- Scan level is Auto Detect
- Select a collection or use the domain (collection must be the same collection or child collection of where the data source was registered)
- Select Continue
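The collection rule for scans (the scan's collection must be the collection where the source was registered, or a child of it) can be pictured as a walk up a tree. This is a minimal sketch with a hypothetical collection tree; the collection names and the `is_same_or_child` helper are made up, and the sketch assumes "child" includes any collection nested beneath the registered one.

```python
from typing import Dict, Optional

# Hypothetical collection tree: child name -> parent name (None marks the domain at the top).
PARENTS: Dict[str, Optional[str]] = {
    "contoso-purview": None,
    "health": "contoso-purview",
    "health-gold": "health",
    "sales": "contoso-purview",
}


def is_same_or_child(scan_collection: str, source_collection: str) -> bool:
    """True if scan_collection is source_collection or sits anywhere beneath it."""
    current: Optional[str] = scan_collection
    while current is not None:
        if current == source_collection:
            return True
        current = PARENTS[current]
    return False


print(is_same_or_child("health-gold", "health"))
print(is_same_or_child("sales", "health"))
```

In this made-up tree, a source registered in `health` can be scanned into `health` or `health-gold`, but not into `sales`.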
Tip
At this point Microsoft Purview tests the connection to validate that a scan can be done. If you haven't granted the Microsoft Purview MSI reader access on the data source, the test fails. If you aren't the data source owner or don't have user access contributor permissions, the scan fails because it expects you to have authorization to create the connection.
Now select only the 'gold' container, where we placed the delta table earlier in this tutorial. This prevents scanning any other data assets that are in your data store.
- You should have only one blue check, next to gold. Alternatively, you can leave checks next to everything; the scan then covers the full source and still creates the assets we'll use, and more.
- Select Continue
In the select a scan rule set screen you should use the default scan rule set.
Select Continue
In set a scan trigger, you set the frequency of the scanning so that, as you continue to add data assets to the gold container of the lake, the scan continues to populate the data map. Select Once.
Select Continue.
Select Save and Run. This creates a scan that reads only the metadata from the gold container of your data lake and populates the table we'll use in Microsoft Purview Unified Catalog in the next sections. If you select only Save, the scan doesn't run and you won't see the assets. Once the scan is running, you'll see the scan you created with a Last run status of Queued. When the status reads complete, your assets are ready for the next section. This could take a few minutes or hours depending on how many assets you have in your source.
Step 3: Publish your data products
Creating data products is essential to ensure that the right data is made discoverable by your organization. Data products help prevent over-governing data that has little or no value in your data estate. Ensuring your data experts are able to publish data products activates your most valuable data and builds the right level of governance based on that value. Curating assets whose business purpose the technical teams don't know, or trying to govern everything in your complex and growing data estate, costs extra time and productivity chasing down the details of data that might never be used or could just be removed from the estate. Instead, focus on the pieces of data that have value and that people need to discover to build even more value. As teams use more data and gain a better understanding of what's needed or more useful, data products can be created to meet those demands, and governance can adapt so that it always stays the right size for the value and sensitivity of the data.
Prerequisites
- Must be a data product owner for the governance domain you're using.
- Must have data assets in the Data Map. If you don't, see section 2 of this tutorial to add some.
- A governance domain must be published to publish a data product. If you don't have one, see section 1 of this tutorial to create one.
Create and publish a data product
Open the Microsoft Purview portal.
Select Unified Catalog.
Select Catalog management and then Governance domains.
From the Governance domains page, select the Personal Health domain
Select the Go to data products link under Business concepts
Here's where the data experts called data product owners will identify the data assets that are intended to be consumed by others in your organization, and provide the necessary information to make them usable.
Select New data product
Provide details about the data product
- Name: 'Covid-19 Vaccination and Case Trending by Age'
- Description: 'This data comes from the CDC as a part of the U.S. Department of Health & Human Services. The data contains trends in vaccinations and cases by age group, at the US national level. Data is stratified by at least one dose and fully vaccinated. Data also represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.'
- Type: Dataset
- Select Next
- Use cases: 'This data is provided for public use and is intended to help understand the trends of vaccination uptake and new cases by different age groups. The ages are banded into eight groups ranging from <2 years to 65+ years. Similarly, the trends are provided in daily numbers that provide a seven-day average of new cases by age group.'
- Mark as Endorsed as checked.
- Select Save.
Now you have the base metadata of the data product built out. Next add some properties and map the asset from the data map.
Select the + Add data assets button.
You'll see the assets you have scanned into the data map. This includes all folders and layers of the data source.
Search for the Covid19 Vaccine and Case Trends asset you added to the gold container of your data lake and select this resource set.
Select Add. You can select as many assets as needed for a data product but here only one is needed.
Tip
Try the Get suggestions button to have GenAI help pick from the assets in your data map and select the Covid19 Vaccine and Case Trends from a reduced list of results.
You can now see the asset added to your data product.
Select + Add term next to the glossary terms title
Select the Outbreak term created earlier and select Add
You should see the critical data element for age group from the asset mapped to the data product now.
Select + Add OKR next to the OKR title
Select the Reduce pandemic risk by enabling effective patient vaccine uptake objective that we created in the first section.
Manage data product access request policies
At the top of the page, the last step before publishing the data product is to select the Manage policies button. Here you configure the access policies and the request access workflow by making selections and providing the names of approvers. You can also use the Inherited policies tab to see the data copies attestation policy we applied earlier at the governance domain, as well as the Manager approval required policy coming from the Outbreak glossary term.
Select the Manage policies tab.
Under Access time limit, provide details for how long granted access lasts before it needs to be renewed. We'll set this to grant access for up to one year.
In the box, put 1.
Select years in the drop-down.
Under approval requirements, provide your name in the approvers box. (It will require the name registered in Microsoft Entra ID)
Note
It's not necessary to check manager approval because that policy is inherited from the outbreak glossary term.
Select the Preview request form button to see what the catalog consumers will view when requesting access. You'll see the data copy attestation and manager approval required because they were set by the governance domain and glossary term.
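One way to reason about what the request form shows is to treat the effective requirements as the combination of what is set on the governance domain, on the attached glossary terms, and on the data product itself. This is a minimal sketch of that mental model, not how Microsoft Purview evaluates policies internally; the requirement labels and the `effective_requirements` helper are made up for illustration.

```python
from typing import List, Set


def effective_requirements(domain: Set[str], terms: List[Set[str]], product: Set[str]) -> Set[str]:
    """Union of requirements set on the domain, the attached terms, and the product itself."""
    combined = set(domain) | set(product)
    for term_policies in terms:
        combined |= term_policies
    return combined


# Labels are illustrative stand-ins for the policies configured in this tutorial.
domain_policies = {"data copy attestation"}        # from the Personal Health domain
outbreak_term_policies = {"manager approval"}      # from the Outbreak glossary term
product_policies = {"named approver", "access expires after 1 year"}

requirements = effective_requirements(domain_policies, [outbreak_term_policies], product_policies)
print(sorted(requirements))
```

This is why you don't have to repeat the manager approval setting on the data product: it arrives through the term.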
Select Save changes.
Once you have the data assets mapped and the access policies configured, you're ready to publish your data product to the catalog.
Select Publish on the data product.
Try creating data products in the other domains you created earlier:
- Profit Report, Type: Dashboards/reports.
- Product Master, Type: Master data and reference data.
Note
You can add many assets to these to see how a data product with many assets looks, and map the data products to terms from any domain to see how the glossary is used to describe the data using a consistent set of terms.
Step 4: Run data quality
Now that you have a data product available in the catalog, running data quality rules will tell everyone that the data is in good shape and ready to be used. As more is learned about the data, new data quality rules can be added to make sure it's fit for all use cases. Ensuring data products are of the highest quality helps build trust in your data and tells others that it's being monitored and improved. As the value of data increases, its quality has to be more closely monitored and controlled, because data quality issues can cause massive impacts if poorly managed.
Prerequisites
- Data quality rules can only be run on delta format tables in ADLS Gen2 and Microsoft Fabric.
- The Managed Identity from Microsoft Purview must be enabled to read the data source as it is the only supported credential for data quality today.
- You must have the data quality steward role in the governance domain you're running data quality in.
- You must be the owner of, or have user access administrator access to, the data source you're connecting for data quality scanning, to ensure proper security authorization to scan the data.
- You must have the data profile steward role to run profiles on your data.
Create and run data quality rules
Open the Microsoft Purview portal.
Select Unified Catalog.
Select the Data quality tab under Health management.
Select the Personal Health Domain created in section 1.
Select the Manage button and pick Connections from the menu. Building this connection will ensure that you're able to run data quality scans on your data source in that governance domain, preventing teams from gaining access to knowledge of the data without proper authorization.
Select New on the connections screen to create a new connection:
- Provide the display name 'Personal Health ADLSg2 DQ'.
- Select source type of Azure Data Lake Storage Gen2.
- Provide details of the data source created in section 2.
Note
Credential must be Microsoft Purview MSI (system) for a data quality connection
- Select Test connection
- Once the connection is tested, select Submit
Once the connection is established, you're ready to run profiles and start building data quality rules. This ensures that the experts who know the business rules are the ones building them, and that appropriate rules are running on the most important data products.
- Go back to the Data quality page.
- Select the Personal Health governance domain.
- Select the Covid-19 Vaccination and Case Trending by Age data product built in section 3.
- Select the asset that was added to the data product. (It must be in delta format from section 2 or data quality won't run).
- Apply data quality rules to the columns of the data to measure if it's meeting your expectation of quality:
- Select Rules tab on the asset selected.
- Select New rule.
- Select Empty/blank fields rule.
- Provide details:
- Select AgeGroupVacc column from the column drop-down
- Rule Name: Confirm Vaccination Age Group Exists
- Select Create.
- Select New rule.
- Select Data type match.
- Provide details.
- Select DateAdministered column.
- Select Create.
- Select Run Data quality scan.
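To get a feel for what an empty/blank fields rule measures, here's a minimal sketch that scores a column by the share of values that are present. The scoring formula and the `non_blank_score` helper are illustrative assumptions, and the sample values are made up; the score Microsoft Purview reports might be calculated differently.

```python
from typing import Iterable, Optional


def non_blank_score(values: Iterable[Optional[str]]) -> float:
    """Percentage of values that are neither None nor empty/whitespace-only."""
    values = list(values)
    if not values:
        return 100.0  # an empty column has no rows that can fail the check
    passed = sum(1 for v in values if v is not None and v.strip() != "")
    return round(100 * passed / len(values), 1)


# Made-up sample values for an AgeGroupVacc-like column.
sample = ["<2 years", "65+ years", "", None, "12-17 years"]
print(non_blank_score(sample))
```

In this sample, three of the five values are present, so the sketch scores the column at 60.0.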
Profile data
Create a profile for your data to see the high-level statistics of each column and discover any anomalies that could warrant a new rule.
- In Unified Catalog, select Health management, then select Data quality.
- Select Profile data
- Check the top box next to Column name to profile all columns. Microsoft Purview will recommend which columns to profile, and you can select columns that you know are worth profiling to help prevent profiles on highly sensitive data or data you know will be sparsely populated.
- Select Run profile
When the scan is complete, you can review the data quality score and profile for your new data product. The data quality score is available to all users of the catalog, so everyone knows the status of the data.
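If you want to preview the kind of column statistics a profile surfaces, a few lines of Python can compute similar figures on a local sample. This is a minimal sketch: the `profile_column` helper and the chosen statistics are assumptions for illustration, not the exact set that Microsoft Purview produces.

```python
from collections import Counter
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Basic profile of one column: row count, missing values, distinct values, top value."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Made-up sample values for illustration.
column = ["65+ years", "65+ years", "", None, "<2 years"]
print(profile_column(column))
```

A high missing count or an unexpected distinct count is the kind of anomaly that could point to a new rule.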
Create a schedule for your data quality scans to ensure you're continuously monitoring for data quality issues. Set alerts to make sure you're addressing data quality issues before consumers are affected.
- Under Health management, select Data quality.
- Select the Personal Health domain where we configured the data quality rules.
- From the Manage dropdown list, select Scheduled scans.
- On the Scheduled scans page, select New.
- Add Overview details
- Name: Personal Health DQ Monthly Evaluation
- Description: Monthly scan of DQ rules for continuous improvement.
- Select Continue
- Select the scope of the scan
- Check the box next to Covid-19 Vaccination and Case Trending by Age data product
- Select Continue
- Schedule the scan to ensure it runs on the last day of every month
- Select Recurring
- Recurrence: Every one Month
- Month days: Last
- Schedule scan time (UTC): 12:00:00
- Start recurrence at (UTC): leave as default
- Select Continue
- Review details of the scan to see if there are any changes you would like to make before saving.
- Select Save. Because we triggered a manual scan earlier we don't need to trigger another scan now but if a new scan is needed, select Save and run.
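To check which dates a "last day of every month at 12:00:00 UTC" schedule lands on, you can compute them with the standard library. This is a minimal sketch; `last_day_runs` is a hypothetical helper, and it only mirrors the schedule settings above rather than calling any Microsoft Purview API.

```python
import calendar
from datetime import datetime, timezone
from typing import List


def last_day_runs(start: datetime, count: int) -> List[datetime]:
    """Next `count` run times at 12:00:00 UTC on the last day of each month, at or after `start`.

    `start` must be timezone-aware so it can be compared with the UTC run times.
    """
    runs: List[datetime] = []
    year, month = start.year, start.month
    while len(runs) < count:
        last_day = calendar.monthrange(year, month)[1]  # number of days in this month
        run = datetime(year, month, last_day, 12, 0, 0, tzinfo=timezone.utc)
        if run >= start:
            runs.append(run)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return runs


start = datetime(2024, 1, 31, 13, 0, tzinfo=timezone.utc)
for run in last_day_runs(start, 3):
    print(run.isoformat())
```

Because the start time in this example is after noon on January 31, the first run falls on February 29, 2024.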
Configure alerts
Once data quality has scheduled scans, you can configure alerts that let stewards know when attention is needed because of data quality issues or scan failures. Configure a data quality alert for failed scans and for when the score decreases by more than 5%.
- Go back to the Personal Health domain on the Data quality page.
- From the Manage dropdown list, select Alerts.
- Select New.
- Enter alert details
- Display Name: Personal Health DQ Monthly Scan
- Description: To ensure minimum DQ thresholds are meeting consumer expectations.
- Target: Score decreases by more than
- Threshold: 5
- Turn off notifications: leave unchecked
- Turn on notification for failed quality scans: leave checked
- Recipient: enter your name
- Select Continue.
Tip
When implementing in your Unified Catalog, send the alerts to the stewards who can notify consumers of the issue and work with the technical owner of the data to make corrections.
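The alert condition configured above can be summarized as a simple check. This is a minimal sketch that assumes the threshold is measured in score points between two consecutive scans; the `should_alert` helper is hypothetical, and Microsoft Purview might evaluate the decrease differently.

```python
def should_alert(previous_score: float, current_score: float,
                 threshold: float = 5.0, scan_failed: bool = False) -> bool:
    """Alert on a failed scan, or when the score drops by more than `threshold` points."""
    if scan_failed:
        return True
    return (previous_score - current_score) > threshold


print(should_alert(90.0, 84.0))                    # drop of 6 points
print(should_alert(90.0, 85.0))                    # drop of exactly 5 points
print(should_alert(90.0, 90.0, scan_failed=True))  # failed scan
```

A drop of exactly 5 points doesn't trigger the alert in this sketch, because the target reads "decreases by more than".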
At the end of this section, you have a functioning Unified Catalog with operational data quality to manage the data you're offering to organizational data consumers. Everything so far has been done to get the most valuable data to consumers and build their trust in the data they use. As the value of the data grows and new data strategies emerge, the next sections show how you can manage the entire catalog or go deeper into specific data management with master data.
Step 5: Master data management
Master data management (MDM) is the practice of conforming the most important data entities, which must be accurate, unique, and consistently applied in all areas of the business, because errors in this data can affect the whole business. Through one of our MDM partners, you can integrate your choice of MDM solution with Microsoft Purview to unify, standardize, and cleanse data, enabling golden record creation and the publication of master data as data products.
Follow the tutorials here for your solution of choice: Master data management in Microsoft Purview
Step 6: Manage data health
In Microsoft Purview Data Estate Health, the Central Data Office and other data managers can evaluate the status of the data against their company standards and manage progress toward their strategy. To make sure that everyone in the company knows what can be done to increase the value of their data, the standards must be understood and must scale to the whole organization without requiring everyone to become a data governance expert. Starting from an industry-standard set of controls that are available out of the box in Microsoft Purview, each data office can customize the controls to meet its expectations and align with its data goals. These controls are effective only if, beyond measuring the standards, those responsible for the data can take action on their own and are held accountable for making the improvements that affect the value of data. In Data Estate Health, you can set and manage all of these critical capabilities.
Prerequisites
- Data products, glossary terms, and other business concepts published in Microsoft Purview Unified Catalog. You can follow the previous sections to create these.
- At least 24 hours since the curation of data products.
- You must have the Data Health Owner role in Unified Catalog.
Evaluate your data governance with data estate health
Open the Microsoft Purview portal.
Select Unified Catalog.
Under Data Estate Health in the left navigation, select Health controls.
Select the caret (>) next to the Value Creation control group.
While hovering over a control title, select the pencil icon to edit the control. By editing the control, you change the threshold of the control to set expectations for what the score should be and set the color scoring to demonstrate the progress stages.
The details enable you to provide a description of the control and what it means to your organization and set an owner for a specific control.
Select the Rules tab of the control to change the threshold. The following settings give the control a high target and mark any score that misses it as critical to follow up on.
- Inherit from group: toggle to switch off (should turn grey).
- Target score: 90
- Select New rule.
- Set the box next to the score to GreaterThanOrEqual
- Set the percentage to 90
- Status = Healthy (green)
- Else Box Status = Critical (Purple)
- Select Save.
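The rule configured above can be pictured as an ordered list of conditions with a fallback status. The following is a minimal sketch in plain Python, not the Microsoft Purview rule engine: `evaluate_status`, `OPERATORS`, and `RULES` are hypothetical names, only the `GreaterThanOrEqual` operator from the steps is modeled, and the status labels are illustrative.

```python
import operator
from typing import Callable

# Map the operator name shown in the rule editor to a Python comparison.
# Only the operator used in this walkthrough is included.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    "GreaterThanOrEqual": operator.ge,
}

# Each rule is (operator name, threshold percentage, resulting status).
RULES = [("GreaterThanOrEqual", 90, "Healthy")]


def evaluate_status(score: float, rules: list[tuple[str, float, str]],
                    else_status: str) -> str:
    """Return the status of the first matching rule, or the else status."""
    for op_name, threshold, status in rules:
        if OPERATORS[op_name](score, threshold):
            return status
    return else_status


if __name__ == "__main__":
    for score in (93, 90, 89.9):
        print(score, evaluate_status(score, RULES, else_status="Critical"))
```

With a single rule and a Critical fallback there is no middle state, which matches the intent of the configuration: any score below the target of 90 is flagged for follow-up.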
Under Data Estate Health, select Metadata quality.
Here you can change or add the rules that produce a control's scores. In this example, raise the severity of the actions for Value Creation so that all users know the importance of this action.
- Select Configure severity
- Select the Value Creation control group
- Select the Business OKRs alignment control title
- Change the Severity from Medium to High and select Save
- Select the Health actions tab
- Filter Assigned to: to your name
- Select an action to see what its owner needs to do to meet governance expectations. The owner can also assign a new owner so that the best expert provides input. Each action has a status that lets others know what work is ongoing and where other actions might need prioritization.
Step 7: Data democratization
Enabling users to find and access the data they need in a compliant manner is the essence of data democratization, and ensures people can find the data they need to build business value. Providing a clean and easy experience to discover data is the purpose of Microsoft Purview Unified Catalog, all while empowering stewards to update and manage the data made available in the catalog at scale. In this section, we walk through how users can find and request access to data and how the appropriate approvers can track and respond to those access requests.
Prerequisites
- Completed steps 1-4 at minimum
- Unified Catalog reader permission in one governance domain
Discover data products
- In Unified Catalog, select Discovery, then select Data products.
- On the Data products page, use the search bar to search for vaccination rates by age.
- Here you see the data products you published in section 2. This shows how users will only be exposed to the data intended for them to discover and prevents users from having to navigate a highly technical data estate.
- Select the Covid-19 Vaccination and Case Trending by Age data product
- Here, consumers can see the metadata you provided and any of the other properties that were configured during setup. The data quality score is here as well so consumers know the quality before they even get access to the data.
- Select the asset and the consumer can see all of the columns that are available in the data asset.
- Select the Outbreak glossary term and the consumer can see the description and other information about the term to gain a deeper understanding of the data.
- Once the consumer is confident that they want to use that data, they need to get approved access to the data.
- Select Request access
- Fill in the form details to submit a request.
- User: leave your name
- Manager Approval: automatically required and directed to the Microsoft Entra ID manager.
- Purpose: select a purpose
- Business justification: OKR monitoring
- Check the box next to the attestation to say you understand the expectations to use this data.
- Select Send.
The access request is now sent to the manager listed in Microsoft Entra ID. From here, the manager can access the request by opening the email and selecting a link, or by going to Microsoft Purview. Approving and managing access can be done directly in Microsoft Purview.
- In Unified Catalog, select Catalog management, then select Requests.
- Select the Personal Health domain.
- Select the request you submitted.
- Now the approvers are able to approve or decline by selecting Respond on the request.