Tutorial: Configure Azure Storage to de-identify documents

The Azure Health Data Services de-identification service can de-identify documents stored in Azure Storage by using an asynchronous job. Jobs are a good option when you have many documents to de-identify. Jobs also provide consistent surrogation: the same original value is replaced with the same surrogate value across all documents in the de-identified output. For more information about de-identification, including consistent surrogation, see What is the de-identification service?

When you choose to store documents in Azure Blob Storage, you're charged based on Azure Storage pricing. This cost isn't included in the de-identification service pricing. Explore Azure Blob Storage pricing.

In this tutorial, you:

  • Create a storage account and container
  • Upload a sample document
  • Grant the de-identification service access
  • Configure network isolation

Prerequisites

  • A de-identification service with system-assigned managed identity enabled

Open Azure CLI

Install the Azure CLI and open your terminal of choice. This tutorial uses PowerShell, so variable assignments such as $ResourceGroup = "<resource_group>" follow PowerShell syntax.
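The commands in this tutorial assume the Azure CLI is signed in to the account that owns your de-identification service. A minimal sketch for signing in and confirming which account and subscription the CLI will use, assuming az is on your PATH:

```powershell
# Sign in to Azure interactively.
az login

# Show the signed-in user and the currently selected subscription.
az account show --query "{user: user.name, subscription: name}" --output table
```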

Create a storage account and container

  1. Set the active subscription, substituting the name of the subscription that contains your de-identification service for the <subscription_name> placeholder:
    az account set --subscription "<subscription_name>"
    
  2. Save the resource group name in a variable, substituting the resource group that contains your de-identification service for the <resource_group> placeholder:
    $ResourceGroup = "<resource_group>"
    
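  3. Create a storage account, providing a value for the <storage_account_name> placeholder. The name must be globally unique, 3 to 24 characters long, and contain only lowercase letters and numbers: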
    $StorageAccountName = "<storage_account_name>"
    $StorageAccountId = $(az storage account create --name $StorageAccountName --resource-group $ResourceGroup --sku Standard_LRS --kind StorageV2 --min-tls-version TLS1_2 --allow-blob-public-access false --query id --output tsv)
    
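  4. Assign yourself the Storage Blob Data Contributor role so that you can create containers and upload blobs with --auth-mode login. Role assignments can take several minutes to propagate; if a later command fails with an authorization error, wait and retry: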
    $UserId = $(az ad signed-in-user show --query id -o tsv)
    az role assignment create --role "Storage Blob Data Contributor" --assignee $UserId --scope $StorageAccountId
    
  5. Create a container to hold your sample document:
    az storage container create --account-name $StorageAccountName --name deidtest --auth-mode login
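If step 3 fails because the storage account name is invalid or already taken, you can check a candidate name before retrying. This is a sketch, not part of the official tutorial: Test-StorageAccountNameFormat is a hypothetical helper that applies the naming rules locally, and az storage account check-name then asks Azure whether the name is still available. Note that a name you've already created reports as unavailable.

```powershell
# Hypothetical helper: applies the storage account naming rules locally
# (3 to 24 characters, lowercase letters and digits only).
function Test-StorageAccountNameFormat {
    param([string]$Name)
    # \A and \z anchor the whole string; -cmatch is case-sensitive.
    return $Name -cmatch '\A[a-z0-9]{3,24}\z'
}

if (Test-StorageAccountNameFormat -Name $StorageAccountName) {
    # Format is valid; ask Azure whether the name is still available globally.
    az storage account check-name --name $StorageAccountName --query "{available: nameAvailable, reason: reason}" --output table
} else {
    Write-Warning "'$StorageAccountName' doesn't meet the storage account naming rules."
}
```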
    

Upload a sample document

Next, upload a document that contains synthetic protected health information (PHI):

$DocumentContent = "The patient came in for a visit on 10/12/2023 and was seen again November 4th at Contoso Hospital."
az storage blob upload --data $DocumentContent --account-name $StorageAccountName --container-name deidtest --name deidsample.txt --auth-mode login
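To confirm the upload, you can list the blobs in the container. The second command is a hypothetical extension for when you have many documents to de-identify: it uploads every file in a local folder with az storage blob upload-batch. The ./documents path is a placeholder, not part of this tutorial.

```powershell
# List blob names in the container to confirm deidsample.txt was uploaded.
az storage blob list --account-name $StorageAccountName --container-name deidtest --auth-mode login --query "[].name" --output tsv

# Hypothetical: upload all files from a local folder (placeholder path) into the same container.
az storage blob upload-batch --account-name $StorageAccountName --destination deidtest --source ./documents --auth-mode login
```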

Grant the de-identification service access to the storage account

In this step, you grant the de-identification service's system-assigned managed identity role-based access to the storage account. The service needs the Storage Blob Data Contributor role because it both reads the original documents and writes the de-identified output documents. Substitute the name of your de-identification service for the <deid_service_name> placeholder:

$DeidServicePrincipalId = $(az resource show -n <deid_service_name> -g $ResourceGroup --resource-type microsoft.healthdataaiservices/deidservices --query identity.principalId --output tsv)
az role assignment create --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
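If the role assignment fails, a common cause is an empty principal ID. A minimal sketch, assuming the variables from the previous steps are set in your session: it checks that a principal ID was returned and, if so, lists the roles the service's identity holds on the storage account.

```powershell
# An empty value means no identity was found: the service name may be wrong,
# or system-assigned managed identity may not be enabled on the service.
if ([string]::IsNullOrWhiteSpace($DeidServicePrincipalId)) {
    Write-Warning "No principal ID returned. Check the service name and that system-assigned managed identity is enabled."
} else {
    # List the role names assigned to the service's identity on the storage account.
    az role assignment list --assignee $DeidServicePrincipalId --scope $StorageAccountId --query "[].roleDefinitionName" --output tsv
}
```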

Configure network isolation on the storage account

Next, update the storage account to disable public network access and allow access only from trusted Azure services, such as the de-identification service. After you run this command, you can't view the storage container contents unless you set a network exception. Learn more at Configure Azure Storage firewalls and virtual networks.

az storage account update --name $StorageAccountName --resource-group $ResourceGroup --public-network-access Disabled --bypass AzureServices
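To confirm the change, you can query the storage account's network settings. This sketch assumes $StorageAccountName and $ResourceGroup are still set in your session; after the update, the output should show publicNetworkAccess as Disabled and bypass as AzureServices.

```powershell
# Show the public network access setting and the firewall bypass/default action.
az storage account show --name $StorageAccountName --resource-group $ResourceGroup --query "{publicNetworkAccess: publicNetworkAccess, bypass: networkRuleSet.bypass, defaultAction: networkRuleSet.defaultAction}" --output table
```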

Clean up resources

When you no longer need the storage account, delete the role assignments and then the storage account:

az role assignment delete --assignee $DeidServicePrincipalId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az role assignment delete --assignee $UserId --role "Storage Blob Data Contributor" --scope $StorageAccountId
az storage account delete --ids $StorageAccountId --yes
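To confirm the cleanup, you can list storage accounts in the resource group filtered by name. A minimal sketch, assuming $ResourceGroup and $StorageAccountName are still set in your session; an empty result means the storage account no longer exists.

```powershell
# Returns the account name if it still exists in the resource group; returns nothing once it's deleted.
az storage account list --resource-group $ResourceGroup --query "[?name=='$StorageAccountName'].name" --output tsv
```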

Next step