Quickstart: Azure Health De-identification client library for .NET

Get started with the Azure Health De-identification client library for .NET to de-identify your health data. Follow these steps to install the package and try out example code for basic tasks.

API reference documentation | Library source code | Package (NuGet) | More Samples on GitHub

Prerequisites

Setting up

Create a de-identification service

A de-identification service provides you with an endpoint URL. You can call this endpoint URL through the REST API or with an SDK.

  1. Install Azure CLI

  2. Create a de-identification service resource

    REGION="<Region>"
    RESOURCE_GROUP_NAME="<ResourceGroupName>"
    DEID_SERVICE_NAME="<NewDeidServiceName>"
    az resource create -g $RESOURCE_GROUP_NAME -n $DEID_SERVICE_NAME --resource-type microsoft.healthdataaiservices/deidservices --is-full-object -p "{\"identity\":{\"type\":\"SystemAssigned\"},\"properties\":{},\"location\":\"$REGION\"}"
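
The escaped JSON passed to -p is easy to mistype. As a minimal sketch, a small shell helper can print the same request body; build_deid_body is a hypothetical name (not part of the Azure CLI), and the sketch assumes the region name contains no characters that need JSON escaping.

```shell
# Hypothetical helper: prints the JSON body used by `az resource create`
# for a de-identification service with a system-assigned identity.
# Assumes the region name needs no JSON escaping (for example, "eastus").
build_deid_body() {
  printf '{"identity":{"type":"SystemAssigned"},"properties":{},"location":"%s"}' "$1"
}

# Usage (requires the variables defined above and a signed-in Azure CLI):
#   az resource create -g "$RESOURCE_GROUP_NAME" -n "$DEID_SERVICE_NAME" \
#     --resource-type microsoft.healthdataaiservices/deidservices \
#     --is-full-object -p "$(build_deid_body "$REGION")"
```

Quoting the variables, as in the usage comment, also protects against resource group names that contain spaces.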
    

Assign RBAC Roles to the de-identification service

Assign a role on the de-identification service so that you have permission to perform the actions in this quickstart.

Since we're using both the real-time and job endpoints, we assign the DeID Data Owner role.

To learn how to assign this role, see Manage access to the de-identification service with Azure role-based access control (RBAC) in Azure Health Data Services.

Create an Azure Storage account

  1. Install Azure CLI, if you haven't already

  2. Create an Azure Storage Account

    STORAGE_ACCOUNT_NAME="<NewStorageAccountName>"
    az storage account create --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP_NAME --location $REGION
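
Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so a quick local check can catch an invalid name before the create call. A minimal sketch; is_valid_storage_name is a hypothetical helper, not an Azure CLI command, and it does not check whether the name is already taken.

```shell
# Hypothetical helper: succeeds if the name is 3-24 characters long and
# contains only lowercase letters and digits.
is_valid_storage_name() {
  case "$1" in
    # Reject any character outside a-z and 0-9 (listed explicitly so the
    # check does not depend on locale collation rules).
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}

# Usage:
#   is_valid_storage_name "$STORAGE_ACCOUNT_NAME" || echo "Invalid storage account name" >&2
```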
    

Authorize de-identification service on the Azure Storage account

  • Give the de-identification service access to your storage account

     STORAGE_ACCOUNT_ID=$(az storage account show --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP_NAME --query id --output tsv)
     DEID_SERVICE_PRINCIPAL_ID=$(az resource show -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME  --resource-type microsoft.healthdataaiservices/deidservices --query identity.principalId --output tsv)
     az role assignment create --assignee $DEID_SERVICE_PRINCIPAL_ID --role "Storage Blob Data Contributor" --scope $STORAGE_ACCOUNT_ID
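
A new role assignment may not be visible immediately, so it can help to confirm it before you create a job. The following is a sketch only; has_blob_contributor_role is a hypothetical helper that wraps az role assignment list and checks the returned role names.

```shell
# Hypothetical helper: succeeds if the given principal holds the
# "Storage Blob Data Contributor" role at the given scope.
# $1 = principal (object) ID, $2 = scope (for example, the storage account ID)
has_blob_contributor_role() {
  az role assignment list --assignee "$1" --scope "$2" \
    --query "[].roleDefinitionName" --output tsv |
    grep -Fxq "Storage Blob Data Contributor"
}

# Usage (requires a signed-in Azure CLI and the variables set above):
#   has_blob_contributor_role "$DEID_SERVICE_PRINCIPAL_ID" "$STORAGE_ACCOUNT_ID" \
#     && echo "Role assignment found"
```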
    

Install the package

The client library is available through NuGet, as the Azure.Health.Deidentification package.

  1. Install package

    dotnet add package Azure.Health.Deidentification
    
  2. Also, install the Azure Identity package if not already installed.

    dotnet add package Azure.Identity
    

Object model

  • DeidentificationClient handles communication between the SDK and the de-identification service endpoint.
  • DeidentificationContent is used for string de-identification.
  • DeidentificationJob is used to create jobs to de-identify documents in an Azure Storage Account.
  • PhiEntity is the span and category of a single PHI entity detected via a Tag OperationType.

Code examples

Create a de-identification client

Before you can create the client, you need to find your de-identification service endpoint URL.

You can find the endpoint URL with the Azure CLI:

az resource show -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME  --resource-type microsoft.healthdataaiservices/deidservices --query properties.serviceUrl --output tsv

Then you can create the client using that value.

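using System;
using Azure.Identity;
using Azure.Health.Deidentification;

// Replace with the serviceUrl value returned by the Azure CLI command above.
string serviceEndpoint = "https://example123.api.deid.azure.com";

// DefaultAzureCredential tries several credential sources in turn, including your Azure CLI sign-in.
DeidentificationClient client = new(
    new Uri(serviceEndpoint),
    new DefaultAzureCredential()
);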

De-identify a string

DeidentifyAsync de-identifies any string you have in memory.

DeidentificationContent content = new("SSN: 123-04-5678");
DeidentificationResult result = await client.DeidentifyAsync(content);

Tag a string

Tagging works the same way as de-identifying, except that you set the OperationType to Tag.

DeidentificationContent content = new("SSN: 123-04-5678");
content.Operation = OperationType.Tag;

DeidentificationResult result = await client.DeidentifyAsync(content);

Create a de-identification job

A job de-identifies all files in an Azure Blob Storage container whose names match a given prefix.

To create the job, we need the URL of the storage account's blob endpoint, which you can find with the Azure CLI:

az resource show -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME  --resource-type Microsoft.Storage/storageAccounts --query properties.primaryEndpoints.blob --output tsv
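
The command above prints the blob endpoint of the account, while the job code takes a container URL. This quickstart does not show creating a container, so the following is a sketch under assumptions: container_url is a hypothetical helper, CONTAINER_NAME is a placeholder you choose, and the commented az storage container create call requires your own identity to have data access to the account.

```shell
# Hypothetical helper: joins a blob endpoint and a container name into a
# container URL, tolerating a trailing slash on the endpoint.
container_url() {
  endpoint="${1%/}"   # drop one trailing slash, if present
  printf '%s/%s\n' "$endpoint" "$2"
}

# Usage (requires a signed-in Azure CLI):
#   BLOB_ENDPOINT=$(az resource show -n "$STORAGE_ACCOUNT_NAME" -g "$RESOURCE_GROUP_NAME" \
#     --resource-type Microsoft.Storage/storageAccounts \
#     --query properties.primaryEndpoints.blob --output tsv)
#   az storage container create --name "$CONTAINER_NAME" \
#     --account-name "$STORAGE_ACCOUNT_NAME" --auth-mode login
#   container_url "$BLOB_ENDPOINT" "$CONTAINER_NAME"
```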

Now we can create the job. This example uses folder1/ as the prefix. The job de-identifies any document that matches this prefix and writes the de-identified version with the output_files/ prefix.

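using Azure;

Uri storageAccountContainerUri = new("https://exampleStorageAccount.blob.core.windows.net/containerName");

// Source and target use the same container here; the prefixes keep inputs and outputs apart.
DeidentificationJob job = new(
    new SourceStorageLocation(storageAccountContainerUri, "folder1/"),
    new TargetStorageLocation(storageAccountContainerUri, "output_files/")
);

// WaitUntil.Started returns once the job has started, without waiting for it to finish.
job = client.CreateJob(WaitUntil.Started, "my-job-1", job).Value;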

Get the status of a de-identification job

Once a job is created, you can view the status and other details of the job.

DeidentificationJob job = client.GetJob("my-job-1").Value;

Run the code

Once your code is updated in your project, you can run it using:

dotnet run

Clean up resources

Delete de-identification service

az resource delete -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME  --resource-type microsoft.healthdataaiservices/deidservices

Delete Azure Storage account

az resource delete -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME  --resource-type Microsoft.Storage/storageAccounts

Delete role assignment

az role assignment delete --assignee $DEID_SERVICE_PRINCIPAL_ID --role "Storage Blob Data Contributor" --scope $STORAGE_ACCOUNT_ID

Troubleshooting

Unable to access source or target storage

Ensure that the de-identification service has a managed identity and that the identity has been granted the Storage Blob Data Contributor role on the storage account.

See Authorize de-identification service on the Azure Storage account

Job failed with status PartialFailed

Use the GetJobDocuments method on DeidentificationClient to view per-file error messages.

See Sample

Next steps

In this quickstart, you learned:

  • How to create a de-identification service and assign a role on a storage account.
  • How to create a de-identification client.
  • How to de-identify strings and create jobs on documents within a storage account.