Get started with chat document security for Python

When you build a chat application using the RAG pattern with your own data, make sure that each user receives an answer based on their permissions. Follow the process in this article to add document access control to your chat app.

An authorized user should have access to answers contained within the documents of the chat app.

Screenshot of chat app with answer with required authentication access.

An unauthorized user shouldn't have access to answers from secured documents they don't have authorization to see.

Screenshot of chat app with answer indicating user doesn't have access to data.

Note

This article uses one or more AI app templates as the basis for its examples and guidance. AI app templates provide well-maintained, easy-to-deploy reference implementations that help ensure a high-quality starting point for your AI apps.

Architectural overview

Without the document security feature, the enterprise chat app has a simple architecture using Azure AI Search and Azure OpenAI. An answer is determined from queries to Azure AI Search, where the documents are stored, in combination with a response from an Azure OpenAI GPT model. No user authentication is used in this simple flow.

Architectural diagram showing an answer determined from queries to Azure AI Search where the documents are stored, in combination with a prompt response from Azure OpenAI.

To add security for the documents, you need to update the enterprise chat app:

  • Add client authentication to the chat app with Microsoft Entra.
  • Add server-side logic to populate a search index with user and group access.

Architectural diagram showing a user authenticating with Microsoft Entra ID, then passing that authentication to Azure AI Search.

Azure AI Search doesn't provide native document-level permissions and can't vary search results from within an index by user permissions. Instead, your application can use search filters to ensure a document is accessible to a specific user or by a specific group. Within your search index, each document should have a filterable field that stores user or group identity information.

Architectural diagram showing that to secure the documents in Azure AI Search, each document includes user authentication, which is returned in the result set.

Because the authorization isn't natively contained in Azure AI Search, you need to add a field to hold user or group information, then filter any documents that don't match. To implement this technique, you need to:

  • Create a document access control field in your index dedicated to storing the details of users or groups with document access.
  • Populate the document's access control field with the relevant user or group details.
  • Update this access control field whenever there are changes in user or group access permissions.
  • Reindex after each change. If your index updates are scheduled with an indexer, changes are picked up on the next indexer run. If you don't use an indexer, you need to reindex manually.
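
The filtering technique described in this list can be sketched as a small helper that builds an OData filter string for each query. This is a minimal sketch, not the sample's actual implementation: the function name `build_security_filter` is hypothetical, and the field names `oids` and `groups` are assumed to be filterable string collections in your index.

```python
from typing import Optional, Sequence


def _escape(value: str) -> str:
    # OData string literals escape a single quote by doubling it.
    return value.replace("'", "''")


def build_security_filter(
    oid: Optional[str],
    groups: Sequence[str] = (),
    allow_global_documents: bool = False,
) -> str:
    """Build an OData filter that keeps only documents the caller may read.

    Assumption: the index has filterable collection fields named `oids`
    and `groups` that hold the object IDs allowed to read each document.
    """
    # Always emit the oid clause. An empty oid matches no document, so a
    # caller without an identity never falls back to an unfiltered query.
    clauses = [f"oids/any(g:search.in(g, '{_escape(oid or '')}'))"]
    if groups:
        joined = ", ".join(_escape(g) for g in groups)
        clauses.append(f"groups/any(g:search.in(g, '{joined}'))")
    if allow_global_documents:
        # Documents with no access control entries are treated as public.
        clauses.append("(not oids/any() and not groups/any())")
    return " or ".join(clauses)
```

The resulting string is what you would pass as the filter of a search request, so that documents without a matching entry never reach the model.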

In this article, the process of securing documents in Azure AI Search is made possible with example scripts, which you as the search administrator would run. The scripts associate a single document with a single user identity. You can take these scripts and apply your own security and productionizing requirements to scale to your needs.

Determine security configuration

The solution provides boolean environment variables to turn on features necessary for document security in this sample.

Parameter Purpose
AZURE_USE_AUTHENTICATION When set to true, enables user sign-in to the chat app and App Service authentication. Enables Use oid security filter in the chat app Developer settings.
AZURE_ENFORCE_ACCESS_CONTROL When set to true, requires authentication for any document access. The Developer settings for oid and group security are turned on and locked so they can't be changed from the UI.
AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS When set to true, this setting allows authenticated users to search on documents that have no access controls assigned, even when access control is required. This parameter should only be used when AZURE_ENFORCE_ACCESS_CONTROL is enabled.
AZURE_ENABLE_UNAUTHENTICATED_ACCESS When set to true, this setting allows unauthenticated users to use the app, even when access control is enforced. This parameter should only be used when AZURE_ENFORCE_ACCESS_CONTROL is enabled.

Use the following sections to understand the security profiles supported in this sample. This article configures the Enterprise profile.

Enterprise: Required account + document filter

Each user of the site must sign in, and the site contains content that is public to all users. The document-level security filter is applied to all requests.

Environment variables:

  • AZURE_USE_AUTHENTICATION=true
  • AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS=true
  • AZURE_ENFORCE_ACCESS_CONTROL=true

Mixed use: Optional account + document filter

Each user of the site may sign in, and the site contains content that is public to all users. The document-level security filter is applied to all requests.

Environment variables:

  • AZURE_USE_AUTHENTICATION=true
  • AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS=true
  • AZURE_ENFORCE_ACCESS_CONTROL=true
  • AZURE_ENABLE_UNAUTHENTICATED_ACCESS=true
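
The two profiles above differ only in one flag, which a short script can make explicit. The following is a hedged sketch for checking a set of environment values before deployment. The helper names `read_flags` and `describe_profile` are hypothetical, and treating only the string `true` as enabled is an assumption about how the values are parsed.

```python
from typing import Dict, Mapping

FLAGS = (
    "AZURE_USE_AUTHENTICATION",
    "AZURE_ENFORCE_ACCESS_CONTROL",
    "AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS",
    "AZURE_ENABLE_UNAUTHENTICATED_ACCESS",
)

# Flags that the article says should only be used with enforced access control.
DEPENDENT_FLAGS = (
    "AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS",
    "AZURE_ENABLE_UNAUTHENTICATED_ACCESS",
)


def read_flags(env: Mapping[str, str]) -> Dict[str, bool]:
    """Return each security flag as a bool; unset flags count as False."""
    return {name: env.get(name, "").strip().lower() == "true" for name in FLAGS}


def describe_profile(env: Mapping[str, str]) -> str:
    """Name the security profile that a set of environment values selects."""
    flags = read_flags(env)
    enforce = flags["AZURE_ENFORCE_ACCESS_CONTROL"]
    for name in DEPENDENT_FLAGS:
        if flags[name] and not enforce:
            raise ValueError(f"{name} requires AZURE_ENFORCE_ACCESS_CONTROL=true")
    shared = (
        flags["AZURE_USE_AUTHENTICATION"]
        and enforce
        and flags["AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS"]
    )
    if not shared:
        return "custom"
    if flags["AZURE_ENABLE_UNAUTHENTICATED_ACCESS"]:
        return "mixed use"
    return "enterprise"
```

Calling `describe_profile(os.environ)` after `import os` would report which profile the current shell is configured for.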

Prerequisites

A development container environment is available with all dependencies required to complete this article. You can run the development container in GitHub Codespaces (in a browser) or locally using Visual Studio Code.

To use this article, you need the following prerequisites:

You need more prerequisites depending on your preferred development environment.

Open development environment

Begin now with a development environment that has all the dependencies installed to complete this article.

GitHub Codespaces runs a development container managed by GitHub with Visual Studio Code for the Web as the user interface. For the most straightforward development environment, use GitHub Codespaces so that you have the correct developer tools and dependencies preinstalled to complete this article.

Important

All GitHub accounts can use Codespaces for up to 60 hours free each month with 2 core instances. For more information, see GitHub Codespaces monthly included storage and core hours.

  1. Start the process to create a new GitHub Codespace on the main branch of the Azure-Samples/azure-search-openai-demo GitHub repository.

  2. Right-click the following button, and select Open link in new window to have both the development environment and the documentation available at the same time.

    Open in GitHub Codespaces

  3. On the Create codespace page, review the codespace configuration settings, and then select Create new codespace.

    Screenshot of the confirmation screen before creating a new codespace.

  4. Wait for the codespace to start. This startup process can take a few minutes.

  5. In the terminal at the bottom of the screen, sign in to Azure with the Azure Developer CLI.

    azd auth login
    
  6. Complete the authentication process.

  7. The remaining tasks in this article take place in the context of this development container.

Get required information with Azure CLI

Get your subscription ID and tenant ID with the following Azure CLI command. Copy the tenantId value to use as your AZURE_TENANT_ID.

az account list --query "[].{subscription_id:id, name:name, tenantId:tenantId}" -o table

If you get an error about your tenant's conditional access policy, you need a second tenant without a conditional access policy.

  • Your first tenant, associated with your user account, is used for the AZURE_TENANT_ID environment variable.
  • Your second tenant, without conditional access, is used for the AZURE_AUTH_TENANT_ID environment variable to access Microsoft Graph. For tenants with a conditional access policy, find the ID of a second tenant without a conditional access policy or create a new tenant.
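
If you prefer to pick the tenant ID in a script instead of reading the table, the same query can be emitted as JSON by replacing `-o table` with `-o json`. The sketch below parses that JSON text. The function name `tenant_ids_by_subscription` is hypothetical, and the keys it reads (`name`, `tenantId`) are the ones defined by the `--query` expression in the command above.

```python
import json
from typing import Dict


def tenant_ids_by_subscription(az_output: str) -> Dict[str, str]:
    """Map each subscription name to its tenant ID.

    `az_output` is the JSON text printed by the `az account list` command
    shown earlier when it is run with `-o json`.
    """
    accounts = json.loads(az_output)
    return {account["name"]: account["tenantId"] for account in accounts}
```

The value for your subscription is the one to pass to `azd env set AZURE_TENANT_ID`.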

Set environment variables

  1. Run the following commands to configure the application for the Enterprise profile.

    azd env set AZURE_USE_AUTHENTICATION true
    azd env set AZURE_ENABLE_GLOBAL_DOCUMENTS_ACCESS true
    azd env set AZURE_ENFORCE_ACCESS_CONTROL true
    
  2. Run the following command to set the tenant, which authorizes user sign-in to the hosted application environment. Replace <YOUR_TENANT_ID> with the tenant ID.

    azd env set AZURE_TENANT_ID <YOUR_TENANT_ID>
    

Note

If you have a conditional access policy on your user tenant, you need to specify an authentication tenant.

Deploy chat app to Azure

Deployment includes creating the Azure resources, uploading the documents, creating the Microsoft Entra identity apps (client & server), and turning on identity for the hosting resource.

  1. Run the following Azure Developer CLI command to provision the Azure resources and deploy the source code:

    azd up
    
  2. Use the following table to answer the AZD deployment prompts:

    Prompt Answer
    Environment name Use a short name with identifying information such as your alias and app: tjones-secure-chat.
    Subscription Select a subscription to create the resources in.
    Location for Azure resources Select a location near you.
    Location for documentIntelligentResourceGroupLocation Select a location near you.
    Location for openAIResourceGroupLocation Select a location near you.

    Wait 5 or 10 minutes after the app is deployed to allow the app to start up.

  3. After the application has been successfully deployed, you see a URL displayed in the terminal.

  4. Select that URL labeled (✓) Done: Deploying service webapp to open the chat application in a browser.

    Screenshot of chat app in browser showing several suggestions for chat input and the chat text box to enter a question.

  5. Agree to the app authentication pop-up.

  6. When the chat app is displayed, notice in the top right corner that your user is signed in.

  7. Open Developer settings and notice both these options are selected and greyed out (disabled for change).

    • Use oid security filter
    • Use groups security filter
  8. Select the card with What does a product manager do?.

  9. You get an answer like: The provided sources do not contain specific information about the role of a Product Manager at Contoso Electronics.

    Screenshot of chat app in browser showing the answer can't be returned

Open access to a document for a user

Turn on your permissions for the exact document so you can get the answer. This requires several pieces of information:

  • Azure Storage
    • Account name
    • Container name
    • Blob/document URL for role_library.pdf
  • User's ID in Microsoft Entra ID

Once this information is known, update the Azure AI Search index oids field for the role_library.pdf document.

Get the URL for a document in storage

  1. In the .azure folder at the root of the project, find the environment directory, and open the .env file within that directory.

  2. Search for the AZURE_STORAGE_ACCOUNT entry and copy its value.

  3. Use the following Azure CLI commands to get the URL of the role_library.pdf blob in the content container.

    az storage blob url \
        --account-name <REPLACE_WITH_AZURE_STORAGE_ACCOUNT> \
        --container-name 'content' \
        --name 'role_library.pdf'
    
    Parameter Purpose
    --account-name Azure Storage account name
    --container-name The container name in this sample is content
    --name The blob name in this step is role_library.pdf
  4. Copy the blob URL to use later.
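
The steps above can also be sketched in Python: read the storage account name from the .env text and assemble the blob URL. This is only an illustration under stated assumptions. The helpers `parse_env` and `blob_url` are hypothetical, the .env file is assumed to hold simple `KEY="value"` lines, and the URL pattern is the public-cloud form shown later in this article (`https://MYSTORAGENAME.blob.core.windows.net/content/role_library.pdf`).

```python
from typing import Dict
from urllib.parse import quote


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build a blob URL; assumes the public Azure cloud endpoint suffix."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"
```

The Azure CLI command remains the authoritative way to get the URL, because it resolves the correct endpoint for your cloud and account.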

Get your user ID

  1. In the chat app, select Developer settings.
  2. In the ID Token claims section, copy your objectidentifier. This is known in the next section as the USER_OBJECT_ID.
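
The object identifier shown in Developer settings comes from the `oid` claim of your Microsoft Entra ID token. As a rough illustration of where that value lives, the sketch below decodes the payload of a JWT without verifying its signature. The function name `read_claims` is hypothetical, and this approach is suitable only for inspection, never for making authorization decisions.

```python
import base64
import json


def read_claims(id_token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = id_token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For a token issued by Microsoft Entra ID, `read_claims(token)["oid"]` is the value referred to as USER_OBJECT_ID.
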

Provide user access to a document in Azure Search

  1. Use the following script to change the oids field in Azure AI Search for role_library.pdf so you have access to it.

    ./scripts/manageacl.sh \
        -v \
        --acl-type oids \
        --acl-action add \
        --acl <REPLACE_WITH_YOUR_USER_OBJECT_ID> \
        --url <REPLACE_WITH_YOUR_DOCUMENT_URL>
    
    Parameter Purpose
    -v Verbose output.
    --acl-type Group or user object IDs (OIDs): oids
    --acl-action Add to a Search index field. Other options include remove, remove_all, and view.
    --acl Group or user's USER_OBJECT_ID
    --url The file's location in Azure storage, such as https://MYSTORAGENAME.blob.core.windows.net/content/role_library.pdf. Don't surround URL with quotes in the CLI command.
  2. The console output for this command looks like:

    Loading azd .env file from current environment...
    Creating Python virtual environment "app/backend/.venv"...
    Installing dependencies from "requirements.txt" into virtual environment (in quiet mode)...
    Running manageacl.py. Arguments to script: -v --acl-type oids --acl-action add --acl 00000000-0000-0000-0000-000000000000 --url https://mystorage.blob.core.windows.net/content/role_library.pdf
    Found 58 search documents with storageUrl https://mystorage.blob.core.windows.net/content/role_library.pdf
    Adding acl 00000000-0000-0000-0000-000000000000 to 58 search documents
    
  3. Optionally, use the following command to verify your permission is listed for the file in Azure AI Search.

    ./scripts/manageacl.sh \
        -v \
        --acl-type oids \
        --acl-action view \
        --acl <REPLACE_WITH_YOUR_USER_OBJECT_ID> \
        --url <REPLACE_WITH_YOUR_DOCUMENT_URL>
    
    Parameter Purpose
    -v Verbose output.
    --acl-type Group or user object IDs (OIDs): oids
    --acl-action View the oids field in the Search index. Other options include add, remove, and remove_all.
    --acl Group or user's USER_OBJECT_ID
    --url The file's location in Azure storage, such as https://MYSTORAGENAME.blob.core.windows.net/content/role_library.pdf. Don't surround URL with quotes in the CLI command.
  4. The console output for this command looks like:

    Loading azd .env file from current environment...
    Creating Python virtual environment "app/backend/.venv"...
    Installing dependencies from "requirements.txt" into virtual environment (in quiet mode)...
    Running manageacl.py. Arguments to script: -v --acl-type oids --acl-action view --acl 00000000-0000-0000-0000-000000000000 --url https://mystorage.blob.core.windows.net/content/role_library.pdf
    Found 58 search documents with storageUrl https://mystorage.blob.core.windows.net/content/role_library.pdf
    [00000000-0000-0000-0000-000000000000]
    

    The array at the end of the output includes your USER_OBJECT_ID and is used to determine if the document is used in the answer with Azure OpenAI.
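
The console output shows that one PDF maps to many search documents (58 in this run), and that the object ID is added to each of them. The following sketch shows that idea on in-memory data. It isn't the code in manageacl.py: the function name `add_acl` is hypothetical, and the key field name `id` is an assumption about the index schema.

```python
from typing import Iterable, List


def add_acl(documents: Iterable[dict], acl_type: str, acl: str) -> List[dict]:
    """Return merge payloads that add `acl` to the `acl_type` field.

    Documents that already list `acl` are skipped, so rerunning is harmless.
    Assumption: each search document has a key field named `id`.
    """
    updates = []
    for doc in documents:
        current = list(doc.get(acl_type) or [])
        if acl not in current:
            current.append(acl)
            updates.append({"id": doc["id"], acl_type: current})
    return updates
```

Payloads of this shape could then be sent to the index, for example with `SearchClient.merge_documents` from the azure-search-documents package.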

Verify Azure AI Search contains your USER_OBJECT_ID

  1. Open the Azure portal and search for your Azure AI Search resource.

  2. Select your search resource from the list.

  3. Select Search management -> Indexes.

  4. Select the gptkbindex.

  5. Select View -> JSON view.

  6. Replace the JSON with the following JSON.

    {
      "search": "*",
      "select": "sourcefile, oids",
      "filter": "oids/any()"
    }
    

    This query searches all documents where the oids field has any value and returns the sourcefile and oids fields.

  7. If the role_library.pdf doesn't have your oid, return to the Provide user access to a document in Azure Search section and complete the steps.
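
The check in the last step can be sketched as a filter over the query response. This is an illustrative helper, not part of the sample: the name `files_for_user` is hypothetical, and the response is assumed to be the parsed JSON returned for the query above, with matching documents in a `value` array.

```python
from typing import List

# The same query body used in the portal's JSON view.
QUERY = {"search": "*", "select": "sourcefile, oids", "filter": "oids/any()"}


def files_for_user(response: dict, oid: str) -> List[str]:
    """List the distinct source files whose oids field contains `oid`."""
    matches = {
        doc["sourcefile"]
        for doc in response.get("value", [])
        if oid in (doc.get("oids") or [])
    }
    return sorted(matches)
```

If `role_library.pdf` is missing from the returned list for your USER_OBJECT_ID, the access control update didn't reach the index.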

Verify user access to the document

If you completed the steps but didn't see the correct answer, verify that your USER_OBJECT_ID is set correctly in Azure AI Search for role_library.pdf.

  1. Return to the chat app. You may need to sign in again.

  2. Enter the same query so that the role_library content is used in the Azure OpenAI answer: What does a product manager do?.

  3. View the result, which now includes the appropriate answer from the role library document.

    Screenshot of chat app in browser showing the answer is returned.

Clean up resources

Clean up Azure resources

The Azure resources created in this article are billed to your Azure subscription. If you don't expect to need these resources in the future, delete them to avoid incurring more charges.

Run the following Azure Developer CLI command to delete the Azure resources and remove the source code:

azd down --purge

Clean up GitHub Codespaces

Deleting the GitHub Codespaces environment ensures that you can maximize the amount of free per-core hours entitlement you get for your account.

Important

For more information about your GitHub account's entitlements, see GitHub Codespaces monthly included storage and core hours.

  1. Sign in to the GitHub Codespaces dashboard (https://github.com/codespaces).

  2. Locate your currently running Codespaces sourced from the Azure-Samples/azure-search-openai-demo GitHub repository.

    Screenshot of all the running Codespaces including their status and templates.

  3. Open the context menu for the codespace and then select Delete.

    Screenshot of the context menu for a single codespace with the delete option highlighted.

Get help

This sample repository offers troubleshooting information.

Troubleshooting

This section offers troubleshooting for issues specific to this article.

Provide authentication tenant

When your authentication is in a separate tenant from your hosting application, you need to set that authentication tenant with the following process.

  1. Run the following command to configure the sample to use a second tenant for the authentication tenant.

    azd env set AZURE_AUTH_TENANT_ID <REPLACE-WITH-YOUR-TENANT-ID>
    
    Parameter Purpose
    AZURE_AUTH_TENANT_ID If AZURE_AUTH_TENANT_ID is set, it's the tenant used for authentication, separate from the tenant that hosts the application.
  2. Redeploy the solution with the following command.

    azd up
    

Next steps