Get started with custom projects in Document Intelligence Studio
This content applies to: v4.0 (GA) | Previous versions: v3.1 (GA) v3.0 (GA)
Document Intelligence Studio is an online tool for visually exploring, understanding, and integrating features from the Document Intelligence service into your applications. This quickstart walks you through setting up a custom project in Document Intelligence Studio.
Prerequisites for new users
Please refer to the following documentation for subscription and resource creation, as well as authentication setup.
Additional prerequisites for custom projects
In addition to the Azure account and a Document Intelligence or Azure AI services resource, you need:
Azure Blob Storage container
A standard performance Azure Blob Storage account. You create containers to store and organize your training documents within your storage account. If you don't know how to create an Azure storage account with a container, follow these quickstarts:
- Create a storage account. When creating your storage account, make sure to select Standard performance in the Instance details → Performance field.
- Create a container. When creating your container, set the Public access level field to Container (anonymous read access for containers and blobs) in the New Container window.
Azure role assignments
For custom projects, the following role assignments are required for different scenarios.
Basic
- Cognitive Services User: You need this role for Document Intelligence or Azure AI services resource to train the custom model or do analysis with trained models.
- Storage Blob Data Contributor: You need this role for the Storage Account to create a project and label data.
Advanced
- Storage Account Contributor: You need this role for the Storage Account to set up CORS settings (this action is a one-time effort if the same storage account is reused).
- Contributor: You need this role to create a resource group and resources.
Note
If local (key-based) authentication is disabled for your Document Intelligence service resource and storage account, be sure to obtain the Cognitive Services User and Storage Blob Data Contributor roles respectively, so you have enough permissions to use Document Intelligence Studio. The Storage Account Contributor and Contributor roles only allow you to list keys; they don't give you permission to use the resources when key access is disabled.
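The role requirements in this section can be summarized as a small lookup. The following Python sketch is illustrative only: the task names and the `missing_roles` helper are hypothetical and not part of any Azure SDK, and the code doesn't query Azure. It only encodes the roles listed above so you can check an assumed set of assignments against a task.

```python
# Roles required per task, as listed in this section.
# The task keys are hypothetical names chosen for this sketch.
REQUIRED_ROLES = {
    "train_or_analyze": {"Cognitive Services User"},
    "create_project_and_label": {"Storage Blob Data Contributor"},
    "configure_cors": {"Storage Account Contributor"},
    "create_resources": {"Contributor"},
}


def missing_roles(task, assigned_roles):
    """Return the sorted list of roles still needed for `task`."""
    return sorted(REQUIRED_ROLES[task] - set(assigned_roles))


if __name__ == "__main__":
    # With key-based authentication disabled, the management roles alone
    # aren't enough to label data: the data role is still missing.
    print(missing_roles("create_project_and_label",
                        ["Contributor", "Storage Account Contributor"]))
    # prints ['Storage Blob Data Contributor']
```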
Configure CORS
CORS (Cross Origin Resource Sharing) needs to be configured on your Azure storage account for it to be accessible from the Document Intelligence Studio. To configure CORS in the Azure portal, you need access to the CORS tab of your storage account.
Select the CORS tab for the storage account.
Start by creating a new CORS entry in the Blob service.
Set the Allowed origins to https://documentintelligence.ai.azure.com.
Tip
You can use the wildcard character '*' rather than a specified domain to allow all origin domains to make requests via CORS.
Select all 8 available options for Allowed methods.
Approve all Allowed headers and Exposed headers by entering an * in each field.
Set the Max Age to 120 seconds or any acceptable value.
To save the changes, select the save button at the top of the page.
CORS should now be configured to use the storage account from Document Intelligence Studio.
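If you prefer to script these steps rather than use the portal, the same rule can be expressed with the `azure-storage-blob` Python package. This is a minimal sketch, not the Studio's own method. The helper names `build_cors_settings` and `apply_cors` are hypothetical, the list of eight methods is an assumption about what the portal offers, and you supply your own account URL and credential.

```python
STUDIO_ORIGIN = "https://documentintelligence.ai.azure.com"

# Assumption: these are the eight methods offered in the portal's CORS tab.
ALLOWED_METHODS = ["DELETE", "GET", "HEAD", "MERGE",
                   "OPTIONS", "PATCH", "POST", "PUT"]


def build_cors_settings(origin=STUDIO_ORIGIN, max_age_seconds=120):
    """Return the CORS values from the steps above as a plain dict."""
    return {
        "allowed_origins": [origin],
        "allowed_methods": list(ALLOWED_METHODS),
        "allowed_headers": ["*"],
        "exposed_headers": ["*"],
        "max_age_in_seconds": max_age_seconds,
    }


def apply_cors(account_url, credential, settings=None):
    """Apply the rule to the Blob service (requires azure-storage-blob)."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    from azure.storage.blob import BlobServiceClient, CorsRule

    settings = settings or build_cors_settings()
    rule = CorsRule(
        settings["allowed_origins"],
        settings["allowed_methods"],
        allowed_headers=settings["allowed_headers"],
        exposed_headers=settings["exposed_headers"],
        max_age_in_seconds=settings["max_age_in_seconds"],
    )
    client = BlobServiceClient(account_url=account_url, credential=credential)
    # Note: this replaces any CORS rules already set on the Blob service.
    client.set_service_properties(cors=[rule])
```

Calling `apply_cors` requires the Storage Account Contributor permissions described earlier. `build_cors_settings` on its own is useful for reviewing the values before you apply them.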
Sample documents set
Sign in to the Azure portal and navigate to Your storage account > Data storage > Containers.
Select a container from the list.
Select Upload from the menu at the top of the page.
The Upload blob window appears.
Select your files to upload.
Note
By default, the Studio uses documents that are located at the root of your container. However, you can use data organized in folders by specifying the folder path in the custom form project creation steps. See Organize your data in subfolders.
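The upload can also be scripted. The sketch below assumes you already have an `azure.storage.blob.ContainerClient` for your container. The helpers `blob_name_for` and `upload_documents` are hypothetical names introduced here. By default, files land at the container root, and passing a folder places them under that path instead.

```python
from pathlib import Path, PurePosixPath


def blob_name_for(local_path, folder=""):
    """Blob name for a local file: container root by default, or under `folder`."""
    name = Path(local_path).name
    folder = folder.strip("/")
    return str(PurePosixPath(folder) / name) if folder else name


def upload_documents(container_client, local_paths, folder=""):
    """Upload files using an azure.storage.blob.ContainerClient."""
    for path in local_paths:
        with open(path, "rb") as data:
            # overwrite=True replaces a blob that already has the same name.
            container_client.upload_blob(
                name=blob_name_for(path, folder), data=data, overwrite=True
            )
```

For example, `blob_name_for("samples/invoice1.pdf", "training/")` yields `training/invoice1.pdf`, which is the kind of folder path you would then specify when creating the project.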
Use Document Intelligence Studio features
Auto label documents with prebuilt models or one of your own models
On the custom extraction model labeling page, you can now auto label your documents using one of the Document Intelligence service prebuilt models or one of your trained models.
For some documents, duplicate labels after running autolabel are possible. Make sure to modify the labels so that there are no duplicate labels in the labeling page afterwards.
Auto label tables
On the custom extraction model labeling page, you can now auto label the tables in the document without having to label the tables manually.
Add test files directly to your training dataset
Once you train a custom extraction model, use the test page to improve model quality by adding test documents to the training dataset as needed.
If a low confidence score is returned for some labels, check that the content is labeled correctly. If it isn't, add those documents to the training dataset and relabel them to improve the model quality.
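To spot low-confidence labels outside the Studio, you can scan an analysis response for fields below a cutoff. The sketch below assumes the `analyzeResult` object of a REST response has been parsed into a Python dict. The `low_confidence_fields` helper, the 0.8 threshold, and the sample values are all hypothetical, so choose a cutoff that suits your documents.

```python
def low_confidence_fields(analyze_result, threshold=0.8):
    """Return {field_name: confidence} for fields scoring below `threshold`.

    `analyze_result` is assumed to be the parsed `analyzeResult` object,
    with extracted fields under documents[].fields.
    """
    flagged = {}
    for document in analyze_result.get("documents", []):
        for name, field in document.get("fields", {}).items():
            confidence = field.get("confidence")
            if confidence is not None and confidence < threshold:
                flagged[name] = confidence
    return flagged


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {"documents": [{"fields": {
        "InvoiceId": {"type": "string", "content": "INV-100", "confidence": 0.97},
        "Total": {"type": "string", "content": "110.00", "confidence": 0.42},
    }}]}
    print(low_confidence_fields(sample))  # prints {'Total': 0.42}
```

Documents whose fields appear in the result are candidates for adding to the training dataset and relabeling.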
Make use of the document list options and filters in custom projects
Use the custom extraction model labeling page to navigate through your training documents with the search, filter, and sort-by features.
Use the grid view to preview documents, or use the list view to scroll through the documents more easily.
Project sharing
Share custom extraction projects with ease. For more information, see Project sharing with custom models.
Next steps
- Follow our Document Intelligence v3.1 migration guide to learn the differences from the previous version of the REST API.
- Explore our v4.0 SDK quickstarts to try the v3.0 features in your applications using the new client libraries.
- Refer to our v4.0 REST API quickstarts to try the v3.0 features using the new REST API.