Load data into a search index in Azure AI Search
This article explains how to import documents into a predefined search index. In Azure AI Search, you create a search index first and import data as a second step. The exceptions are the Import wizards in the Azure portal and indexer pipelines, which create and load an index in one workflow.
How data import works
A search service accepts JSON documents that conform to the index schema. A search service can import and index plain text content and vector content in JSON documents.
Plain text content is retrieved from fields in the external data source, from metadata properties, or from enriched content that's generated by a skillset (skills can extract or infer textual descriptions from images and unstructured content).
Vector content is retrieved from a data source that provides it, or it's created by a skillset that implements integrated vectorization in an Azure AI Search indexer workload.
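The following sketch assembles one document that carries both plain text and vector content. It assumes a hypothetical schema with a string key HotelId, a text field Description, and a vector field DescriptionVector declared with three dimensions. Real embeddings have many more dimensions, and each vector's length must match the dimensions that your index schema declares.

```python
import json

# Hypothetical schema: the vector field is assumed to be declared with 3 dimensions.
VECTOR_DIMENSIONS = 3

def make_document(doc_id: str, description: str, embedding: list) -> dict:
    """Build one JSON-serializable document for a hypothetical hotels index."""
    if len(embedding) != VECTOR_DIMENSIONS:
        # A vector whose length doesn't match the schema is expected to fail at indexing time.
        raise ValueError(f"expected {VECTOR_DIMENSIONS} dimensions, got {len(embedding)}")
    return {
        "HotelId": doc_id,               # document key (string)
        "Description": description,      # plain text content
        "DescriptionVector": embedding,  # vector content, e.g. a Collection(Edm.Single) field
    }

doc = make_document("1111", "Boutique hotel near the city centre.", [0.12, -0.03, 0.88])
print(json.dumps(doc))
```

Checking the dimension count before sending catches schema mismatches locally instead of in the service's indexing errors.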
You can prepare these documents yourself, but if content resides in a supported data source, running an indexer or using an Import wizard can automate document retrieval, JSON serialization, and indexing.
Once data is indexed, the physical data structures of the index are locked in. For guidance on what can and can't be changed, see Update and rebuild an index.
Indexing isn't a background process. A search service will balance indexing and query workloads, but if query latency is too high, you can either add capacity or identify periods of low query activity for loading an index.
For more information, see Data import strategies.
Use the Azure portal
In the Azure portal, use the import wizards to create and load indexes in a seamless workflow. If you want to load an existing index, choose an alternative approach.
Sign in to the Azure portal with your Azure account and find your search service.
On the Overview page, select Import data or Import and vectorize data on the command bar to create and populate a search index.
You can follow these links to review the workflow: Quickstart: Create an Azure AI Search index and Quickstart: Integrated vectorization.
After the wizard is finished, use Search Explorer to check for results.
Tip
The import wizards create and run indexers. If indexers are already defined, you can reset and run an indexer from the Azure portal, which is useful if you're adding fields incrementally. Reset forces the indexer to start over, picking up all fields from all source documents.
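Resetting and rerunning an indexer can also be scripted against the REST API, using the indexer's reset and run endpoints. The following minimal sketch builds both POST requests without sending them. The service name, indexer name, and key are hypothetical placeholders.

```python
import urllib.request

API_VERSION = "2024-07-01"

def indexer_request(service: str, indexer: str, operation: str, admin_key: str) -> urllib.request.Request:
    """Build (but don't send) a POST request that resets or runs an indexer."""
    if operation not in ("reset", "run"):
        raise ValueError("operation must be 'reset' or 'run'")
    url = (f"https://{service}.search.windows.net/indexers/{indexer}/"
           f"{operation}?api-version={API_VERSION}")
    # Empty body so the request carries Content-Length: 0.
    return urllib.request.Request(url, data=b"", method="POST", headers={"api-key": admin_key})

# Hypothetical names: reset first, then run, so the indexer starts over.
reset = indexer_request("my-search-service", "hotels-indexer", "reset", "<admin-key>")
run = indexer_request("my-search-service", "hotels-indexer", "run", "<admin-key>")
# To call the service, pass each request to urllib.request.urlopen in that order.
```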
Use the REST APIs
Documents - Index is the REST API for importing data into a search index. REST APIs are useful for initial proof-of-concept testing, where you can test indexing workflows without having to write much code. The @search.action parameter determines whether a document is added in full or partially updated with new or replacement values for specific fields.
Quickstart: Text search using REST explains the steps. The following example is a modified version of the quickstart's example. It's been trimmed for brevity, and the first HotelId value has been altered to avoid overwriting an existing document.
Formulate a POST call specifying the index name, the "docs/index" endpoint, and a request body that includes the @search.action parameter.

POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: [admin key]

{
  "value": [
    {
      "@search.action": "upload",
      "HotelId": "1111",
      "HotelName": "Stay-Kay City Hotel",
      "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Time's Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
      "Category": "Boutique",
      "Tags": [ "pool", "air conditioning", "concierge" ]
    },
    {
      "@search.action": "mergeOrUpload",
      "HotelId": "2",
      "HotelName": "Old Century Hotel",
      "Description": "This description is replacing the original one for this hotel. New and changed values overwrite the previous ones. In a collection field like Tags, be sure to provide the full list because there is no merging of values within the field itself.",
      "Category": "Boutique",
      "Tags": [ "pool", "free wifi", "concierge", "my first new tag", "my second new tag" ]
    }
  ]
}
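The same request can be assembled programmatically. This sketch uses only the Python standard library and builds the request without sending it. The service name and key are hypothetical placeholders, and the document fields are a trimmed subset of the example above.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"

def build_index_request(service: str, index: str, admin_key: str, actions: list) -> urllib.request.Request:
    """Build (but don't send) a Documents - Index request.

    `actions` is a list of (action, document) pairs, for example ("upload", {...}).
    """
    # Each document in "value" carries its own @search.action alongside its fields.
    body = {"value": [{"@search.action": action, **doc} for action, doc in actions]}
    url = (f"https://{service}.search.windows.net/indexes/{index}/docs/index"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )

req = build_index_request(
    "my-search-service", "hotels-sample-index", "<admin-key>",
    [
        ("upload", {"HotelId": "1111", "HotelName": "Stay-Kay City Hotel", "Category": "Boutique"}),
        ("mergeOrUpload", {"HotelId": "2", "Tags": ["pool", "free wifi", "concierge"]}),
    ],
)
# urllib.request.urlopen(req) would send it; the service reports a status for each document.
```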
Set the @search.action parameter to upload to create or overwrite a document. Set it to merge or mergeOrUpload if you're targeting updates to specific fields within the document. The previous example shows both upload and mergeOrUpload.

merge: Updates a document that already exists, and fails a document that can't be found. Merge replaces existing values, so be careful with collection fields that contain multiple values, such as fields of type Collection(Edm.String). For example, if a tags field starts with a value of ["budget"] and you execute a merge with ["economy", "pool"], the final value of the tags field is ["economy", "pool"]. It won't be ["budget", "economy", "pool"].
mergeOrUpload: Behaves like merge if the document exists, and like upload if the document is new. This is the most common action for incremental updates.
upload: Similar to an "upsert", where the document is inserted if it's new, and replaced if it exists. Any index field that the document doesn't provide a value for is set to null.

Send the request.
Look up the documents you just added as a validation step:
GET https://[service name].search.windows.net/indexes/hotels-sample-index/docs/1111?api-version=2024-07-01
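The lookup request can be built the same way. This minimal sketch percent-encodes the key before placing it in the URL path, as a precaution for keys that contain reserved URL characters. The service name and key are hypothetical placeholders.

```python
import urllib.parse
import urllib.request

API_VERSION = "2024-07-01"

def lookup_request(service: str, index: str, key: str, admin_key: str) -> urllib.request.Request:
    """Build (but don't send) a GET request for a single document by its key."""
    safe_key = urllib.parse.quote(key, safe="")  # encode characters such as '/' or '?'
    url = (f"https://{service}.search.windows.net/indexes/{index}/docs/"
           f"{safe_key}?api-version={API_VERSION}")
    return urllib.request.Request(url, headers={"api-key": admin_key})

req = lookup_request("my-search-service", "hotels-sample-index", "1111", "<admin-key>")
print(req.full_url)
```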
When the document key or ID is new, null becomes the value for any field that is unspecified in the document. For actions on an existing document, updated values replace the previous values. Any fields that weren't specified in a merge or mergeOrUpload action are left intact in the search index.
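To see how the actions differ, the following sketch models the semantics described above on an in-memory dictionary. It's a local model written for illustration, not the service's implementation. The field list and key field are hypothetical, and the delete action isn't modeled.

```python
def apply_action(index: dict, fields: list, key_field: str, action: str, doc: dict) -> dict:
    """Apply one @search.action to an in-memory dict that maps document keys to documents.

    A local model of the documented semantics, not the service's implementation.
    """
    key = doc[key_field]
    exists = key in index
    if action == "merge" or (action == "mergeOrUpload" and exists):
        if not exists:
            raise KeyError(f"merge failed: document {key!r} not found")
        updated = dict(index[key])
        updated.update(doc)  # specified fields are replaced whole; collections aren't combined
        index[key] = updated
    elif action in ("upload", "mergeOrUpload"):
        # New document, or upload of an existing one: unspecified fields become None (null).
        index[key] = {name: doc.get(name) for name in fields}
    else:
        raise ValueError(f"action not modeled here: {action}")
    return index[key]


fields = ["HotelId", "HotelName", "Tags"]
index = {}
apply_action(index, fields, "HotelId", "upload", {"HotelId": "1", "Tags": ["budget"]})
apply_action(index, fields, "HotelId", "merge", {"HotelId": "1", "Tags": ["economy", "pool"]})
print(index["1"])  # {'HotelId': '1', 'HotelName': None, 'Tags': ['economy', 'pool']}
```

The final state shows both behaviors described earlier: the unspecified HotelName stays null from the upload, and the merge replaces the whole Tags collection rather than appending to it.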
Use the Azure SDKs
Programmability is provided in the following Azure SDKs.
The Azure SDK for .NET provides the following APIs for simple and bulk document uploads into an index:
Several samples illustrate indexing in the context of both simple and large-scale workloads:
"Load an index" explains basic steps.
Azure.Search.Documents Samples - Indexing Documents from the Azure SDK team adds SearchIndexingBufferedSender.
Tutorial: Index any data couples batch indexing with testing strategies for determining an optimum batch size.
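Batch indexing means splitting a large set of documents into several Documents - Index requests. The following sketch shows only the splitting step. It assumes a ceiling of 1,000 documents per request, which is believed to match the current service limit; check the Azure AI Search service limits before relying on it. It also defaults every document to mergeOrUpload, an arbitrary choice for illustration. In the Python SDK (azure-search-documents), SearchIndexingBufferedSender handles batching for you.

```python
from typing import Iterable, Iterator

def batched_actions(docs: Iterable[dict], batch_size: int = 1000,
                    action: str = "mergeOrUpload") -> Iterator[dict]:
    """Yield Documents - Index request bodies holding at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch = []
    for doc in docs:
        batch.append({"@search.action": action, **doc})
        if len(batch) == batch_size:
            yield {"value": batch}
            batch = []
    if batch:  # send the remainder as a final, smaller batch
        yield {"value": batch}

docs = ({"HotelId": str(i)} for i in range(2500))
sizes = [len(body["value"]) for body in batched_actions(docs, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Because the function is a generator, documents are read lazily, so a large source doesn't need to fit in memory at once. The batch size is a parameter so that you can test different values, as the tutorial suggests.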
Be sure to check the azure-search-vector-samples repo for code examples showing how to index vector fields.