Hello Julien BROCHIER,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you would like to know how you can upload JSON documents to a search index while running an indexer, that is, whether you can run an indexer without first creating a traditional Azure data source.
The demo you mentioned focuses on directly uploading JSON documents to the search index using the REST API, which bypasses the need for an indexer. This method is useful for quickly adding documents but doesn't involve the automated data ingestion and transformation processes that an indexer provides.
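For reference, a direct upload of that kind is just a call to the index's Documents - Index REST endpoint. Below is a minimal sketch of such a call using the requests library; the service name, index name, API key and api-version value are placeholders you would replace with your own:

import requests

# Direct upload of JSON documents to the index via the Documents - Index endpoint
url = (
    "https://<your-service-name>.search.windows.net"
    "/indexes/my-index/docs/index?api-version=2023-11-01"
)
headers = {"Content-Type": "application/json", "api-key": "<your-api-key>"}
payload = {
    "value": [
        {"@search.action": "upload", "id": "1", "content": "First document"},
        {"@search.action": "upload", "id": "2", "content": "Second document"}
    ]
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.json())  # per-document status of the upload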
To address the problem of uploading JSON documents to an Azure Cognitive Search index while using an indexer, it is essential to understand a limitation of indexers: they inherently rely on a data source such as Azure Blob Storage or Azure SQL Database to function. If you want to avoid a data source, the most suitable alternative approaches are the following:
Option 1: If the goal is to avoid using a traditional data source, a custom solution can simulate the behavior of an indexer by periodically uploading JSON documents directly to the search index. You can automate this process with tools like Azure Functions or Logic Apps, which can trigger uploads on a schedule or in response to specific events; a Functions-based sketch follows the snippet below. Here is an example code snippet for a direct upload with the Python SDK:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Client that talks directly to the target index
search_client = SearchClient(
    endpoint="https://<your-service-name>.search.windows.net",
    index_name="my-index",
    credential=AzureKeyCredential("<your-api-key>")
)

# JSON documents to push into the index
documents = [
    {"id": "1", "content": "First document"},
    {"id": "2", "content": "Second document"}
]

result = search_client.upload_documents(documents)
print(f"Upload result: {result}")
Option 2: If running an indexer is mandatory, a minimal approach is to set up a lightweight Azure Blob Storage container. This container serves as the data source from which the indexer reads and populates the index. You upload the documents to the blob container, and the indexer is configured to process them automatically (for JSON blobs you would typically also set the indexer's JSON parsing mode so that fields are mapped from the JSON structure). Here is an example code snippet for creating the data source and indexer with the Python SDK; a sketch for uploading files into the container follows this snippet:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)

# Client that manages data sources and indexers
indexer_client = SearchIndexerClient(
    endpoint="https://<your-service-name>.search.windows.net",
    credential=AzureKeyCredential("<your-api-key>")
)

# Create a Blob Storage data source connection
data_source = SearchIndexerDataSourceConnection(
    name="my-blob-data-source",
    type="azureblob",
    connection_string="<your-connection-string>",
    container=SearchIndexerDataContainer(name="my-container")
)
indexer_client.create_data_source_connection(data_source)

# Configure and run the indexer
indexer = SearchIndexer(
    name="my-blob-indexer",
    data_source_name="my-blob-data-source",
    target_index_name="my-index"
)
indexer_client.create_indexer(indexer)
indexer_client.run_indexer("my-blob-indexer")
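Once the data source and indexer exist, adding new content is just a matter of dropping JSON blobs into the container and letting the indexer run, either on demand or on its schedule. Here is a minimal sketch using the azure-storage-blob package; the connection string, container name, blob name and document content are placeholders:

import json
from azure.storage.blob import BlobServiceClient

# Connect to the storage account that backs the indexer's data source
blob_service = BlobServiceClient.from_connection_string("<your-connection-string>")
container_client = blob_service.get_container_client("my-container")

# Drop a JSON document into the container; the next indexer run will pick it up
document = {"id": "3", "content": "Third document"}
container_client.upload_blob(
    name="doc-3.json",
    data=json.dumps(document),
    overwrite=True
)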
For more detailed information about the above, please see the following links: https://learn.microsoft.com/en-us/azure/search and https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/search/azure-search-documents/samples
In summary, while indexers require a data source to function, the custom automation approach in Option 1 offers flexibility for scenarios where a traditional data source is not desired. For scenarios that mandate an indexer, minimal use of Azure Blob Storage as a data source, as in Option 2, provides a compliant yet efficient solution.
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.