Indexer does not read the metadata from the blob

Question

Hi,

I have a few documents uploaded to Blob Storage, and each of them has a metadata entry named "Project".

In Azure AI Search I clicked "Import and vectorize data" and went through the configuration.

Then I added a new field named "Project" to the index, with the type "Edm.String", marked Retrievable, Filterable and Searchable.

I reset and reran the indexer, and it finished without errors.

When I run a query in the index's Search Explorer, I can see that this new field is always null.

I tried adding a field mapping to the indexer, but the metadata was still not copied to the index.

The indexer is set to "Content and metadata", so the metadata should be read correctly, am I right?

I cannot see what I am doing wrong. Could you help me?

Answer

I recently managed to get it working, though I still have no idea what was wrong.

During the import, at step 4, click "Preview and edit".

There you can click "Add field"; the metadata field was present in the list.

After finishing the wizard and running the indexer, the field was correctly populated with the data.

Hope that helps!

Answer

Hello Bryś Andrzej,

Greetings! Welcome to Microsoft Q&A Platform.

I understand that your Azure Blob Storage indexer is not reading metadata correctly: the custom metadata field that you want in the search index as a retrievable field is always null, and adding a field mapping to the indexer did not change that. Please check the logs for more details about the error (see Diagnostic settings) and the following article:

Troubleshooting common indexer errors and warnings in Azure AI Search

In Azure AI Search a vectorizer is software that performs vectorization, such as a deployed embedding model on Azure OpenAI, that converts text (or images) to vectors during query execution.

It's defined in a search index, it applies to searchable vector fields, and it's used at query time to generate an embedding for a text or image query input. If instead you need to vectorize content as part of the indexing process, refer to Integrated Vectorization (Preview). For built-in vectorization during indexing, you can configure an indexer and skillset that calls an embedding model for your raw text content.

Refer to:

https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-configure-vectorizer

https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors?tabs=sample-data-storage%2Cmodel-aoai%2Cconnect-data-storage

https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/search-blob-metadata

Ensure that the indexer has the necessary read permissions for the blob storage. The managed identity of the search service should have Storage Blob Data Reader permissions. Ensure that the data source configuration is correct and that the blobs contain the “Project” metadata. You can manually inspect a few blobs to confirm that the metadata is present and correctly formatted. Check the indexer logs for any errors or warnings that might indicate why the “Project” field is not being populated. The logs can provide detailed information about any issues encountered during the indexing process.
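For the manual inspection mentioned above, it may help to know that the Blob service returns custom metadata as `x-ms-meta-<name>` response headers (for example on a Get Blob Metadata request). The sketch below is a hypothetical helper, not part of any SDK, that pulls those entries out of a plain header dictionary so you can confirm that "Project" is really present. The sample header values are invented for illustration.

```python
from typing import Dict, Mapping

# Prefix the Blob service uses for user-defined metadata headers.
META_PREFIX = "x-ms-meta-"


def extract_blob_metadata(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the custom metadata found in a set of Blob response headers."""
    metadata = {}
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower().startswith(META_PREFIX):
            metadata[name[len(META_PREFIX):]] = value
    return metadata


# Invented sample headers, shaped like a metadata response for one blob.
sample_headers = {
    "Content-Length": "0",
    "ETag": '"0x8DC0000000000"',
    "x-ms-meta-Project": "Apollo",
}

found = extract_blob_metadata(sample_headers)
print(found)
if "Project" not in found:
    print('Blob has no "Project" metadata; the index field will stay null.')
```

If the key is missing here, no indexer setting will populate the field, so it is worth ruling this out first.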

Refer to:

https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage

https://learn.microsoft.com/en-us/azure/search/search-howto-index-one-to-many-blobs

https://github.com/Azure/azure-search-vector-samples/issues/71

Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

Answer

Hi,

I grappled with this issue as well and finally found the solution by trial and error. Here is what worked for me:

First, I made sure that the blob has the metadata. In my case it's 'requestId'.

Navigated to AI Search Service -> Indexes -> Index, chose Edit JSON, added the field below, and saved.

   {
         "name": "requestId",
         "type": "Edm.String",
         "key": false,
         "retrievable": true,
         "stored": true,
         "searchable": false,
         "filterable": true,
         "sortable": false,
         "facetable": true,
         "synonymMaps": []
   }

Navigated to AI Search Service -> Indexers -> Indexer, chose Edit JSON, edited it as below, and saved. Make sure "dataToExtract" is set to "contentAndMetadata".

   {
   ...
   ...,
   
   
   "parameters": {
       "batchSize": null,
       "maxFailedItems": null,
       "maxFailedItemsPerBatch": null,
       "base64EncodeKeys": null,
       "configuration": {
         "dataToExtract": "contentAndMetadata",
         "parsingMode": "default"
       }
     },
   
   "fieldMappings": [], 
   ...
   }
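If you have the indexer definition exported as JSON, the setting above can be checked with a few lines of Python. This is only a sketch: `metadata_extraction_enabled` is a hypothetical helper name, and the indexer name in the sample is made up. It only reports whether the value is set explicitly, as in the snippet above.

```python
import json


def metadata_extraction_enabled(indexer: dict) -> bool:
    """True if the indexer explicitly sets dataToExtract to contentAndMetadata."""
    # "parameters" or "configuration" may be missing or null in an exported definition.
    params = indexer.get("parameters") or {}
    config = params.get("configuration") or {}
    return config.get("dataToExtract") == "contentAndMetadata"


# Sample definition; "my-indexer" is a made-up name.
indexer_json = """
{
  "name": "my-indexer",
  "parameters": {
    "batchSize": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default"
    }
  },
  "fieldMappings": []
}
"""

indexer = json.loads(indexer_json)
print(metadata_extraction_enabled(indexer))
```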

Navigated to Skillsets -> skillset name -> Edit JSON, added the mapping below under "indexProjections", and saved.

   "indexProjections": {
       "selectors": [
         {
           "targetIndexName": "",
           "parentKeyFieldName": "parent_id",
           "sourceContext": "/document/pages/*",
           "mappings": [
             ...
             {
               "name": "requestId",
               "source": "/document/requestId",
               "inputs": []
             },
   		  ...	
           ]
         }
       ]
   }
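The skillset change above can also be applied programmatically to an exported skillset definition. The following is a minimal sketch, assuming the metadata value is exposed in the enriched document at `/document/<field name>`; adjust the source path if it lives elsewhere in your pipeline. `add_metadata_mapping` and the index name "my-index" are hypothetical.

```python
import copy


def add_metadata_mapping(skillset: dict, field_name: str) -> dict:
    """Return a copy of the skillset with a projection mapping for field_name."""
    updated = copy.deepcopy(skillset)  # leave the caller's definition untouched
    selectors = updated.get("indexProjections", {}).get("selectors", [])
    for selector in selectors:
        mappings = selector.setdefault("mappings", [])
        # Skip selectors that already project this field.
        if any(m.get("name") == field_name for m in mappings):
            continue
        mappings.append({
            "name": field_name,
            # Assumption: the blob metadata sits at the document root.
            "source": f"/document/{field_name}",
            "inputs": [],
        })
    return updated


# Sample fragment; "my-index" and the "chunk" mapping are made up.
skillset = {
    "indexProjections": {
        "selectors": [
            {
                "targetIndexName": "my-index",
                "parentKeyFieldName": "parent_id",
                "sourceContext": "/document/pages/*",
                "mappings": [
                    {"name": "chunk", "source": "/document/pages/*", "inputs": []},
                ],
            }
        ]
    }
}

patched = add_metadata_mapping(skillset, "requestId")
for mapping in patched["indexProjections"]["selectors"][0]["mappings"]:
    print(mapping["name"], "<-", mapping["source"])
```

The helper is idempotent, so running it again on the patched definition does not add a duplicate mapping.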

Ran the indexer and then performed a query on the populated index. I was able to find the requestId correctly populated.

None of the help articles or AI assistants suggested that updating the skillset mattered, but apparently it does 🤷‍♂️
