Indexer does not read the metadata from the blob

Bryś Andrzej 20 Reputation points
2024-08-23T11:09:33.76+00:00

Hi,

I have a few documents uploaded to Blob Storage, and they all have a metadata entry named "Project".

In Azure AI Search I clicked "Import and vectorize data" and went through the configuration.

Then I added a new field to the index named "Project" with the type "Edm.String" that is Retrievable, Filterable and Searchable.

I reset and reran the indexer, and it finished successfully.

When I run a query in the index's Search Explorer, I can see that this new field is always null.

I tried adding a field mapping to the indexer, but the metadata was still not copied to the index.

The indexer is set to "Content and metadata", so the metadata should be read correctly, am I right?

I cannot see what I am doing wrong. Could you help me?

Azure Storage Accounts
Azure AI Search

3 answers

  1. Bryś Andrzej 20 Reputation points
    2024-12-09T14:22:50.68+00:00

    I recently managed to get it working, but I still have no idea what was wrong.

    During the import, at step 4, click "Preview and edit".

    There you can click "Add field"; the metadata field was present in the list.

    After finishing the wizard and running the indexer, the field was correctly populated with the data.

    Hope that helps!

    2 people found this answer helpful.

  2. Nehruji R 8,166 Reputation points Microsoft Vendor
    2024-08-26T08:43:56.8333333+00:00

    Hello Bryś Andrzej,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that your Azure Blob Storage indexer is not reading metadata correctly: the custom metadata field that you want to surface in the search index as a retrievable field is always null, even after you added a field mapping to the indexer. Please check the logs for more details about the error (see Diagnostic settings).

    Troubleshooting common indexer errors and warnings in Azure AI Search

    In Azure AI Search a vectorizer is software that performs vectorization, such as a deployed embedding model on Azure OpenAI, that converts text (or images) to vectors during query execution.

    It's defined in a search index, it applies to searchable vector fields, and it's used at query time to generate an embedding for a text or image query input. If instead you need to vectorize content as part of the indexing process, refer to Integrated Vectorization (Preview). For built-in vectorization during indexing, you can configure an indexer and skillset that calls an embedding model for your raw text content.

    Refer to:

    https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-configure-vectorizer

    https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors?tabs=sample-data-storage%2Cmodel-aoai%2Cconnect-data-storage

    https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/search-blob-metadata

    Ensure that the indexer has the necessary read permissions for the blob storage. The managed identity of the search service should have Storage Blob Data Reader permissions. Ensure that the data source configuration is correct and that the blobs contain the “Project” metadata. You can manually inspect a few blobs to confirm that the metadata is present and correctly formatted. Check the indexer logs for any errors or warnings that might indicate why the “Project” field is not being populated. The logs can provide detailed information about any issues encountered during the indexing process.
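The manual blob inspection suggested above can also be scripted. What follows is only a minimal sketch, assuming the `azure-storage-blob` Python package is installed; the helper names `blobs_missing_metadata` and `list_container_blobs` are made up for illustration, and the case-insensitive key comparison is an assumption rather than documented indexer behaviour. The check itself is a plain function, so it can be tried offline with stand-in objects.

```python
from types import SimpleNamespace


def blobs_missing_metadata(blobs, key):
    """Return names of blobs whose metadata lacks `key` or holds an empty value.

    Keys are compared case-insensitively (an assumption made for this sketch).
    """
    missing = []
    for blob in blobs:
        metadata = blob.metadata or {}
        lowered = {k.lower(): v for k, v in metadata.items()}
        if not lowered.get(key.lower()):
            missing.append(blob.name)
    return missing


def list_container_blobs(connection_string, container_name):
    """List blobs together with their user-defined metadata.

    Requires the azure-storage-blob package; it is imported lazily so the
    check above can be used without it.
    """
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    # include=["metadata"] asks the service to return metadata with each blob
    return list(container.list_blobs(include=["metadata"]))


# Offline demonstration with stand-in objects that mimic the name/metadata
# attributes of the SDK's blob listing results.
sample = [
    SimpleNamespace(name="a.pdf", metadata={"Project": "Alpha"}),
    SimpleNamespace(name="b.pdf", metadata=None),
]
print(blobs_missing_metadata(sample, "Project"))
```

Against a real account, the result of `list_container_blobs(...)` would be passed in place of `sample`; an empty result means every blob carries the "Project" metadata.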

    Refer to:

    https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage

    https://learn.microsoft.com/en-us/azure/search/search-howto-index-one-to-many-blobs

    https://github.com/Azure/azure-search-vector-samples/issues/71

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.


  3. Prashanth Raghavan 0 Reputation points
    2025-01-08T06:52:45.3733333+00:00

    Hi,

    I grappled with this issue as well and finally found the solution by trial and error. Here is what worked for me:

    1. First, I made sure that the blob has the metadata. In my case it's 'requestId'.
    2. Navigated to AI Search Service -> Indexes -> Index, clicked Edit JSON, added the field below, and saved.
         {
               "name": "requestId",
               "type": "Edm.String",
               "key": false,
               "retrievable": true,
               "stored": true,
               "searchable": false,
               "filterable": true,
               "sortable": false,
               "facetable": true,
               "synonymMaps": []
         }
      
      3. Navigated to AI Search Service -> Indexers -> Indexer, clicked Edit JSON, edited it as below, and saved. Make sure "dataToExtract" is set to "contentAndMetadata".
         {
           ...
           "parameters": {
             "batchSize": null,
             "maxFailedItems": null,
             "maxFailedItemsPerBatch": null,
             "base64EncodeKeys": null,
             "configuration": {
               "dataToExtract": "contentAndMetadata",
               "parsingMode": "default"
             }
           },
           "fieldMappings": [],
           ...
         }
      
      4. Navigated to Skillsets -> skillset name, clicked Edit JSON, added the mapping for the metadata field under "indexProjections", and saved.
         "indexProjections": {
             "selectors": [
               {
                 "targetIndexName": "<targetIndexName>",
                 "parentKeyFieldName": "parent_id",
                 "sourceContext": "/document/pages/*",
                 "mappings": [
                   ...
                   {
                     "name": "requestId",
                     "source": "/document/propertyId",
                     "inputs": []
                   },
                   ...
                 ]
               }
             ]
         }
      
      5. Ran the indexer and then queried the populated index. The requestId field was correctly populated.
      None of the help articles or AI assistants suggested that updating the skillset mattered, but apparently it does 🤷‍♂️
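The skillset edit described in the steps above can be sketched in Python as a plain dictionary transformation, which makes it easier to see what changes. This is only an illustration under stated assumptions: the helper `add_projection_mapping`, the skillset name `my-skillset`, and the index name `my-index` are hypothetical, and the source path `/document/requestId` assumes that custom blob metadata appears in the enriched document under `/document/<metadata name>` (adjust it to whatever path your own skillset uses). The sketch does not call the service; the resulting JSON would still be saved through the portal or the REST API.

```python
import copy
import json


def add_projection_mapping(skillset, index_name, field_name, source_path):
    """Return a copy of `skillset` with a mapping added to the selector
    that targets `index_name`. The input definition is left untouched."""
    updated = copy.deepcopy(skillset)
    selectors = updated.get("indexProjections", {}).get("selectors", [])
    for selector in selectors:
        if selector.get("targetIndexName") != index_name:
            continue
        mappings = selector.setdefault("mappings", [])
        # Only append when no mapping for this index field exists yet
        if not any(m.get("name") == field_name for m in mappings):
            mappings.append(
                {"name": field_name, "source": source_path, "inputs": []}
            )
        return updated
    raise ValueError(f"no selector targets index {index_name!r}")


# Hypothetical skillset fragment shaped like the JSON shown in the answer
skillset = {
    "name": "my-skillset",
    "indexProjections": {
        "selectors": [
            {
                "targetIndexName": "my-index",
                "parentKeyFieldName": "parent_id",
                "sourceContext": "/document/pages/*",
                "mappings": [],
            }
        ]
    },
}

updated = add_projection_mapping(
    skillset, "my-index", "requestId", "/document/requestId"
)
print(json.dumps(updated["indexProjections"]["selectors"][0]["mappings"], indent=2))
```

Because the indexer writes chunked documents through the index projection in this setup, a field that has no entry in `mappings` has nothing written to it, which would be consistent with the null values described in the question.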
