Azure AI Search Indexer: "JSON arrays with element type 'Float' map to Collection(Edm.Double)"

Ilg Alexander 0 Reputation points
2024-12-17T10:01:28.1466667+00:00

I have the following problem. I am trying to build an indexer in Azure AI Search. I have a skillset with a “Custom.WebApiSkill” skill. This provides me with the following response body:

{
  "values": [
    {
      "recordId": "1",
      "data": {
        "embedding": [
          -0.013657977,
          0.004854262,
          -0.015335504,
          -0.010732211,
          ...
        ]
      }
    }
  ]
}  

As part of the indexer, I am now trying to map the “embedding” value of the response body to a field in my index:

    "outputFieldMappings": [
      {
        "sourceFieldName": "/document/pages/*/embedding",
        "targetFieldName": "content_vector",
        "mappingFunction": null
      }
    ]

My index field "content_vector" looks like this:

   
    {
      "name": "content_vector",
      "type": "Collection(Edm.Single)",
      "key": false,
      "retrievable": true,
      "stored": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "synonymMaps": [],
      "dimensions": 1536,
      "vectorSearchProfile": "myHnswProfile"
    }

However, I receive the following error when executing:


The data field 'content_vector/0' in the document with key 'aHR0cHM6Ly9zdHJhZ3Byb3RvdHlwZGV2My5ibG9iLmNvcmUud2luZG93cy5uZXQvdGVzdGRhdGEvS29tbXVuaWthdGlvbnN0ZWNobmlrLUZpYmVsLnBkZg2' has an invalid value of type 'Collection(Edm.Double)' ('JSON arrays with element type 'Float' map to Collection(Edm.Double)'). The expected type was 'Collection(Edm.Single)'.
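One detail of the error message is worth checking, although nothing in this thread confirms it as the cause: the failing field is `content_vector/0`, meaning the first element of `content_vector` is itself a collection. A source path containing `*`, such as `/document/pages/*/embedding`, selects one embedding per page, so the mapped value may be a list of lists rather than a single vector. The sketch below is a toy model of that path expansion; `collect` and the sample `document` are hypothetical and do not reproduce the indexer's actual implementation.

```python
def collect(node, parts):
    """Toy resolver: follow path parts, fanning out over lists at '*'."""
    if not parts:
        return node
    head, rest = parts[0], parts[1:]
    if head == "*":
        return [collect(child, rest) for child in node]
    return collect(node[head], rest)


# Hypothetical enriched document with two pages, each with a short embedding.
document = {
    "pages": [
        {"embedding": [-0.013657977, 0.004854262]},
        {"embedding": [-0.015335504, -0.010732211]},
    ]
}

mapped = collect(document, "pages/*/embedding".split("/"))
print(mapped)                       # one inner list per page
print(isinstance(mapped[0], list))  # element 0 is a collection, not a number
```

If that is what happens here, the mismatch would be one of shape rather than numeric precision, which would fit the observation that converting to float32 in Python changes nothing.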

How can I make sure that my custom Web API returns the embedding array with float32 values, or that my indexer interprets the values as float32 (Edm.Single) rather than float64 (Edm.Double)?

Thank you very much!

I tried to use numpy in my custom Web API (Python) to convert the values of "embedding" to float32, but that didn't work.

Something like this:

embedding_float32 = np.array(embedding, dtype=np.float32).tolist()

UPDATE:

I tried using “numpy” to convert the array to “float32”, just like you showed in your first code snippet. Nevertheless, the indexer interprets it as float64 (Edm.Double):

The data field 'content_vector/0' in the document with key 'xyz' has an invalid value of type 'Collection(Edm.Double)' ('JSON arrays with element type 'Float' map to Collection(Edm.Double)'). The expected type was 'Collection(Edm.Single)'.

Is there a way to make the indexer interpret the values as float32 (Edm.Single), or to force the data type in my custom Web API? The problem is that Python's built-in float is always 64-bit, so the values are treated and returned as float64 by default.
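The point about Python floats can be checked locally. The following is a minimal sketch, not taken from the linked function: it uses the standard-library `array` module with typecode `"f"` as a stand-in for `np.array(..., dtype=np.float32)`, so it runs without NumPy. It shows that `.tolist()` hands back built-in `float` objects and that the JSON text is plain decimal digits with no float32 or float64 marker. The sample values are the first four from the response body above.

```python
import json
from array import array

embedding = [-0.013657977, 0.004854262, -0.015335504, -0.010732211]

# array("f", ...) stores 32-bit floats, similar to a NumPy float32 array.
as_float32 = array("f", embedding)

# .tolist() converts every element back to a built-in (64-bit) Python float.
as_list = as_float32.tolist()

# JSON numbers are decimal text; the payload carries no width information.
payload = json.dumps({"embedding": as_list})
decoded = json.loads(payload)["embedding"]

print(as_float32.itemsize)        # bytes per stored element
print(type(as_list[0]).__name__)  # type after .tolist()
print(payload)
```

Whatever precision the values had inside the function, the receiver sees only a JSON array of numbers and has to decide on a type itself.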

Here is the link to my WebAPI in GitHub: https://github.com/Alexkanns/CustomWebAPI/blob/main/init.py

Azure Functions
Azure AI Search

1 answer

  1. brtrach-MSFT 17,166 Reputation points Microsoft Employee
    2025-01-04T08:00:32.39+00:00

    @Ilg Alexander Here are a few steps you can take to ensure your embeddings are interpreted as float32 (Edm.Single) instead of float64 (Edm.Double):

    1. Ensure Proper Conversion in Python: Even though you've tried using numpy to convert the array to float32, it's possible that the conversion isn't being applied correctly. Make sure you're converting the array and then serializing it properly. Here's an example:
       import numpy as np
       import json
    
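       def convert_to_float32(embedding):
           # Round each value to float32 precision; .tolist() then returns
           # plain Python floats, which json.dumps can serialize directly.
           embedding_float32 = np.array(embedding, dtype=np.float32).tolist()
           return embedding_float32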
    
       # Example usage
       embedding = [-0.013657977, 0.004854262, -0.015335504, -0.010732211]
       embedding_float32 = convert_to_float32(embedding)
       response_body = {
           "values": [
               {
                   "recordId": "1",
                   "data": {
                       "embedding": embedding_float32
                   }
               }
           ]
       }
    
       # Convert to JSON
       response_json = json.dumps(response_body)
       print(response_json)
    
    2. Check Your API Response: Ensure that your API response is correctly formatted and inspect the embedding values. You can print the type of the elements in the embedding array; note that .tolist() returns built-in Python floats, not NumPy float32 values:
       print(type(embedding_float32[0]))  # Prints <class 'float'>
    
    3. Update Your Indexer Configuration: If the above steps don't resolve the issue, you might need to explicitly specify the data type in your indexer configuration. Unfortunately, Azure AI Search might still interpret the values as float64, because JSON numbers carry no float32/float64 distinction and Python serializes its floats as 64-bit values.
    4. Custom Serialization: You can create a custom JSON encoder so that any NumPy float32 values left in the response body are converted before serialization. Here's an example:
       import json
       import numpy as np
    
       class Float32Encoder(json.JSONEncoder):
           def default(self, obj):
               # default() is only called for objects json cannot serialize
               # natively, such as NumPy float32 scalars.
               if isinstance(obj, np.float32):
                   return float(obj)
               return super().default(obj)
    
       def convert_to_float32(embedding):
           embedding_float32 = np.array(embedding, dtype=np.float32).tolist()
           return embedding_float32
    
       embedding = [-0.013657977, 0.004854262, -0.015335504, -0.010732211]
       embedding_float32 = convert_to_float32(embedding)
       response_body = {
           "values": [
               {
                   "recordId": "1",
                   "data": {
                       "embedding": embedding_float32
                   }
               }
           ]
       }
    
       response_json = json.dumps(response_body, cls=Float32Encoder)
       print(response_json)
    
    5. Azure Function Configuration: Ensure that your Azure Function is correctly configured to handle the data types. Sometimes, the issue might be with how the function processes the data.
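The checks in the steps above can be run locally before the indexer is involved. The following is a minimal sketch under stated assumptions: `check_embedding_response` is a hypothetical helper, not part of any Azure SDK, and the expected length of 1536 is taken from the `dimensions` value of the `content_vector` field in the question. It only inspects the shape of the custom skill's response body (`values`, then `data`, then `embedding`) and reports anything that is not a flat list of finite numbers.

```python
import json
import math

EXPECTED_DIMENSIONS = 1536  # from the index field definition in the question


def check_embedding_response(body: str, dimensions: int = EXPECTED_DIMENSIONS) -> list[str]:
    """Return a list of problems found in a custom skill response body."""
    problems = []
    for record in json.loads(body).get("values", []):
        record_id = record.get("recordId")
        embedding = record.get("data", {}).get("embedding")
        if not isinstance(embedding, list):
            problems.append(f"record {record_id}: 'embedding' is not a list")
            continue
        if any(isinstance(item, list) for item in embedding):
            problems.append(f"record {record_id}: 'embedding' is nested, expected a flat list")
            continue
        if len(embedding) != dimensions:
            problems.append(f"record {record_id}: {len(embedding)} values, expected {dimensions}")
        for item in embedding:
            is_number = isinstance(item, (int, float)) and not isinstance(item, bool)
            if not is_number or not math.isfinite(item):
                problems.append(f"record {record_id}: non-numeric or non-finite value {item!r}")
                break
    return problems


# Example usage with a deliberately short embedding.
sample = json.dumps({"values": [{"recordId": "1", "data": {"embedding": [0.1, 0.2]}}]})
print(check_embedding_response(sample))
```

An empty result means the body has the expected shape; it says nothing about how the indexer will type the values afterwards.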
