How do we upload vector embeddings to an Azure AI Search index

Aravind Vijayaraghavan 40 Reputation points
2025-01-18T05:21:42.1733333+00:00

So I recently created vector fields and a vector profile for them, and tried to mergeOrUpload vector embeddings to the index. I had already embedded certain text columns and stored the embeddings in a dataframe alongside the other columns in the same df. But when I run the code, I get these errors on the vector fields:

Error 1)
An error occurred during document upload: () The request is invalid. Details: An unexpected 'PrimitiveValue' node was found when reading from the JSON reader. A 'StartArray' node was expected.

Error 2)
An error occurred during document upload: () The request is invalid. Details: Invalid JSON. A token was not recognized in the JSON content.
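For context, my understanding is that each vector field should serialize as a flat JSON array of floats, and that a NaN leaking in from pandas is a scalar (not an array) and isn't valid JSON either, which would match both errors. A minimal sketch of what I mean:

```python
import json

# What I believe the service expects: vector fields as JSON arrays of floats.
valid_doc = {"id": "1", "vect_dev_exp_feedback": [0.1, 0.2, 0.3]}
print(json.dumps(valid_doc))  # serializes cleanly

# A pandas NaN where an array is expected is a bare scalar, and NaN is not
# valid JSON -- json.dumps only emits it in non-strict mode.
invalid_doc = {"id": "2", "vect_dev_exp_feedback": float("nan")}
print(json.dumps(invalid_doc))  # emits the non-standard token NaN
# json.dumps(invalid_doc, allow_nan=False) would raise ValueError instead
```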

This is my code:

def upload_documents_to_search_client(df, chunk_size=32000):
    """Uploads documents to the search client in chunks."""
    data = [
        {
            "@search.action": "mergeOrUpload",
            "id": str(row["id"]),
            "vect_dev_exp_feedback": [] if pd.isna(row["vect_dev_exp_feedback"]) else row["vect_dev_exp_feedback"],
            "vect_neg_feedback": [] if pd.isna(row["vect_neg_feedback"]) else row["vect_neg_feedback"],  
            "vect_tools_feedback": [] if pd.isna(row["vect_tools_feedback"]) else row["vect_tools_feedback"],
            "vect_wlb_feedback": [] if pd.isna(row["vect_wlb_feedback"]) else row["vect_wlb_feedback"],
            "vect_growth_feedback": [] if pd.isna(row["vect_growth_feedback"]) else row["vect_growth_feedback"],
        }
        for _, row in df.iterrows()
    ]

    for chunk in chunk_data(data, chunk_size):
        try:
            result = search_client.upload_documents(documents=chunk)
            print(f"Uploaded {len(chunk)} documents successfully.")
        except Exception as e:
            print(f"An error occurred during document upload: {e}")
            return None


upload_documents_to_search_client(df)

Some of the vector embedding columns have no lists or values at all, because the corresponding text feedback columns were empty. Because of that, the upload always fails saying null values can't be written to these vector fields. Can someone please help with this?

Azure AI Search

Accepted answer
    Sina Salam 16,526 Reputation points
    2025-01-19T15:08:56.63+00:00

    Hello Aravind Vijayaraghavan,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you need to upload vector embeddings to an Azure AI Search index.

    Regarding the two errors: you will need to ensure that every vector field is a list of floats, and validate the JSON structure before uploading. The updated version of your code below does the following:

    1. The validate_json_structure function ensures that the JSON structure is valid before attempting to upload.
    2. The applymap function is updated to ensure all vector fields are lists, even if they contain single values.
    import pandas as pd
    import json

    def is_missing(x):
        """Return True for a missing scalar (None/NaN); lists are never missing.

        Calling pd.isna directly on a list returns an array, which breaks a
        plain `if` check -- this guard is what caused the original error.
        """
        return not isinstance(x, (list, tuple)) and pd.isna(x)

    def validate_json_structure(data):
        """Validate JSON structure to ensure it meets the expected format."""
        try:
            # allow_nan=False rejects NaN/Infinity, which are not valid JSON
            # and trigger the "token was not recognized" error on the service.
            json.dumps(data, allow_nan=False)
            return True
        except (TypeError, ValueError) as e:
            print(f"Invalid JSON structure: {e}")
            return False

    def upload_documents_to_search_client(df, chunk_size=1000):
        """Uploads documents to the search client in chunks.

        Azure AI Search accepts at most 1000 documents per indexing batch,
        so keep chunk_size at 1000 or below.
        """
        data = [
            {
                "@search.action": "mergeOrUpload",
                "id": str(row["id"]),
                "vect_dev_exp_feedback": [] if is_missing(row["vect_dev_exp_feedback"]) else list(row["vect_dev_exp_feedback"]),
                "vect_neg_feedback": [] if is_missing(row["vect_neg_feedback"]) else list(row["vect_neg_feedback"]),
                "vect_tools_feedback": [] if is_missing(row["vect_tools_feedback"]) else list(row["vect_tools_feedback"]),
                "vect_wlb_feedback": [] if is_missing(row["vect_wlb_feedback"]) else list(row["vect_wlb_feedback"]),
                "vect_growth_feedback": [] if is_missing(row["vect_growth_feedback"]) else list(row["vect_growth_feedback"]),
            }
            for _, row in df.iterrows()
        ]
        if not validate_json_structure(data):
            print("Aborting upload due to invalid JSON structure.")
            return
        for chunk in chunk_data(data, chunk_size):
            try:
                result = search_client.upload_documents(documents=chunk)
                print(f"Uploaded {len(chunk)} documents successfully.")
            except Exception as e:
                print(f"An error occurred during document upload: {e}")
                return None

    def chunk_data(data, chunk_size):
        """Yield successive chunks from data."""
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]

    # Example usage
    df = pd.DataFrame({
        "id": [1, 2, 3],
        "vect_dev_exp_feedback": [[], [0.1, 0.2], None],
        "vect_neg_feedback": [None, [0.3, 0.4], [0.5, 0.6]],
        "vect_tools_feedback": [[], [], []],
        "vect_wlb_feedback": [None, None, None],
        "vect_growth_feedback": [[0.7, 0.8], [], [0.9, 1.0]]
    })

    # Replace missing values with empty lists and ensure all vector fields are lists
    df = df.applymap(lambda x: [] if is_missing(x) else list(x) if isinstance(x, (list, tuple)) else [x])

    upload_documents_to_search_client(df)
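    If the service still rejects empty arrays (vector fields are declared with a fixed dimensions count, so a zero-length array may not pass validation), a safer pattern is to omit empty vector fields from the document payload entirely; with mergeOrUpload, omitted fields are simply left unchanged. A minimal sketch, assuming the same column names as above and a hypothetical `build_document` helper:

```python
def build_document(row, vector_columns):
    """Build one upload document, omitting vector fields with no embedding.

    Omitted fields are left untouched by mergeOrUpload, which avoids
    sending empty arrays to fixed-dimension vector fields.
    Sketch only: assumes `row` is a dict-like pandas row.
    """
    doc = {"@search.action": "mergeOrUpload", "id": str(row["id"])}
    for col in vector_columns:
        value = row[col]
        # Include the field only when there is an actual embedding.
        if isinstance(value, (list, tuple)) and len(value) > 0:
            doc[col] = list(value)
    return doc

doc = build_document(
    {"id": 1, "vect_neg_feedback": [], "vect_growth_feedback": [0.7, 0.8]},
    ["vect_neg_feedback", "vect_growth_feedback"],
)
print(doc)  # the empty vect_neg_feedback field is absent from the payload
```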

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread here by upvoting and accepting this as the answer if it is helpful.

    1 person found this answer helpful.

0 additional answers
