Microsoft GraphRAG – Index Error – Azure Blob Storage

Octavian Mocanu 45 Reputation points
2025-02-10T11:25:57.9533333+00:00

I want to use Microsoft GraphRAG to identify chat topics/scope based on a data source.

I followed the guidance from here.

I’ve installed the following graphrag Python package:

Name: graphrag
Version: 0.9.0
Summary: GraphRAG: A graph-based retrieval-augmented generation (RAG) system.
Home-page: 
Author: Alonso Guevara Fernández
Author-email: alonsog@microsoft.com
License: MIT

The data source is hosted in an Azure Blob Storage container within an Azure Storage account (3 .txt files).

The settings file is like this:

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: azure_openai_chat
  model: gpt-4o
  model_supports_json: false # recommended if this is available for your model.
  api_base: https://[openai-service].openai.azure.com
  api_version: "2024-08-01-preview"
  deployment_name: gpt-4o

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store: # configuration for AI Search
    type: azure_ai_search
    url: https://[search-service].search.windows.net
    api_key: ${AZURE_SEARCH_SERVICE_API_KEY}

  llm:
    api_key: ${GRAPHRAG_API_KEY_TEXT_EMBEDDING}
    type: azure_openai_embedding
    model: text-embedding-ada-002
    api_base:  https://[openai-service].openai.azure.com
    api_version: 2024-02-15-preview
    deployment_name: text-embedding-ada-002

### Input settings ###

input:
  type: blob # file or blob
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"
  container_name: "graphrag-input-001"
  file_type: text # or csv
  file_encoding: utf-8
  file_pattern: ".*\\.(txt|md)$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: blob
  container_name: "graphrag-workspace-001"
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"

reporting:
  type: file #file or blob
  base_dir: "logs"

storage:
  type: blob # file or blob
  container_name: "graphrag-workspace-001"
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  embeddings: true
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_001_sys_prompt_sk_plugin_ftos_context.txt"
  conversation_history_max_turns: 5

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
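Before running the index command, it can help to confirm that every environment variable referenced via `${...}` in the settings above is actually set, since an unset variable can substitute to an empty value and surface as a confusing downstream error. A minimal sketch using the variable names from this config:

```python
import os

# Environment variables referenced via ${...} in settings.yaml above.
required = [
    "GRAPHRAG_API_KEY",
    "GRAPHRAG_API_KEY_TEXT_EMBEDDING",
    "AZURE_SEARCH_SERVICE_API_KEY",
    "AZURE_STORAGE_ACCOUNT_CONNECTION_STRING",
]

# Report any variable that is unset or empty before indexing starts.
missing = [name for name in required if not os.environ.get(name)]
if missing:
    print("Missing environment variables:", ", ".join(missing))
```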

Running index command:

graphrag index --root ./ragtest

I’ve got this error:

AttributeError: 'list' object has no attribute 'on_error'         

The logs.json content:

{
    "type": "error",
    "data": "Error executing verb \"create_base_entity_graph\" in create_base_entity_graph: 'name'",
    "stack": "Traceback (most recent call last):\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\datashaper\\workflow\\workflow.py\", line 415, in _execute_verb\n    result = await result\n             ^^^^^^^^^^^^\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\graphrag\\index\\workflows\\v1\\subflows\\create_base_entity_graph.py\", line 47, in create_base_entity_graph\n    await create_base_entity_graph_flow(\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\graphrag\\index\\flows\\create_base_entity_graph.py\", line 58, in create_base_entity_graph\n    merged_entities = _merge_entities(entity_dfs)\n                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\graphrag\\index\\flows\\create_base_entity_graph.py\", line 119, in _merge_entities\n    all_entities.groupby([\"name\", \"type\"], sort=False)\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\frame.py\", line 9183, in groupby\n    return DataFrameGroupBy(\n           ^^^^^^^^^^^^^^^^^\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\groupby\\groupby.py\", line 1329, in __init__\n    grouper, exclusions, obj = get_grouper(\n                               ^^^^^^^^^^^^\n  File \"C:\\Users\\om\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\groupby\\grouper.py\", line 1043, in get_grouper\n    raise KeyError(gpr)\nKeyError: 'name'\n",
    "source": "'name'",
    "details": null
}
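The "stack" field stores the traceback with escaped \n sequences, which makes it hard to read in raw form. A quick way to print it legibly (the entry dict below is a truncated stand-in for one record from logs.json):

```python
# A truncated stand-in for one error record parsed from logs.json;
# json.load on the real file yields the same structure.
entry = {
    "type": "error",
    "data": "Error executing verb ...",
    "stack": "Traceback (most recent call last):\n  File ...\nKeyError: 'name'\n",
}

# Printing turns the escaped \n sequences into real line breaks.
print(entry["stack"])
```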

Could you please guide me to solve this error?


Accepted answer
  1. Adrian Calinescu 80 Reputation points Microsoft Employee
    2025-02-25T09:50:23.54+00:00

    You have to specify base_dir; otherwise it will default to base_dir: "input", which does not exist in your blob storage container and will trigger all kinds of parsing errors, which unfortunately aren't handled very gracefully.

    See

    https://github.com/microsoft/graphrag/blob/main/graphrag/config/defaults.py#L255

    and

    https://github.com/microsoft/graphrag/blob/0144b3fd88940218375bca9bb251b81eec192624/graphrag/config/models/input_config.py#L26

    ### Input settings ###
    input:
      type: blob
      connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"
      container_name: "graphrag-input-001"
      base_dir: "."
      file_type: text # or csv
      file_encoding: utf-8
      file_pattern: ".*\\.txt$"	
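    In addition to setting base_dir, it can help to sanity-check that your blob names actually match file_pattern, since this regex is applied to blob paths under base_dir. A quick local sketch, where the blob names are hypothetical placeholders for your own container listing:

```python
import re

# The file_pattern from the corrected input settings above.
file_pattern = r".*\.txt$"

# Hypothetical blob names standing in for a real container listing.
blob_names = ["doc1.txt", "doc2.txt", "notes/doc3.txt", "readme.md"]

# Only names matching the pattern are picked up as input documents.
matching = [name for name in blob_names if re.match(file_pattern, name)]
print(matching)  # the .md file is excluded by this pattern
```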
    

1 additional answer

  1. SKale 1,441 Reputation points
    2025-02-10T19:38:15.2166667+00:00

    Hello Octavian Mocanu,

    Thank you for posting your question in the Microsoft Q&A forum.

    The error you're encountering, KeyError: 'name', indicates that the code is trying to group entities by the columns "name" and "type", but the "name" column is missing from your data. I have listed a few reasons for you to validate:

    1. The input data (text files) might not be structured in a way that the GraphRAG system expects. Specifically, it might be missing the "name" field that is required for entity extraction.
    2. The entity extraction process might not be correctly configured to extract the "name" field from your data.

    A checklist of possible solutions to validate on your end:

    • Ensure that your text files contain structured data that includes a "name" field or something equivalent. If your data is unstructured, you might need to preprocess it to extract entities and assign them a "name" field.
    • The entity_extraction section in your YAML configuration references a prompt file (prompts/entity_extraction.txt). Open this file and ensure that the prompt is designed to extract entities with a "name" field. If the prompt is not correctly extracting the "name" field, you may need to modify it.
    • If your data does not naturally contain a "name" field, you might need to modify the entity extraction process to generate or infer this field. For example, you could modify the entity_extraction prompt to extract a different field and map it to "name".
    • Add logging or print statements in the create_base_entity_graph.py file to inspect the entity_dfs variable. This will help you understand what data is being passed to the _merge_entities function. Ensure that the data frames in entity_dfs contain the expected columns ("name" and "type").
    • If your data does not have a "name" field, you might need to update the configuration to use a different field for grouping. For example, if your data has a "title" field, you could modify the _merge_entities function to group by "title" instead of "name".
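    The failure in _merge_entities can be reproduced in isolation with a small pandas sketch; the DataFrame below is a hypothetical stand-in for an entity frame that came back without a "name" column:

```python
import pandas as pd

# Hypothetical stand-in for an extracted-entity frame that is missing
# the "name" column the merge step expects.
entities = pd.DataFrame({"title": ["Contoso"], "type": ["ORG"]})

# Grouping by a column that does not exist raises the same KeyError
# seen in logs.json.
try:
    entities.groupby(["name", "type"], sort=False)
except KeyError as exc:
    print(f"KeyError: {exc}")
```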

    After making the necessary changes above, re-run the graphrag index --root ./ragtest command to see if the issue is resolved.

    If the above answer helped, please do not forget to "Accept Answer", as this may help other community members facing a similar issue.

