I want to use Microsoft GraphRAG to identify chat topics/scopes from a data source.
I followed the guidance from here.
I’ve installed the following graphrag Python package:
Name: graphrag
Version: 0.9.0
Summary: GraphRAG: A graph-based retrieval-augmented generation (RAG) system.
Home-page:
Author: Alonso Guevara Fernández
Author-email: alonsog@microsoft.com
License: MIT
The data source is three .txt files hosted in a blob container in an Azure Storage Account.
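As a sanity check on the input side, the files in that container can be listed and matched against the same file_pattern regex I use in the settings file. This is just a sketch: the helper function and example names are mine, and the actual listing call (via the azure-storage-blob SDK) is left as a comment since it needs the real connection string.

```python
import re

# Same regex as `file_pattern` in my settings.yaml
FILE_PATTERN = re.compile(r".*\.(txt|md)$")

def matching_blobs(names):
    """Return the blob names GraphRAG's blob input loader would pick up."""
    return [name for name in names if FILE_PATTERN.match(name)]

# Listing the real container (requires `pip install azure-storage-blob`):
# import os
# from azure.storage.blob import ContainerClient
# client = ContainerClient.from_connection_string(
#     os.environ["AZURE_STORAGE_ACCOUNT_CONNECTION_STRING"],
#     "graphrag-input-001",
# )
# print(matching_blobs(blob.name for blob in client.list_blobs()))

# Illustrative names only; a .pdf would be filtered out by the pattern:
print(matching_blobs(["doc1.txt", "doc2.txt", "doc3.txt", "notes.pdf"]))
```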
The settings file is like this:
encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: azure_openai_chat
  model: gpt-4o
  model_supports_json: false # recommended if this is available for your model.
  api_base: https://[openai-service].openai.azure.com
  api_version: "2024-08-01-preview"
  deployment_name: gpt-4o

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store: # configuration for AI Search
    type: azure_ai_search
    url: https://[search-service].search.windows.net
    api_key: ${AZURE_SEARCH_SERVICE_API_KEY}
  llm:
    api_key: ${GRAPHRAG_API_KEY_TEXT_EMBEDDING}
    type: azure_openai_embedding
    model: text-embedding-ada-002
    api_base: https://[openai-service].openai.azure.com
    api_version: "2024-02-15-preview"
    deployment_name: text-embedding-ada-002

### Input settings ###

input:
  type: blob # file or blob
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"
  container_name: "graphrag-input-001"
  file_type: text # or csv
  file_encoding: utf-8
  file_pattern: ".*\\.(txt|md)$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: blob
  container_name: "graphrag-workspace-001"
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"

reporting:
  type: file # file or blob
  base_dir: "logs"

storage:
  type: blob # file or blob
  container_name: "graphrag-workspace-001"
  connection_string: "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING}"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  embeddings: true
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_001_sys_prompt_sk_plugin_ftos_context.txt"
  conversation_history_max_turns: 5

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
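Because YAML is indentation-sensitive, one thing I ruled out is a parse problem: the file loads cleanly and contains the top-level sections GraphRAG reads. A quick check using PyYAML (the path and the particular list of sections to look for are my own choices):

```python
import yaml  # PyYAML (`pip install pyyaml`)

# Top-level keys I expect GraphRAG to read from settings.yaml
EXPECTED_SECTIONS = ("llm", "embeddings", "input", "storage", "entity_extraction")

def missing_sections(settings_text):
    """Parse settings.yaml text and list the expected top-level keys it lacks."""
    settings = yaml.safe_load(settings_text) or {}
    return [key for key in EXPECTED_SECTIONS if key not in settings]

# Usage:
# with open("ragtest/settings.yaml", encoding="utf-8") as f:
#     print(missing_sections(f.read()))  # [] means every section was found
```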
Running the index command:

graphrag index --root ./ragtest

I got this error:

AttributeError: 'list' object has no attribute 'on_error'
The logs.json content (the "stack" field is shown unescaped below for readability):

{
  "type": "error",
  "data": "Error executing verb \"create_base_entity_graph\" in create_base_entity_graph: 'name'",
  "source": "'name'",
  "details": null
}

Stack trace:

Traceback (most recent call last):
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\graphrag\index\workflows\v1\subflows\create_base_entity_graph.py", line 47, in create_base_entity_graph
    await create_base_entity_graph_flow(
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\graphrag\index\flows\create_base_entity_graph.py", line 58, in create_base_entity_graph
    merged_entities = _merge_entities(entity_dfs)
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\graphrag\index\flows\create_base_entity_graph.py", line 119, in _merge_entities
    all_entities.groupby(["name", "type"], sort=False)
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\frame.py", line 9183, in groupby
    return DataFrameGroupBy(
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\groupby\groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
  File "C:\Users\om\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\groupby\grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'name'
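For context, the KeyError: 'name' at the bottom of that stack is plain pandas behavior, not anything GraphRAG-specific: groupby raises it when asked for a column the DataFrame doesn't have. My reading (an assumption, not confirmed) is that entity extraction produced an entity table without name/type columns, e.g. because the LLM step returned nothing usable. A minimal reproduction of the same failure mode:

```python
import pandas as pd

# _merge_entities groups by ["name", "type"]; if the extracted entity
# DataFrame lacks those columns, pandas raises KeyError on the first
# missing column during groupby construction.
entities = pd.DataFrame()  # no "name"/"type" columns, as when extraction yields nothing

try:
    entities.groupby(["name", "type"], sort=False)
except KeyError as err:
    print(err)  # 'name' — the same error as in logs.json
```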
Could you please guide me on how to resolve this error?