Should I go for Fine-tuning?

Matej Jakubčík 0 Reputation points
2024-12-03T10:44:35.11+00:00

I am trying to create a RAG chatbot that uses formal documents as its database. I created a prompt flow where I added some features like query expansion, etc. My problem is that the prompt flow is slow: the average response time is around 14 seconds, which is just too much for a good user experience. Most of the time is taken by the lookup node, which retrieves relevant information from the database. Is there a way to improve the performance of this node? Are there any other methods that can help with the overall response time? Would fine-tuning be an option? (I have already played with prompt engineering and other things.)

Azure OpenAI Service

1 answer

  1. Marcin Policht 27,655 Reputation points MVP
    2024-12-03T11:30:16.59+00:00

    You have several areas to consider, including the following:

    Improving Lookup Node Performance

    1. Optimize the Embedding Store:
      • Index Type: Use an optimized index such as Hierarchical Navigable Small World (HNSW) for vector similarity searches, as it's designed for fast approximate nearest neighbor searches.
      • Index Parameters: Tune the parameters (like ef_construction and M for HNSW) to balance recall and query speed; the first sketch after this list shows this with hnswlib.
    2. Use Faster Storage Backend:
      • Replace general-purpose databases with specialized vector databases like Pinecone, Weaviate, or Milvus that are optimized for retrieval tasks.
      • Ensure that the database is hosted in a low-latency environment, close to where the application runs.
    3. Pre-Filtering:
      • Implement lightweight keyword-based or metadata filters before embedding lookups to reduce the search space significantly (the combined sketch after this list applies a metadata filter in the same request as the vector query).
    4. Batch Processing for Query Expansion:
      • If query expansion produces multiple subqueries, batch them: embed all expanded queries in one call and send them to the database in a single request rather than one by one (see the combined sketch after this list).
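
    As a first sketch, here is how the HNSW parameters mentioned above could be tuned with the open-source hnswlib library. The embedding dimension, corpus size, and the random placeholder vectors are assumptions for illustration; if you use Azure AI Search instead, the equivalent knobs (m, efConstruction, efSearch) are exposed through the index's vectorSearch algorithm configuration.

    ```python
    # Minimal hnswlib sketch: trade recall for latency via M / ef_construction / ef.
    # dim=1536 assumes an ada-002-style embedding model; vectors here are random placeholders.
    import numpy as np
    import hnswlib

    dim = 1536
    num_docs = 100_000

    embeddings = np.random.rand(num_docs, dim).astype(np.float32)  # placeholder document vectors
    ids = np.arange(num_docs)

    index = hnswlib.Index(space="cosine", dim=dim)
    # Higher M / ef_construction -> better recall, but slower build and more memory.
    index.init_index(max_elements=num_docs, ef_construction=200, M=16)
    index.add_items(embeddings, ids)

    # ef is the search-time knob: lowering it cuts query latency at some cost in recall.
    index.set_ef(64)

    query = np.random.rand(1, dim).astype(np.float32)
    labels, distances = index.knn_query(query, k=5)
    ```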
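
    The second sketch combines the pre-filtering and batching points: all expanded queries are embedded in one Azure OpenAI call, then sent to Azure AI Search in a single request with an OData filter that narrows the candidate set before the vector lookup. Endpoint variables, the deployment name, the index name, and the field names (contentVector, category, year, id) are placeholders; adapt them to your own index schema.

    ```python
    # Sketch: embed all expanded queries at once, then run one filtered vector search.
    import os
    from openai import AzureOpenAI
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from azure.search.documents.models import VectorizedQuery

    aoai = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    expanded_queries = [
        "notice period for contract termination",
        "termination notice requirements",
        "how many days notice to end the contract",
    ]

    # One embeddings call for every expanded query instead of one call per query.
    resp = aoai.embeddings.create(model="text-embedding-ada-002", input=expanded_queries)
    vectors = [d.embedding for d in resp.data]

    search_client = SearchClient(
        endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
        index_name="formal-docs",
        credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
    )

    # All vector queries go in one request; the OData filter prunes candidates
    # by metadata before the more expensive vector similarity step.
    results = search_client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(vector=v, k_nearest_neighbors=5, fields="contentVector")
            for v in vectors
        ],
        filter="category eq 'policy' and year ge 2022",
        top=5,
    )

    for doc in results:
        print(doc["id"], doc.get("@search.score"))
    ```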

    Fine-Tuning vs. Retrieval Optimization

    1. Fine-Tuning Options:
      • Fine-tune the base language model on your formal documents so it relies less on retrieval and can respond with less retrieved context. This can help when the retrieval step dominates latency, but it may not fully eliminate the need for a database. (A sketch of the expected training-data format follows this list.)
    2. Hybrid Approach:
      • Combine a fine-tuned language model with retrieval. Train the model to work better with partial or incomplete context to minimize the need for long retrieval steps.
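
    If you do explore fine-tuning, Azure OpenAI expects training data as chat-format JSONL, as sketched below. The example content and file name are purely illustrative; you would generate question/answer pairs like these from your formal documents.

    ```python
    # Sketch: write chat-format JSONL training examples for Azure OpenAI fine-tuning.
    import json

    examples = [
        {
            "messages": [
                {"role": "system", "content": "You answer questions about our formal policy documents."},
                {"role": "user", "content": "What is the notice period for contract termination?"},
                {"role": "assistant", "content": "The standard notice period is 30 days, as set out in section 4.2."},
            ]
        },
    ]

    with open("training_data.jsonl", "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
    ```

    Note that fine-tuning bakes knowledge in at training time, so retrieval is still what keeps answers grounded in the current versions of your documents.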

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

