When the LLM itself handles search (Scenario 2, described below), it must iteratively “talk to itself”: craft a query, read the results, refine the query, and only then produce a final answer. Each of those interactions is another turn in a conversation with the model, and every generated query and every chunk of retrieved text is fed back in, consuming tokens in both directions (input and output) at each step.
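As a rough sketch of that loop (the names `call_llm`, `run_search`, and `count_tokens` below are hypothetical placeholders, not any specific SDK), note that the entire conversation history is resent as input on every turn:

```python
def count_tokens(text: str) -> int:
    # Crude approximation: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def agentic_answer(question: str, call_llm, run_search, max_steps: int = 3):
    """Scenario 2 sketch: the model drives the search loop itself."""
    history = [f"User: {question}"]
    input_tokens = output_tokens = 0

    for _ in range(max_steps):
        prompt = "\n".join(history)           # the whole history is resent...
        input_tokens += count_tokens(prompt)  # ...so input tokens grow each step

        reply = call_llm(prompt)              # a search query, or a final answer
        output_tokens += count_tokens(reply)
        history.append(f"Assistant: {reply}")

        if reply.startswith("ANSWER:"):       # model decided it has enough context
            break

        results = run_search(reply)           # retrieved chunks join the history
        history.append(f"Search results: {results}")

    return history[-1], input_tokens, output_tokens
```

Even when each individual reply is short, the `prompt` billed as input grows on every iteration; that accumulation is where the extra tokens come from.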
Why does the LLM token count increase if I directly attach the data source to the LLM model?
Asked by trinadh maddimsetti
The question presents two scenarios involving AI search and an LLM:
- Scenario 1:
  - AI search is done separately.
  - The search results are passed as context to the LLM.
  - The LLM processes this context, consuming X tokens in total.
- Scenario 2:
  - The LLM itself performs the AI search.
  - It generates search queries, processes results, and formulates a response.
  - This approach consumes 2X tokens in total.
Key Question:
Why does Scenario 2 consume twice as many tokens as Scenario 1?
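One way to see where a factor of roughly two can come from is a back-of-envelope count. The figures below are illustrative assumptions, not measurements from any particular model or service:

```python
# Illustrative, made-up token counts.
CONTEXT = 1_500   # tokens of retrieved chunks
QUESTION = 50     # tokens in the user's question
ANSWER = 300      # tokens in the final answer
QUERY = 20        # tokens per generated search query
STEPS = 3         # query/read-results iterations in Scenario 2

# Scenario 1: a single call, with the pre-retrieved context as input.
scenario_1 = (QUESTION + CONTEXT) + ANSWER

# Scenario 2: every iteration resends the growing history (question, earlier
# queries, earlier results) as input and emits a new query; the last call
# resends everything once more and emits the answer.
scenario_2 = 0
history = QUESTION
for _ in range(STEPS):
    scenario_2 += history + QUERY         # input resent + query generated
    history += QUERY + CONTEXT // STEPS   # history grows by the query + results
scenario_2 += history + ANSWER            # final call that produces the answer

print(scenario_1, scenario_2)             # 1850 vs 3680 with these numbers
```

With these made-up figures, Scenario 1 costs about 1,850 tokens and Scenario 2 about 3,680, close to the 2X in the question; the exact ratio depends on how many search iterations the model runs and how much retrieved text is carried along in the history.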