When the LLM itself handles search (Scenario 2, described below), it must iteratively “talk to itself”: craft a query, read the results, refine the query, and only then produce a final answer. Each of those interactions is another turn in a conversation with the model, and every generated query and every chunk of retrieved text is fed back in, consuming tokens in both directions (input and output) at each step.
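As a rough sketch of that loop (the names `call_llm`, `run_search`, and `count_tokens` below are hypothetical placeholders, not any specific SDK), note that the entire conversation history is resent as input on every turn:

```python
def count_tokens(text: str) -> int:
    # Crude approximation: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def agentic_answer(question: str, call_llm, run_search, max_steps: int = 3):
    """Scenario 2 sketch: the model drives the search loop itself."""
    history = [f"User: {question}"]
    input_tokens = output_tokens = 0

    for _ in range(max_steps):
        prompt = "\n".join(history)           # the whole history is resent...
        input_tokens += count_tokens(prompt)  # ...so input tokens grow each step

        reply = call_llm(prompt)              # a search query, or a final answer
        output_tokens += count_tokens(reply)
        history.append(f"Assistant: {reply}")

        if reply.startswith("ANSWER:"):       # model decided it has enough context
            break

        results = run_search(reply)           # retrieved chunks join the history
        history.append(f"Search results: {results}")

    return history[-1], input_tokens, output_tokens
```

Even when each individual reply is short, the `prompt` billed as input grows on every iteration; that accumulation is where the extra tokens come from.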
Why does the LLM token count increase if I directly attach the data source to the LLM model?
Asked by trinadh maddimsetti
The question presents two scenarios involving AI search and an LLM:
- Scenario 1:
  - AI search is done separately.
  - The search results are passed as context to the LLM.
  - The LLM processes this context, consuming X tokens in total.
- Scenario 2:
  - The LLM itself performs the AI search.
  - It generates search queries, processes results, and formulates a response.
  - This approach consumes 2X tokens in total.
Key Question:
Why does Scenario 2 consume twice as many tokens as Scenario 1?
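One way to see where a factor of roughly two can come from is a back-of-envelope count. The figures below are illustrative assumptions, not measurements from any particular model or service:

```python
# Illustrative, made-up token counts.
CONTEXT = 1_500   # tokens of retrieved chunks
QUESTION = 50     # tokens in the user's question
ANSWER = 300      # tokens in the final answer
QUERY = 20        # tokens per generated search query
STEPS = 3         # query/read-results iterations in Scenario 2

# Scenario 1: a single call, with the pre-retrieved context as input.
scenario_1 = (QUESTION + CONTEXT) + ANSWER

# Scenario 2: every iteration resends the growing history (question, earlier
# queries, earlier results) as input and emits a new query; the last call
# resends everything once more and emits the answer.
scenario_2 = 0
history = QUESTION
for _ in range(STEPS):
    scenario_2 += history + QUERY         # input resent + query generated
    history += QUERY + CONTEXT // STEPS   # history grows by the query + results
scenario_2 += history + ANSWER            # final call that produces the answer

print(scenario_1, scenario_2)             # 1850 vs 3680 with these numbers
```

With these made-up figures, Scenario 1 costs about 1,850 tokens and Scenario 2 about 3,680, close to the 2X in the question; the exact ratio depends on how many search iterations the model runs and how much retrieved text is carried along in the history.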