Agentic RAG

Agentic RAG extends standard retrieval-augmented generation by giving an AI agent control over the retrieval process itself. In classic RAG, a single query is embedded, the top matches are fetched, and the results are handed to the language model in one pass. Agentic RAG instead lets the model reason about whether to retrieve, what to retrieve, and how many times, treating retrieval as a tool it can call repeatedly.
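The contrast between one-pass retrieval and retrieval as a repeatable tool can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the corpus, the word-overlap `retrieve` function (a stand-in for embedding search), and `scripted_policy` (a stand-in for the language model's decision about what to do next) are all assumptions made up for this example.

```python
import re
from typing import Callable, Optional

# Toy corpus standing in for a document index (hypothetical data).
CORPUS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]

def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude substitute for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]

def classic_rag(question: str, k: int = 1) -> list[str]:
    """Classic RAG: one query, one retrieval pass, then generation."""
    return retrieve(question, k)

# A policy sees the question and the evidence so far, and returns the next
# search query, or None to stop retrieving. In a real system this decision
# would come from the language model.
Policy = Callable[[str, list[str]], Optional[str]]

def agentic_rag(
    question: str, decide: Policy, max_steps: int = 4
) -> tuple[list[str], list[str]]:
    """Agentic RAG: the policy chooses whether, what, and how often to retrieve."""
    evidence: list[str] = []
    queries: list[str] = []
    for _ in range(max_steps):  # cap the loop to bound cost and latency
        query = decide(question, evidence)
        if query is None:
            break
        queries.append(query)
        evidence.extend(retrieve(query))
    return evidence, queries

def scripted_policy(question: str, evidence: list[str]) -> Optional[str]:
    """Hypothetical hard-coded stand-in for the model's reasoning."""
    if not evidence:
        return "capital of France"
    if not any("Seine" in doc for doc in evidence):
        return "river flows through Paris"
    return None

if __name__ == "__main__":
    q = "Which river flows through the capital of France?"
    print(classic_rag(q))
    print(agentic_rag(q, scripted_policy))
```

With `k=1`, the single pass returns only the sentence about the capital, while the loop issues two queries and collects both facts. A policy that returns `None` on the first call models the case where the agent decides no retrieval is needed.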

This control enables multi-step behaviour. The agent can decompose a hard question into sub-questions, retrieve evidence for each, evaluate whether the results are sufficient, and issue follow-up queries to fill gaps. It can also rewrite an unclear query before searching, or decide that a question needs no retrieval at all.
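The decompose, retrieve, evaluate, and follow-up cycle might look roughly like the sketch below. Everything named here is an assumption for illustration: `decompose` returns a fixed plan where a model would normally generate one, `is_sufficient` is a simple keyword check where a model would normally judge the evidence, and the follow-up query is formed by appending the missing term to the original query.

```python
import re

# Toy index (hypothetical); a real system would query a vector store.
DOCS = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "Pierre Curie shared the 1903 Nobel Prize in Physics.",
]

def tokens(text: str) -> set[str]:
    """Lowercased word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(query: str, k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokens(query)
    return sorted(DOCS, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def decompose(question: str) -> list[tuple[str, str]]:
    """Stand-in for model-driven decomposition.

    Returns (sub-query, term the evidence must mention) pairs. The plan is
    hard-coded here; the question argument is ignored.
    """
    return [("curie physics prize", "1903"), ("curie second prize", "1911")]

def is_sufficient(hits: list[str], required: str) -> bool:
    """Stand-in for the agent's evidence check: is the required term present?"""
    return any(required in tokens(hit) for hit in hits)

def gather(
    question: str, max_follow_ups: int = 1
) -> tuple[dict[str, list[str]], list[str]]:
    """Retrieve per sub-question, issuing a follow-up query when evidence falls short."""
    evidence: dict[str, list[str]] = {}
    log: list[str] = []  # every query issued, in order
    for query, required in decompose(question):
        hits = search(query)
        log.append(query)
        attempts = 0
        while not is_sufficient(hits, required) and attempts < max_follow_ups:
            query = f"{query} {required}"  # follow-up names the gap explicitly
            hits = search(query)
            log.append(query)
            attempts += 1
        evidence[required] = hits
    return evidence, log

if __name__ == "__main__":
    found, issued = gather("In which years did Marie Curie win her Nobel Prizes?")
    print(issued)
    print(found)
```

In this toy run the first sub-question is answered on the first search, while the second needs one follow-up query before the evidence mentions the required year. The `max_follow_ups` cap is one simple way to keep the loop from running indefinitely when no document can fill the gap.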

The trade-off is cost and latency: each retrieval step and reasoning loop adds time and token usage. Agentic RAG shines on multi-hop questions and research-style tasks where a single retrieval pass would miss critical context, but simpler lookups are often better served by plain RAG.