Some queries are hard to answer with "basic" RAG. When a question requires multi-step reasoning, understanding of a full document (not just chunks), or aggregating many results that match specific criteria, simple retrieve-and-generate pipelines break down, and we need agentic RAG. But this added capability comes at a cost: as agents plan, search, read, and iterate, they quickly consume a large share of the context window, which both degrades answer quality and increases cost and latency.