Top-K refers to the K most similar results returned by a vector search query, ranked by their similarity to the query vector. After the index identifies candidate vectors near the query, it orders them by similarity score and returns the best K. This ranked list is the standard output format of a vector database query.
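As a rough illustration of the ranking step, the sketch below scores every stored vector against the query with cosine similarity and keeps the K highest. It is a brute-force stand-in for what an index does, not the way any particular vector database implements it. The names `cosine_similarity`, `top_k`, and the toy `vectors` collection are all assumptions made for this example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return up to k (id, score) pairs, most similar first."""
    scored = (
        (cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()
    )
    # nlargest keeps only the k best scores instead of sorting everything.
    best = heapq.nlargest(k, scored)
    return [(vec_id, score) for score, vec_id in best]


# Hypothetical toy collection of 2-D embeddings.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.0],
}
results = top_k([1.0, 0.0], vectors, k=2)
```

Here `results` holds the two nearest ids with their scores, with `"a"` first (identical direction to the query) and `"c"` second. If K exceeds the collection size, the function simply returns everything it has.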
The application then decides what to do with these results: inject them into a language model’s prompt for retrieval-augmented generation (RAG), display them to a user, or pass them to a re-ranking stage for further refinement. The value of K is chosen to fit the use case: a small K for RAG, where only a few chunks should enter the model’s limited context window, or a larger K for browsing and recommendation scenarios.
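One way to picture why RAG favors a small K is to treat the context as a fixed budget and admit ranked chunks until it is spent. The sketch below does this under loud assumptions: `fit_to_budget` is a hypothetical helper, the chunks are made up, and the token count is a crude whitespace split rather than a real tokenizer.

```python
def fit_to_budget(ranked_chunks, max_tokens):
    """Keep top-ranked chunks, in order, while they fit within max_tokens."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())  # assumption: whitespace words approximate tokens
        if used + cost > max_tokens:
            break  # stop at the first chunk that would exceed the budget
        selected.append(chunk)
        used += cost
    return selected


# Hypothetical chunks, already ordered best-first by a top-K query.
ranked_chunks = [
    "Top-K returns the K most similar results.",
    "K is chosen to fit the use case.",
    "Larger K suits browsing and recommendation.",
]
context = fit_to_budget(ranked_chunks, max_tokens=16)
```

With a budget of 16 estimated tokens, only the first two chunks are kept, so the effective K is set by the budget rather than by the query alone.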
Choosing K involves a trade-off. Too small a K risks missing relevant results; too large a K increases latency and cost and, in RAG, can overflow the model’s context with marginal material. A common pattern is to retrieve a generous top-K and then narrow it with filtering or re-ranking, so the final set fed downstream is both relevant and appropriately sized.
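The retrieve-then-narrow pattern can be sketched as a two-stage function: drop candidates below a similarity floor, re-order the survivors with a second scorer, and keep a small final set. Everything here is illustrative: `narrow_candidates`, the threshold, and the score values are assumptions, and the dictionary lookup stands in for a real re-ranking model.

```python
def narrow_candidates(candidates, rerank_score, min_score, final_k):
    """Filter, then re-rank, a generous candidate list down to final_k ids.

    candidates: list of (doc_id, retrieval_score) from a first-stage top-K query.
    rerank_score: callable doc_id -> float, a stand-in for a re-ranking model.
    """
    # Stage 1: filter out candidates whose retrieval score is too low.
    kept = [doc_id for doc_id, score in candidates if score >= min_score]
    # Stage 2: re-order the survivors by the second-stage score, best first.
    reranked = sorted(kept, key=rerank_score, reverse=True)
    return reranked[:final_k]


# Hypothetical first-stage results (K = 5) and made-up re-ranker scores.
candidates = [("d1", 0.91), ("d2", 0.88), ("d3", 0.80), ("d4", 0.52), ("d5", 0.31)]
rerank_scores = {"d1": 0.40, "d2": 0.95, "d3": 0.70, "d4": 0.99, "d5": 0.10}

final = narrow_candidates(candidates, rerank_scores.get, min_score=0.5, final_k=2)
```

In this toy run the final set is `["d4", "d2"]`: `d5` is filtered out, and `d4`, which ranked fourth at retrieval time, moves to the top after re-ranking. That reordering is the reason to retrieve more candidates than the downstream stage will ultimately use.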