Re-ranking

Re-ranking is a second stage in retrieval that takes an initial set of candidate results and reorders them using a more accurate, more expensive model than the one used to retrieve them. The first stage retrieves broadly and quickly; the re-ranker then sharpens the ordering so the most relevant items rise to the top.
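
The two stages can be sketched in a few lines of Python. This is a minimal sketch, not any library's API: `retrieve_then_rerank`, `term_overlap`, and `phrase_aware` are hypothetical names, and the two toy scorers only stand in for a fast first-stage retriever and a slower, more precise re-ranker.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    n_candidates: int = 20,
    k: int = 5,
) -> list[str]:
    # Stage 1: score the whole corpus with the cheap scorer, keep the top n_candidates.
    candidates = sorted(corpus, key=lambda d: first_stage(query, d), reverse=True)[:n_candidates]
    # Stage 2: rescore only those candidates with the expensive scorer and reorder them.
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:k]

def term_overlap(query: str, doc: str) -> float:
    # Hypothetical cheap first-stage scorer: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def phrase_aware(query: str, doc: str) -> float:
    # Hypothetical stand-in for a precise re-ranker: word overlap plus a bonus
    # when the exact query phrase appears in the document.
    bonus = 10.0 if query.lower() in doc.lower() else 0.0
    return term_overlap(query, doc) + bonus

docs = ["food for a cat", "cat food brands reviewed", "dog toys"]
print(retrieve_then_rerank("cat food", docs, term_overlap, phrase_aware, n_candidates=2, k=2))
# ['cat food brands reviewed', 'food for a cat']
```

The first stage ties the two cat documents and drops "dog toys" before the re-ranker ever sees it; the re-ranker then moves the exact-phrase match to the top.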

A common choice of re-ranker is a cross-encoder, which scores a query and a candidate document by processing them together in a single pass, capturing fine-grained interactions between them that the independently computed embeddings of the first stage cannot. That joint processing is also its cost: nothing can be precomputed per document, so every query requires one full model pass per candidate. Cross-encoders are therefore too slow to run across an entire database and are applied only to the top handful of first-stage candidates, say the best twenty, where their precision pays off.
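
A toy example can make the difference between independent embeddings and joint scoring concrete. In the sketch below, `embed` is a hypothetical stand-in for the first-stage encoder (often called a bi-encoder): it turns each text into a bag-of-words count vector on its own. `joint_score` is a hypothetical stand-in for a cross-encoder: it reads both texts at once and rewards query word pairs that appear in the same order in the document. Neither is a real neural model; the point is only which information each one can see.

```python
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical bi-encoder stand-in: encodes one text on its own as a
    # bag-of-words count vector, so word order is discarded.
    return Counter(text.lower().split())

def dot(a: Counter, b: Counter) -> float:
    # Similarity of two independently computed vectors.
    return float(sum(count * b[term] for term, count in a.items()))

def joint_score(query: str, doc: str) -> float:
    # Hypothetical cross-encoder stand-in: sees query and document together,
    # so it can reward query word pairs that appear in the same order in the doc.
    q, d = query.lower().split(), doc.lower().split()
    doc_bigrams = set(zip(d, d[1:]))
    matched = sum(1 for pair in zip(q, q[1:]) if pair in doc_bigrams)
    return dot(embed(query), embed(doc)) + 2.0 * matched

docs = ["man bites dog", "dog bites man"]
index = [embed(d) for d in docs]  # first-stage side: computed once, ahead of time
query = "dog bites man"
q_vec = embed(query)
print([dot(q_vec, v) for v in index])         # [3.0, 3.0]: a tie
print([joint_score(query, d) for d in docs])  # [3.0, 7.0]
```

The vectors in `index` can be built once, before any query arrives, which is what keeps the first stage cheap. `joint_score` has no such cache: it must run again for every (query, document) pair, which is why this kind of scorer is reserved for a short candidate list.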

Re-ranking is especially valuable in retrieval-augmented generation, where the quality of the few chunks fed to the model matters a great deal. The initial vector search aims for high recall by surfacing all plausibly relevant chunks, and the re-ranker improves precision by identifying which of them are most relevant, so the model receives the best available context within its limited window. Because the re-ranker scores only a short candidate list, a lightweight model can add little latency while noticeably improving answer quality.
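
The final step of that flow, keeping only the best re-ranked chunks that fit in the model's window, might look like the sketch below. It assumes a whitespace token count and a greedy packing rule; `build_context` and `toy_overlap` are hypothetical names, and a real system would use the model's own tokenizer and a trained re-ranker.

```python
from typing import Callable, Sequence

def build_context(
    query: str,
    candidates: Sequence[str],
    rerank: Callable[[str, str], float],
    token_budget: int,
    # Assumption: whitespace splitting stands in for the model's real tokenizer.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    # Order the recalled chunks by re-ranker score, most relevant first.
    ranked = sorted(candidates, key=lambda c: rerank(query, c), reverse=True)
    context: list[str] = []
    used = 0
    for chunk in ranked:
        cost = count_tokens(chunk)
        # Greedy packing: keep a chunk only if it still fits in the remaining budget.
        if used + cost <= token_budget:
            context.append(chunk)
            used += cost
    return context

def toy_overlap(query: str, chunk: str) -> float:
    # Hypothetical re-ranker stand-in: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))

chunks = [
    "shipping times vary by region",
    "our refund policy allows returns within the window",
    "refund requests need a receipt",
]
print(build_context("refund policy", chunks, toy_overlap, token_budget=13))
# ['our refund policy allows returns within the window', 'refund requests need a receipt']
```

With a budget of 13 tokens the two refund chunks (8 and 5 tokens) fit and the irrelevant shipping chunk is left out; lowering the budget to 12 keeps only the top-ranked chunk.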