Cross-encoder

A cross-encoder is a model that judges the relevance of a query and a document by processing them together as a single joint input, rather than encoding each separately. This lets it model fine-grained interactions between specific words in the query and in the document. As a result, it typically produces more accurate relevance scores than a bi-encoder, which encodes the query and each document independently into vectors and compares them with a cheap similarity function such as cosine similarity.
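In BERT-style cross-encoders, "a single joint input" usually means the query and document tokens are concatenated into one sequence with separator tokens, so every attention layer can relate words across both texts. The sketch below only builds that input. It assumes BERT's [CLS]/[SEP] conventions and uses whitespace splitting in place of a real subword tokenizer; `build_pair_input` is a hypothetical helper, not a library function.

```python
from __future__ import annotations


def build_pair_input(query: str, document: str) -> tuple[list[str], list[int]]:
    """Pack a query and a document into one joint input sequence.

    Assumes BERT-style special tokens ([CLS], [SEP]) and naive
    whitespace tokenization; real models use a subword tokenizer.
    """
    q_tokens = query.lower().split()
    d_tokens = document.lower().split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + d_tokens + ["[SEP]"]
    # Segment ids mark which tokens belong to the query (0): [CLS], the
    # query tokens and the first [SEP]; the document tokens and the final
    # [SEP] get segment id 1.
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(d_tokens) + 1)
    return tokens, segment_ids


tokens, segments = build_pair_input("capital of France", "Paris is the capital")
print(tokens)
print(segments)
```

The model then reads this whole sequence in one pass, which is exactly why a document's representation depends on the query it is paired with.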

That accuracy comes at a cost: because the query and document must be fed through the model together, you cannot pre-compute document representations. Every query-document pair requires a fresh model inference, so scoring a collection of N documents costs N forward passes for every query, which is too slow to run across an entire database.
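The gap is easy to quantify by counting model forward passes. The sketch below assumes a hypothetical collection of one million documents and one thousand queries; the figures are plain arithmetic, not benchmarks. A bi-encoder encodes each document once offline and each query once at search time, while a cross-encoder needs one pass per query-document pair. `forward_passes` is a hypothetical helper.

```python
from __future__ import annotations


def forward_passes(num_docs: int, num_queries: int) -> dict[str, int]:
    """Count model forward passes needed to score every document for every query.

    Bi-encoder: each document is encoded once (offline) and each query once;
    comparing vectors afterwards is cheap arithmetic, not a model pass.
    Cross-encoder: every (query, document) pair needs its own pass.
    """
    return {
        "bi_encoder": num_docs + num_queries,
        "cross_encoder": num_docs * num_queries,
    }


costs = forward_passes(num_docs=1_000_000, num_queries=1_000)
for model, passes in costs.items():
    print(f"{model}: {passes:,} forward passes")
```

This prints 1,001,000 passes for the bi-encoder versus 1,000,000,000 for the cross-encoder. The cross-encoder cost grows with the product of documents and queries, while the bi-encoder cost grows only with their sum.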

The standard solution is a two-stage pipeline. A fast bi-encoder retrieves a broad set of candidates using pre-computed document embeddings and approximate nearest-neighbor search; the cross-encoder then re-ranks only the top candidates (for example, the best 20) to produce a precise final ordering. This combines the speed of vector search with the precision of joint scoring, and it is one of the most common ways to improve retrieval quality in RAG (retrieval-augmented generation) systems.
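The pipeline can be sketched end to end with standard-library stand-ins. Everything here is a toy assumption: `embed` is a bag-of-words vector in place of a neural bi-encoder, stage 1 does an exhaustive cosine scan where a real system would query an approximate nearest-neighbor index, and `toy_cross_score` is a hand-written heuristic in place of a trained cross-encoder. What it does show is the structure: document vectors computed once up front, a cheap similarity pass over the whole collection, and the expensive pair scorer applied only to the shortlist. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bi-encoder: a bag-of-words count vector (stand-in for a neural embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_cross_score(query: str, document: str) -> float:
    """Toy stand-in for a cross-encoder: it looks at both texts together and
    rewards query bigrams that appear in order in the document, an interaction
    a bag-of-words vector cannot capture. Cosine overlap breaks ties."""
    q, d = query.lower().split(), document.lower().split()
    d_bigrams = set(zip(d, d[1:]))
    overlap = sum(1 for bigram in zip(q, q[1:]) if bigram in d_bigrams)
    return overlap + cosine(Counter(q), Counter(d))


def retrieve_then_rerank(
    query: str,
    docs: list[str],
    doc_vecs: list[Counter],
    cross_score: Callable[[str, str], float],
    candidates: int = 20,
    top_n: int = 3,
) -> list[str]:
    q_vec = embed(query)
    # Stage 1: cheap similarity against pre-computed document vectors.
    # Exhaustive here; production systems use an approximate index instead.
    shortlist = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True
    )[:candidates]
    # Stage 2: run the expensive joint scorer on the shortlist only.
    reranked = sorted(shortlist, key=lambda i: cross_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_n]]


docs = ["dog bites man", "man bites dog", "a cat sleeps on the mat", "the man walks his dog"]
doc_vecs = [embed(d) for d in docs]  # computed once, before any query arrives
result = retrieve_then_rerank("man bites dog", docs, doc_vecs, toy_cross_score, candidates=3, top_n=2)
print(result)
```

Because "dog bites man" and "man bites dog" have identical bag-of-words vectors, stage 1 ties them; the pair scorer sees word order across both texts and puts the matching document first. In a real system `toy_cross_score` would be replaced by a trained model's pair-scoring call, and the `candidates` size trades reranking latency (one model pass per candidate) against recall.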