Dense Retrieval

Dense retrieval is the approach of finding relevant content by representing both queries and documents as dense embedding vectors and measuring their similarity in vector space. It is the retrieval method that vector databases are built to serve, and the foundation of modern semantic search.
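As a minimal sketch of the idea, the snippet below ranks documents by cosine similarity to a query vector. It assumes the embeddings already exist. The 4-dimensional vectors and the document IDs are hypothetical, hand-made stand-ins for the output of a real embedding model, and `cosine` and `search` are illustrative helpers, not any library's API.

```python
import math

# Hypothetical, hand-made 4-dimensional "embeddings" standing in for the
# output of a real embedding model (which would produce hundreds or
# thousands of dimensions).
DOCS = {
    "reduce-electricity-costs": [0.9, 0.8, 0.1, 0.0],
    "solar-panel-install":      [0.6, 0.3, 0.7, 0.1],
    "pasta-recipes":            [0.0, 0.1, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, docs, k=2):
    """Brute-force dense retrieval: score every document, return the top k."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Hypothetical embedding of the query "how do I cut my energy bills?"
query = [0.8, 0.9, 0.2, 0.0]
for doc_id, score in search(query, DOCS):
    print(f"{doc_id}: {score:.3f}")
```

The loop scores every document, which is fine for a toy corpus. Vector databases typically replace this exhaustive scan with an approximate nearest-neighbour index so that search stays fast over millions of vectors.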

The word dense distinguishes it from sparse, keyword-based retrieval, which represents text as a very long vector with one dimension per vocabulary term, almost all of them zero. In dense retrieval, an embedding model compresses meaning into a few hundred to a few thousand numbers, nearly all of them nonzero and carrying signal, so that conceptually related text lands close together even when the wording differs entirely. A query about cutting energy bills can retrieve a document about reducing electricity costs even though the two share no keywords.
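The contrast can be made concrete with a small sketch. A bag-of-words (sparse) vector gives the two phrases a similarity of zero because they share no terms, while hypothetical dense embeddings place them close together. The tiny vocabulary and the dense vectors are illustrative assumptions, not the output of a real model.

```python
import math
from collections import Counter

def sparse_vector(text, vocab):
    """Bag-of-words vector: one dimension per vocabulary term.
    With a realistic vocabulary, nearly every entry would be zero."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

query = "cutting energy bills"
doc = "reducing electricity costs"
# Hypothetical toy vocabulary: the terms of both phrases plus a few extras.
vocab = sorted(set(query.split()) | set(doc.split()) | {"pasta", "solar", "recipe"})

q_sparse = sparse_vector(query, vocab)
d_sparse = sparse_vector(doc, vocab)
print("sparse similarity:", cosine(q_sparse, d_sparse))  # no shared terms

# Hypothetical dense embeddings; a trained model would place these two
# phrases close together because they mean nearly the same thing.
q_dense = [0.8, 0.9, 0.2, 0.0]
d_dense = [0.9, 0.8, 0.1, 0.0]
print("dense similarity:", round(cosine(q_dense, d_dense), 3))
```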

This generalisation is dense retrieval’s great strength, letting it handle synonyms, paraphrases, and even cross-lingual matches. Its weakness is the mirror image: it can miss exact matches for specific codes or rare terms that an embedding does not represent well. For that reason dense retrieval is frequently combined with sparse retrieval in a hybrid system that captures both meaning and exact terminology.
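One common way to merge the two result lists in a hybrid system is reciprocal rank fusion (RRF), sketched below. This is one fusion technique among several; weighted combinations of normalised scores are also used. The constant `k=60` is a widely used default, and the document IDs and result lists are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in, with rank starting at 1. Higher fused score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for a query mixing meaning with an exact code,
# e.g. "error E-4012 on my energy bill". The dense retriever misses the
# manual page for the code; the sparse retriever matches it exactly.
dense_results = ["reduce-electricity-costs", "solar-panel-install"]
sparse_results = ["manual-e4012", "reduce-electricity-costs"]

print(reciprocal_rank_fusion([dense_results, sparse_results]))
```

Because RRF uses only ranks, it sidesteps the problem that dense and sparse scores live on different scales. The exact-match document that the dense list missed still surfaces in the fused results.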