Skip to content

Inverted Index

A data structure mapping each term to the list of documents containing it, the foundation of keyword search and the sparse half of hybrid retrieval.

An inverted index is a data structure that maps each term to the list of documents that contain it. Instead of going from a document to its words, it inverts the relationship — going from a word to all the documents where it appears — which is exactly what you need to answer a keyword query quickly.

This structure is the foundation of keyword search and has powered search engines for decades. To find documents containing a word, you simply look up that word in the inverted index and retrieve its list, rather than scanning every document. Combined with scoring algorithms like BM25, the inverted index makes lexical search fast and scalable.

In vector databases, the inverted index provides the sparse, keyword-matching half of hybrid search. While the vector index handles semantic similarity, the inverted index handles exact term matches, and the two are fused to produce results that capture both meaning and precise terminology. The name also appears in IVF — Inverted File index — where a related idea organises vectors by cluster membership.