Sparse Vector

A sparse vector is one in which the overwhelming majority of values are zero, with only a few non-zero entries. This is the natural representation for keyword-based text retrieval, where a document is described by a vector over the whole vocabulary and only the dimensions corresponding to words it actually contains carry a value.
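A minimal sketch of the idea follows. It assumes a toy five-word vocabulary and uses raw term counts as the values; real systems would use weights such as TF-IDF or BM25. The vector is stored as a dictionary holding only its non-zero dimensions, so a dot product only needs to visit the dimensions that both vectors share. The helper names `to_sparse` and `sparse_dot` are illustrative and do not come from any library.

```python
from collections import Counter


def to_sparse(tokens: list[str], vocab: dict[str, int]) -> dict[int, float]:
    """Map a token list to {vocabulary index: term count}, keeping only non-zero entries."""
    counts = Counter(t for t in tokens if t in vocab)  # out-of-vocabulary tokens are dropped
    return {vocab[t]: float(c) for t, c in counts.items()}


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product that only touches dimensions where both vectors are non-zero."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector, probe the longer one
    return sum(v * b[i] for i, v in a.items() if i in b)


# Hypothetical toy vocabulary; a real one would hold tens of thousands of terms.
vocab = {w: i for i, w in enumerate(["apple", "banana", "cherry", "date", "elderberry"])}
doc = to_sparse("apple cherry apple".split(), vocab)
query = to_sparse("apple date".split(), vocab)
print(doc)                     # {0: 2.0, 2: 1.0}
print(sparse_dot(doc, query))  # 2.0
```

With a realistic vocabulary the dictionary stays just as small, because its size depends on the number of distinct words in the text rather than on the vocabulary size.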

Sparse vectors are the foundation of traditional search methods such as TF-IDF and BM25. They are highly interpretable: you can read off exactly which terms characterise a document. They also excel at exact term matching, including rare or novel vocabulary that dense embedding models may struggle to represent. Because only the non-zero entries need to be stored, typically as (index, value) pairs, sparse vectors are compact, and they map directly onto an inverted index, which lets a search engine examine only the documents that share at least one term with the query.
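The sketch below shows how an inverted index and BM25 fit together. It assumes whitespace-tokenised documents and the commonly used defaults k1 = 1.2 and b = 0.75. It builds an index from each term to a postings list of (document id, term frequency) pairs, then scores a query with one standard form of BM25. The IDF term uses the ln(1 + (N - df + 0.5) / (df + 0.5)) variant found in Lucene; other variants exist. The function names are hypothetical, and each distinct query term is counted once, which is a simplification.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: list[list[str]]) -> dict[str, list[tuple[int, int]]]:
    """Inverted index: term -> postings list of (doc_id, term frequency)."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    return index


def bm25_scores(query: list[str], docs: list[list[str]],
                index: dict[str, list[tuple[int, int]]],
                k1: float = 1.2, b: float = 0.75) -> dict[int, float]:
    """Score only the documents that contain at least one query term."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    scores: dict[int, float] = defaultdict(float)
    for term in set(query):  # each distinct query term counted once (simplification)
        postings = index.get(term, [])
        if not postings:
            continue  # a term absent from the collection contributes nothing
        df = len(postings)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings:  # only documents containing the term are visited
            length_norm = k1 * (1 - b + b * len(docs[doc_id]) / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + length_norm)
    return dict(scores)


docs = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "quick quick fox jumps".split(),
]
index = build_index(docs)
scores = bm25_scores("quick fox".split(), docs, index)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [2, 0]
```

Document 1 shares no term with the query, so it never appears in any postings list that is read and is never scored. That is the efficiency the inverted index provides.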

In hybrid search, sparse vectors provide the lexical signal that complements the semantic signal of dense vectors. Both kinds of vector are computed for the query and for each document, each is scored separately, and the two result lists are then fused into a single ranking. Some newer techniques, such as learned sparse representations, go further and give sparse vectors a degree of semantic generalisation while keeping their interpretability and exact-match strengths.
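The text does not prescribe a particular fusion method. One common choice is reciprocal rank fusion (RRF), sketched here under the assumption that each retriever returns an ordered list of document ids. Each list contributes 1 / (k + rank) to a document's fused score. The constant k = 60 is a conventional default, not a requirement, and the function name is illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids; each list adds 1 / (k + rank) per document."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


sparse_ranking = ["d3", "d1", "d7"]  # hypothetical output of a BM25 retriever
dense_ranking = ["d1", "d5", "d3"]   # hypothetical output of an embedding retriever
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
# ['d1', 'd3', 'd5', 'd7']
```

Because RRF uses only ranks, it avoids having to calibrate BM25 scores, which are unbounded, against cosine similarities, which lie in [-1, 1]. A weighted sum of normalised scores is the usual alternative when the raw scores are considered informative.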