Curse of Dimensionality

The phenomenon where distance metrics lose discriminating power as the number of dimensions increases, complicating high-dimensional similarity search.

The curse of dimensionality refers to a set of counter-intuitive phenomena that arise when working in spaces with hundreds or thousands of dimensions. Such spaces behave very differently from the two- and three-dimensional world our intuition is built on.

The most important effect for vector search is that, in very high dimensions, the distances from a query to all other points tend to concentrate around a single value. Sample random vectors in a thousand-dimensional space and the nearest and farthest points from any given query become almost equally distant, so the very notion of "nearest" loses discriminating power. This threatens the core premise of similarity search: that similar items are meaningfully closer than dissimilar ones.
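The effect is easy to observe empirically. The following is a minimal sketch, assuming points drawn uniformly at random from the unit cube and Euclidean distance; the function name `relative_contrast` and its parameters are illustrative choices, not part of any library. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random
    query to n_points random points, all uniform in the unit cube.

    Assumption: uniform random data, which has no structure at all.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:>4}  relative contrast={relative_contrast(dim):.3f}")
```

On unstructured data like this, the printed contrast should shrink steadily as `dim` grows: the nearest and farthest neighbors end up at nearly the same distance. Real embeddings are not uniform noise, so the numbers here illustrate the tendency rather than predict the behavior of any particular model.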

This is a key reason why vector search relies on carefully trained embedding models rather than raw feature vectors. A good model learns to concentrate meaningful structure into a space where distances stay informative. It also explains why bigger is not always better: a compact 384-dimension embedding can outperform a 3,000-dimension one if the extra dimensions add noise rather than signal.
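The point about noisy dimensions can be illustrated with a toy experiment. This is a sketch on synthetic data, not a benchmark of real embedding models: it assumes two classes separated along a handful of "signal" coordinates, then appends coordinates of pure Gaussian noise and checks how often each point's nearest neighbor still shares its class. All names (`make_points`, `nearest_neighbor_accuracy`, `compare`) and the parameter values are hypothetical.

```python
import math
import random


def make_points(n_per_class, signal_dims, noise_dims, rng):
    """Build labelled points: signal coordinates cluster around -1 or +1
    depending on the class; noise coordinates ignore the class entirely."""
    points = []
    for label in (0, 1):
        center = 1.0 if label == 1 else -1.0
        for _ in range(n_per_class):
            signal = [rng.gauss(center, 1.0) for _ in range(signal_dims)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dims)]
            points.append((signal + noise, label))
    return points


def nearest_neighbor_accuracy(points):
    """Fraction of points whose nearest other point shares their label."""
    hits = 0
    for i, (vec, label) in enumerate(points):
        best = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(vec, points[j][0]),
        )
        hits += points[best][1] == label
    return hits / len(points)


def compare(noise_dims, n_per_class=100, signal_dims=8, seed=0):
    """Nearest-neighbor accuracy with the given number of noise dimensions."""
    rng = random.Random(seed)
    points = make_points(n_per_class, signal_dims, noise_dims, rng)
    return nearest_neighbor_accuracy(points)


if __name__ == "__main__":
    for noise_dims in (0, 100, 1000):
        print(f"noise dims={noise_dims:>4}  accuracy={compare(noise_dims):.2f}")
```

Under these assumptions the compact representation should retrieve same-class neighbors more reliably than the padded one, because the noise coordinates contribute to every distance without carrying any information about class.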