High-Dimensional Data

High-dimensional data is data represented by vectors with many dimensions, typically hundreds or thousands of values each. Modern embeddings are inherently high-dimensional, with common sizes ranging from 384 to several thousand dimensions. This dimensionality is both the source of their expressive power and the cause of their computational challenges.
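
To give a rough sense of scale, the sketch below estimates the raw storage needed for dense vectors at a few sizes. The function name embedding_bytes, the collection size of one million vectors, and the 4 bytes per value (32-bit floats) are illustrative assumptions rather than properties of any particular model or database, and the figure ignores index and metadata overhead.

```python
def embedding_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, ignoring index and metadata overhead.

    bytes_per_value=4 assumes 32-bit floats (an assumption, not a requirement).
    """
    return num_vectors * dims * bytes_per_value


# Hypothetical collection of one million vectors at several dimensionalities.
for dims in (384, 768, 1536, 3072):
    gib = embedding_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB per million vectors")
```

Storage grows linearly with both the number of vectors and the number of dimensions, so doubling either doubles the footprint.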

The many dimensions give embeddings room to encode subtle, multifaceted meaning, capturing relationships that low-dimensional representations cannot. But working in such spaces brings difficulties that defy everyday geometric intuition, collectively known as the curse of dimensionality: the distances from a point to its nearest and farthest neighbours tend to become nearly equal, which makes "closeness" less discriminating, and structures that are easy to reason about in two or three dimensions stop behaving as expected.
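
The flattening of distances can be observed with a small experiment. The sketch below is a minimal illustration, assuming points drawn uniformly at random from the unit hypercube, which is a simplification: real embeddings have more structure, so the effect is usually less extreme. The helper relative_contrast is a name introduced here for illustration; it measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random


def relative_contrast(dims: int, num_points: int = 500, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest for one random query point.

    Points are sampled uniformly from the unit hypercube (an assumption
    made purely for illustration).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    dists = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


for dims in (2, 32, 512):
    print(f"{dims:>4} dims: relative contrast = {relative_contrast(dims):.2f}")
```

As the number of dimensions grows, the contrast shrinks towards zero: the nearest and farthest points end up at almost the same distance from the query.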

Vector databases exist precisely to make high-dimensional data practical to store and search. Their specialised index structures and approximate nearest-neighbour algorithms trade a small amount of accuracy for speed: they are engineered to find close matches quickly in collections where comparing a query against every stored vector would be far too slow and naive geometric methods would fail. Understanding that this data lives in high dimensions is key to understanding why vector search needs purpose-built infrastructure.
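
For reference, this is roughly what the brute-force baseline looks like. The sketch below is an exact search over synthetic Gaussian vectors, not how any particular vector database works internally. The function name brute_force_nearest, the collection size, and the dimensionality are all hypothetical choices made for illustration.

```python
import math
import random


def brute_force_nearest(query, vectors, k=3):
    """Exact k-nearest neighbours: compare the query with every stored vector."""
    scored = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


rng = random.Random(42)
dims, num_vectors = 64, 2_000  # deliberately small, illustrative sizes
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_vectors)]

# The query is a slightly perturbed copy of vector 123, so 123 should rank first.
query = [x + rng.gauss(0.0, 0.01) for x in vectors[123]]

print(brute_force_nearest(query, vectors))
print(f"distance computations per query: {num_vectors}")
print(f"coordinate differences per query: {num_vectors * dims}")
```

Every query touches every coordinate of every stored vector, so the work grows with the product of collection size and dimensionality. Approximate indexes aim to avoid most of those comparisons.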