Centroid

The geometric centre of a cluster of vectors, used in IVF indexing as the representative point around which similar vectors are grouped.

A centroid is the geometric centre of a group of vectors: the average position of all the points in a cluster. If you have a hundred vectors representing related documents, their centroid is the single vector you get by averaging all hundred, coordinate by coordinate. It acts as a compact summary of where that cluster sits in vector space.
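The coordinate-by-coordinate average can be sketched in a few lines of plain Python. This is a minimal illustration rather than any particular database's implementation; the function name `centroid` and the sample data are hypothetical.

```python
def centroid(vectors):
    """Return the coordinate-wise mean of a list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    n = len(vectors)
    # Average each coordinate independently across all vectors.
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


# Hypothetical cluster of three 2-dimensional vectors.
cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(centroid(cluster))  # [3.0, 4.0]
```

The result has the same dimension as the inputs, so it can be compared with a query vector exactly as a stored vector would be.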

Centroids are fundamental to cluster-based indexing. In an IVF (Inverted File) index, the vector space is partitioned by running k-means clustering, which produces a set of centroids. Every stored vector is then assigned to its nearest centroid, grouping the data into cells. At query time, the database compares the query to the centroids, selects the few nearest, and searches only the cells that belong to them, so only a fraction of the stored vectors is scanned.
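The three steps described here (train centroids, assign vectors to cells, probe the nearest cells) can be sketched as a toy IVF index. This is an illustrative sketch under simplifying assumptions: Euclidean distance, a naive k-means initialisation that takes the first `k` vectors, and hypothetical function names. The parameter name `nprobe` is borrowed from libraries such as Faiss; a production index would differ in many details.

```python
import math


def mean(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's k-means. Assumption: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def build_cells(vectors, centroids):
    """Assign every vector id to the cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        cells[nearest(v, centroids)].append(vid)
    return cells


def search(query, vectors, centroids, cells, nprobe=1, top_k=1):
    """Rank all centroids against the query, then scan only the nprobe nearest cells."""
    ranked = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vid for c in ranked[:nprobe] for vid in cells[c]]
    return sorted(candidates, key=lambda vid: math.dist(query, vectors[vid]))[:top_k]


# Hypothetical data: two well-separated groups of 2-dimensional vectors.
vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids = kmeans(vectors, k=2)
cells = build_cells(vectors, centroids)
print(cells)                                                  # {0: [0, 1, 2], 1: [3, 4, 5]}
print(search([9.5, 10.2], vectors, centroids, cells, top_k=2))  # [3, 4]
```

With `nprobe=1` the query is compared against two centroids and three stored vectors instead of all six. The saving grows with the number of cells, since the centroid comparison stays cheap relative to a full scan.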

The quality of the centroids directly affects search quality. Well-placed centroids group genuinely similar vectors together, so searching a few cells finds the true nearest neighbours. Poorly trained centroids scatter similar vectors across many cells, hurting recall unless many cells are searched.
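The effect of centroid placement on recall can be illustrated with a small experiment. The sketch below is hypothetical throughout: the data, both sets of centroids, and the helper names are invented for illustration, and recall is measured against an exact brute-force search using Euclidean distance.

```python
import math


def assign(vectors, centroids):
    """Group vector ids into cells by nearest centroid."""
    cells = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        best = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        cells[best].append(vid)
    return cells


def ivf_search(query, vectors, centroids, cells, nprobe, top_k):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vid for c in ranked[:nprobe] for vid in cells[c]]
    return sorted(candidates, key=lambda vid: math.dist(query, vectors[vid]))[:top_k]


def recall(query, vectors, centroids, nprobe, top_k=2):
    """Fraction of the exact top_k neighbours that the cell-based search returns."""
    exact = sorted(range(len(vectors)), key=lambda vid: math.dist(query, vectors[vid]))[:top_k]
    found = ivf_search(query, vectors, centroids, assign(vectors, centroids), nprobe, top_k)
    return len(set(exact) & set(found)) / top_k


vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
query = [10.0, 10.6]

# Roughly the means of the two natural groups.
good = [[0.33, 0.33], [10.33, 10.33]]
# Both centroids sit near one group, so its members are split across two cells.
bad = [[10.0, 9.0], [10.0, 12.0]]

print(recall(query, vectors, good, nprobe=1))  # 1.0
print(recall(query, vectors, bad, nprobe=1))   # 0.5
print(recall(query, vectors, bad, nprobe=2))   # 1.0
```

With the poorly placed centroids, the query's two true nearest neighbours land in different cells, so probing one cell finds only one of them. Probing both cells restores full recall at the cost of scanning every stored vector.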