Cosine Similarity

Cosine similarity measures how alike two vectors are by looking at the angle between them rather than their length. Two vectors pointing in the same direction score 1, perpendicular vectors score 0, and opposite vectors score -1. Because it ignores magnitude entirely, it captures pure directional similarity.
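The three reference scores can be checked with a small sketch. It assumes the standard definition (dot product divided by the product of the two lengths) and non-zero vectors of equal dimension; the function name `cosine_similarity` and the sample vectors are illustrative choices, not taken from any particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different lengths: the score is (approximately) 1.
same = cosine_similarity([1.0, 2.0], [3.0, 6.0])

# Perpendicular vectors: the dot product is 0, so the score is 0.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 5.0])

# Opposite directions: the score is (approximately) -1.
opposite = cosine_similarity([1.0, 2.0], [-2.0, -4.0])

print(same, perpendicular, opposite)
```

Floating-point rounding can leave the first and last results a hair away from exactly 1 and -1, so comparisons are best done with a tolerance such as `math.isclose`.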

This makes cosine similarity a common default metric for text embeddings. Embedding models are typically trained to encode meaning in the direction a vector points, not its length. As a result, a short note and a long article on the same topic should be judged similar even if their vector magnitudes differ. Cosine similarity does exactly that, because it compares only orientation.

Mathematically, it is the dot product of the two vectors divided by the product of their lengths. A useful shortcut: when vectors are normalised to unit length, which many embedding models do automatically, both lengths equal 1, so cosine similarity and the plain dot product give the same value. A vector database can then use the cheaper dot-product computation and skip the length calculation.
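The shortcut can be illustrated with a minimal sketch, again assuming non-zero vectors. The helper names `dot`, `normalise`, and `cosine_similarity` and the sample vectors are hypothetical, chosen only for this example.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [1.0, 7.0]

# Full formula on the raw vectors.
cos_ab = cosine_similarity(a, b)

# Normalise once up front, then use only the dot product.
dot_unit = dot(normalise(a), normalise(b))

# The two values agree up to floating-point rounding.
print(math.isclose(cos_ab, dot_unit))
```

In practice the normalisation would be done once when a vector is stored, so each later comparison needs only the dot product.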