Product Quantisation (PQ)

Product quantisation, or PQ, is a lossy compression technique that shrinks vectors dramatically by splitting each vector into several smaller sub-vectors and replacing each sub-vector with a short code drawn from a learned codebook. Rather than storing full floating-point values, the database stores only these compact codes, often a single byte per sub-vector, which can shrink each vector by a factor of tens.

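The method works in stages. The dimensions of each vector are divided into equal groups, so that every vector becomes a sequence of sub-vectors. Within each group, clustering (typically k-means) over the training data produces a small set of representative centroids, the codebook, and each sub-vector is replaced by the identifier of its nearest centroid. At query time, the distance from each query sub-vector to every centroid in the matching codebook is computed once and stored in a lookup table. The approximate distance to any stored vector is then the sum of one table entry per sub-vector, so similarities can be calculated quickly from the compact codes without decompressing them.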
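The stages above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names (train, encode, distance_tables, approx_dist), the use of k-means with random initialisation, and the toy parameters m=4 sub-vectors and k=16 centroids are all assumptions made for the example, and a real system would use a numerical library for speed.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(v, m):
    """Split v into m equal-length sub-vectors (len(v) must be divisible by m)."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]

def train(vectors, m, k):
    """Learn one codebook of k centroids per sub-vector position."""
    columns = zip(*(split(v, m) for v in vectors))
    return [kmeans(list(col), k, seed=j) for j, col in enumerate(columns)]

def encode(codebooks, v):
    """Replace each sub-vector with the id of its nearest centroid."""
    subs = split(v, len(codebooks))
    return [nearest(cb, sub) for cb, sub in zip(codebooks, subs)]

def distance_tables(codebooks, query):
    """tables[j][c] = squared distance from query sub-vector j to centroid c."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, c) for c in cb] for cb, sub in zip(codebooks, subs)]

def approx_dist(tables, code):
    """Approximate squared distance: one table lookup per sub-vector."""
    return sum(table[c] for table, c in zip(tables, code))

# Toy data: 200 random 8-dimensional vectors.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]

codebooks = train(data, m=4, k=16)            # training step
codes = [encode(codebooks, v) for v in data]  # 4 small integers per vector

# Search: build the tables once per query, then scan the codes.
tables = distance_tables(codebooks, data[0])
best = min(range(len(codes)), key=lambda i: approx_dist(tables, codes[i]))
```

The value returned by approx_dist is the exact squared distance between the query and the reconstructed vector (the concatenation of the chosen centroids), which is why it only approximates the distance to the original. Note that the query itself is left uncompressed; only the stored vectors are replaced by codes.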

The payoff is the ability to hold billions of vectors in a fraction of the memory that full-precision storage would need, making large-scale search affordable. PQ is frequently combined with IVF clustering as IVF-PQ, a widely used approach for billion-scale search, including disk-based indexes. The cost is some loss of accuracy, since the codes are approximations, plus a training step to build the codebooks. Well-tuned PQ nonetheless keeps recall acceptable while sharply reducing memory use.
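To put rough numbers on the memory saving, the following sketch estimates storage for a hypothetical collection. The helper pq_memory_bytes and the chosen parameters (one billion 128-dimensional float32 vectors, 8 sub-vectors, 256 centroids per codebook) are assumptions for illustration; the estimate ignores vector identifiers and any IVF structures, which add further overhead in practice.

```python
import math

def pq_memory_bytes(n_vectors, dim, m, k=256, float_bytes=4):
    """Return (raw_bytes, code_bytes, codebook_bytes) for a plain PQ index."""
    bits_per_code = math.ceil(math.log2(k))          # bits to name one centroid
    raw = n_vectors * dim * float_bytes              # uncompressed vectors
    codes = n_vectors * math.ceil(m * bits_per_code / 8)
    codebooks = k * dim * float_bytes                # m codebooks, k centroids, dim/m floats each
    return raw, codes, codebooks

raw, codes, books = pq_memory_bytes(1_000_000_000, dim=128, m=8)
print(f"raw vectors: {raw / 1e9:.0f} GB")
print(f"PQ codes:    {codes / 1e9:.0f} GB")
print(f"codebooks:   {books / 1e3:.0f} KB")
print(f"compression: {raw // codes}x")
```

Under these assumptions the codes take 8 bytes per vector instead of 512, a 64-fold reduction, and the codebooks themselves are negligible by comparison. Using more sub-vectors improves accuracy at the price of proportionally larger codes.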