Scalar Quantisation (SQ)

Scalar quantisation, or SQ, is the simplest form of vector compression. It reduces the precision of each dimension independently, most commonly converting 32-bit floating-point values into 8-bit integers, which cuts the memory each vector consumes by a factor of four.

The method maps the range of values a dimension can take onto a small set of discrete levels — for 8-bit quantisation, 256 of them — and stores each value as the nearest level. Because it treats every dimension separately and needs no learned codebook, scalar quantisation is simple to implement, fast to apply, and quick to build, unlike product quantisation which requires training.

Its appeal is an excellent ratio of savings to accuracy loss: 8-bit scalar quantisation typically reduces memory four-fold while costing only a percent or so of recall, making it a sensible first step for most production deployments. Systems often keep the full-precision vectors on disk so the top candidates from a quantised search can be re-scored at full accuracy, recovering nearly all the lost precision while still enjoying the memory savings during the main search.