GPU Acceleration

Using graphics processing units to speed up vector index building and similarity search, offering large throughput gains for billion-scale workloads.

GPU acceleration uses graphics processing units to speed up the heavy numerical work in vector search — building indexes and computing similarities — by exploiting their ability to perform thousands of arithmetic operations in parallel. The same hardware that renders graphics turns out to be ideal for the massive parallel math that vector operations require.

Vector workloads are a natural fit. Comparing a query against many vectors, or building an index over millions of them, involves enormous numbers of independent multiply-and-add operations. A GPU can run thousands of these simultaneously, whereas a CPU, with far fewer cores, works through them in much smaller batches. For very large datasets or high query throughput, this can yield order-of-magnitude speedups in both index construction and search.
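To make the independence of these operations concrete, here is a minimal sketch of brute-force similarity search in plain Python. It runs sequentially on the CPU and is not GPU code; the function names (`dot`, `brute_force_search`, `multiply_add_count`), the toy vectors, and the choice of dot product as the similarity measure are all assumptions made for illustration.

```python
import heapq

def dot(a, b):
    """One dot product: len(a) multiply-add operations."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_search(query, vectors, k):
    """Return the indices of the k vectors with the highest dot-product score.

    Each score depends only on the query and one stored vector, so no score
    needs the result of another. That independence is what allows a GPU to
    compute many scores at once; this loop computes them one by one.
    """
    scores = [dot(query, v) for v in vectors]
    return heapq.nlargest(k, range(len(vectors)), key=scores.__getitem__)

def multiply_add_count(num_vectors, dim):
    """Multiply-adds needed to score one query against every stored vector."""
    return num_vectors * dim

# Toy data, made up for this example.
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.2, 0.0]

top2 = brute_force_search(query, vectors, k=2)
print(top2)  # indices of the two best matches, best first
print(multiply_add_count(1_000_000, 768))  # work for one query at an assumed scale
```

Under these assumptions, a single query against one million 768-dimensional vectors already needs 768 million multiply-adds, none of which depends on another. Real systems replace the Python loop with batched matrix operations or approximate indexes, but the shape of the work is the same.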

The trade-off is cost and complexity. GPUs are expensive and have limited memory, so GPU acceleration makes the most sense for billion-scale workloads, for demanding throughput requirements, or for the one-time job of building a large index. Many systems use GPUs for index building and bulk operations while serving live queries from CPUs or from a mix of both, balancing speed against expense.
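The memory constraint can be made concrete with some back-of-the-envelope arithmetic. The sketch below estimates how much memory uncompressed vectors occupy and how many GPUs would be needed just to hold them. The dimension (768), the float32 storage format, and the 80 GB of memory per GPU are assumed values chosen for illustration, and the estimate ignores index overhead and any compression.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed to hold uncompressed vectors (4 bytes per value = float32)."""
    return num_vectors * dim * bytes_per_value

def gpus_needed(total_bytes, gpu_memory_bytes):
    """Smallest GPU count whose combined memory holds total_bytes (ceiling division)."""
    return -(-total_bytes // gpu_memory_bytes)

NUM_VECTORS = 1_000_000_000   # a billion-scale dataset
DIM = 768                     # assumed embedding dimension
GPU_MEMORY = 80 * 10**9       # assumed 80 GB of memory per GPU

total = raw_vector_bytes(NUM_VECTORS, DIM)
count = gpus_needed(total, GPU_MEMORY)
print(f"{total / 10**12:.3f} TB of raw vectors, at least {count} GPUs to hold them")
```

With these assumed numbers the raw vectors alone come to about 3 TB, far more than a single GPU holds. This is one reason a deployment might reserve GPUs for index building and bulk operations rather than keeping the whole dataset resident for live queries.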