DiskANN

A disk-based approximate nearest neighbour (ANN) algorithm from Microsoft that keeps its graph index on SSD, enabling billion-scale search with a small memory footprint.

DiskANN is an approximate-nearest-neighbour algorithm developed at Microsoft that serves a graph index largely from SSD storage rather than keeping everything in RAM. This lets a single machine search billions of vectors using far less memory than an in-memory index such as HNSW would require.
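To give a feel for the scale of the saving, here is a back-of-the-envelope estimate. All of the figures are illustrative assumptions rather than numbers from DiskANN itself: 128-dimensional `float32` vectors, 64 graph neighbours per vector stored as 4-byte ids, and a 32-byte compressed code per vector.

```python
# Rough RAM estimate for one billion vectors (all parameters are assumptions).
NUM_VECTORS = 1_000_000_000
DIM = 128                # assumed vector dimensionality
BYTES_PER_FLOAT = 4      # float32
GRAPH_DEGREE = 64        # assumed neighbours per node
BYTES_PER_NEIGHBOUR = 4  # 32-bit node id
CODE_BYTES = 32          # assumed compressed code size per vector


def in_memory_index_gb(num_vectors: int) -> float:
    """RAM needed when full vectors and the graph both live in memory."""
    per_vector = DIM * BYTES_PER_FLOAT + GRAPH_DEGREE * BYTES_PER_NEIGHBOUR
    return num_vectors * per_vector / 1e9


def compressed_only_gb(num_vectors: int) -> float:
    """RAM needed when only compressed codes are kept in memory."""
    return num_vectors * CODE_BYTES / 1e9


ram_all_in_memory = in_memory_index_gb(NUM_VECTORS)
ram_compressed = compressed_only_gb(NUM_VECTORS)
print(f"everything in RAM:  {ram_all_in_memory:.0f} GB")
print(f"compressed codes:   {ram_compressed:.0f} GB")
```

Under these assumptions the in-memory layout needs roughly 768 GB of RAM, while the compressed codes fit in about 32 GB. Real deployments also spend memory on caches and bookkeeping, so treat the ratio as indicative only.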

The problem it solves is cost. Graph-based indexes are fast but memory-hungry, and holding billions of high-dimensional vectors entirely in RAM is extremely expensive. DiskANN keeps the bulk of the graph and the full-precision vectors on SSD and holds only compressed (product-quantised) copies of the vectors in memory. The compressed copies steer the graph traversal, and the search reads exact vectors and neighbour lists from SSD only for the few nodes it actually visits, so each query needs only a small number of disk accesses.
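The division of labour between RAM and SSD can be illustrated with a toy sketch. This is not the DiskANN implementation. It makes several simplifying assumptions: the "disk" is a Python dictionary that counts reads, the graph is a brute-force k-nearest-neighbour graph, the compression is a crude 4-bit-per-coordinate quantiser standing in for product quantisation, and the search reads one node at a time. All names (`SimulatedDisk`, `build_index`, `search`) are hypothetical.

```python
import random

DIM, NUM_POINTS, DEGREE = 8, 200, 6  # toy sizes, chosen for illustration


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compress(vector):
    """Toy quantiser: 16 levels per coordinate, assumes values in [0, 1)."""
    return bytes(int(x * 16) for x in vector)


def decompress(code):
    """Map each level back to the centre of its bucket."""
    return [(level + 0.5) / 16 for level in code]


class SimulatedDisk:
    """Stand-in for SSD: stores (full vector, neighbour list) per node."""

    def __init__(self, records):
        self._records = records
        self.reads = 0

    def read(self, node_id):
        self.reads += 1
        return self._records[node_id]


def build_index(points):
    """Brute-force k-NN graph on 'disk', compressed codes in 'RAM'."""
    records = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sq_dist(p, points[j]))
        records[i] = (p, others[:DEGREE])
    codes = [compress(p) for p in points]
    return SimulatedDisk(records), codes


def search(query, disk, codes, entry=0, list_size=8, k=3):
    approx = {entry: sq_dist(query, decompress(codes[entry]))}  # RAM only
    exact = {}  # filled in only for nodes actually read from disk
    while True:
        shortlist = sorted(approx, key=approx.get)[:list_size]
        pending = [n for n in shortlist if n not in exact]
        if not pending:
            break  # every shortlisted node has been expanded
        node = pending[0]
        vector, neighbours = disk.read(node)   # the only disk access
        exact[node] = sq_dist(query, vector)   # exact distance for re-ranking
        for nb in neighbours:
            if nb not in approx:
                approx[nb] = sq_dist(query, decompress(codes[nb]))
    return sorted(exact, key=exact.get)[:k]


rng = random.Random(0)
points = [[rng.random() for _ in range(DIM)] for _ in range(NUM_POINTS)]
disk, codes = build_index(points)
query = [rng.random() for _ in range(DIM)]
top = search(query, disk, codes)
print("nearest ids:", top, "| disk reads:", disk.reads)
```

The point to notice is that candidate ordering uses only the compressed codes in memory, while `disk.read` is called once per expanded node and supplies both the next neighbour list and the exact vector used for the final ranking.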

The result is a practical path to billion-scale vector search on commodity hardware, with recall competitive with in-memory methods and latency that, while higher than that of a fully RAM-resident index, remains acceptable for most applications. DiskANN and its variants underpin several large-scale, cost-efficient vector database offerings.