Replication is the practice of keeping multiple copies of a vector index across different nodes, so that the same data is available in more than one place. It serves two main goals: increasing how many queries the system can handle, and protecting against failures.
For throughput, replicas let incoming queries be load-balanced across several copies of the index. Because search workloads are read-heavy, each added replica increases the number of queries the system can serve in parallel, roughly in proportion to the replica count, so the system can scale to high request volumes without any single node being overwhelmed.
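As a rough illustration of how queries might be spread across replicas, the sketch below uses simple round-robin rotation. The class name `RoundRobinBalancer` and the replica names are hypothetical, and real systems often use other policies such as least-loaded or latency-aware routing.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands out replica names in a fixed rotating order (illustrative only)."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        # cycle() repeats the replica list endlessly in the same order.
        self._cycle = itertools.cycle(list(replicas))

    def pick(self):
        # Each call returns the next replica in the rotation.
        return next(self._cycle)


# Hypothetical replica names; each holds a full copy of the same index.
balancer = RoundRobinBalancer(["replica-a", "replica-b", "replica-c"])

# Route nine queries and count how many each replica receives.
assignments = Counter(balancer.pick() for _ in range(9))
```

With three replicas and nine queries, each replica receives three, which is the even spread that lets read throughput grow as replicas are added.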
For reliability, replication provides fault tolerance: if one node fails, others holding the same data continue serving, so the system stays available and any data already copied to the surviving replicas is not lost. Replication is often combined with sharding, where the dataset is split across nodes for capacity and each shard is then replicated for resilience and read scaling. The trade-offs are the extra storage and compute the copies consume, and the need to keep replicas consistent as data changes.
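The combination of sharding and replication can be sketched as follows. This is a minimal model under stated assumptions, not any particular system's design: it assumes a query must consult every shard, that any healthy replica of a shard can answer for it, and that health is tracked with a simple flag. The names `ReplicatedShard`, `plan_query`, and the node names are hypothetical.

```python
import random


class ReplicatedShard:
    """One slice of the dataset, copied onto several nodes (illustrative only)."""

    def __init__(self, shard_id, replica_nodes):
        self.shard_id = shard_id
        # Assumed health model: True means the node is currently serving.
        self.health = {node: True for node in replica_nodes}

    def mark_down(self, node):
        self.health[node] = False

    def pick_replica(self):
        healthy = [node for node, up in self.health.items() if up]
        if not healthy:
            # With every copy of this shard down, its data is unreachable.
            raise RuntimeError(f"shard {self.shard_id} has no healthy replica")
        return random.choice(healthy)


def plan_query(shards):
    """Choose one healthy replica per shard for a query that touches all shards."""
    return {shard.shard_id: shard.pick_replica() for shard in shards}


# Two shards, each replicated on two hypothetical nodes.
shards = [
    ReplicatedShard(0, ["node-1", "node-2"]),
    ReplicatedShard(1, ["node-3", "node-4"]),
]

# Simulate a failure: shard 0 is still served by its remaining replica.
shards[0].mark_down("node-1")
plan = plan_query(shards)

# Storage cost: every shard is stored once per replica.
total_copies = sum(len(shard.health) for shard in shards)
```

In this sketch the failure of `node-1` leaves shard 0 reachable through `node-2`, while the four stored shard copies show the storage overhead that replication adds. Keeping those copies consistent as data changes is deliberately left out.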