Index warm-up is the process of loading a vector index from disk into memory and preparing it before it can serve queries at full speed. After a restart, deployment, or scaling event, the index may not yet be resident in RAM, and the first queries against a cold index are slow until the relevant data has been read in.
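As a rough illustration of the loading step, the sketch below reads an index file from start to finish so that the operating system pulls its pages into the page cache. It assumes the index is stored as a single file on disk; the function name `preload_file` and the chunk size are hypothetical, and many vector search engines offer their own preload or memory-locking options that would be preferable where available.

```python
def preload_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read the whole file sequentially and return the number of bytes read.

    The data itself is discarded. The point is the side effect: the OS
    typically keeps recently read pages in its page cache, so later reads
    or memory-mapped accesses are less likely to wait on disk.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total
```

Whether the pages stay resident afterwards depends on available memory and on what else the machine is doing, so this is best treated as a hint to the OS rather than a guarantee.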
This matters operationally because cold-start latency can be severe. A graph index that normally answers in milliseconds may take far longer on its first queries while pages are fetched from disk, caches fill, and internal structures are initialised. For latency-sensitive applications, serving traffic against a cold index produces a noticeable spike in response times.
To avoid this, systems warm up the index before it takes live traffic, preloading it into memory and sometimes running representative queries to populate caches, so that real users never hit the cold path. Warm-up time is an important consideration when planning restarts, rolling updates, and autoscaling, since it determines how quickly a new or restarted node can contribute to serving queries.
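The warm-up step described above can be sketched as a small readiness gate: run a set of representative queries against the index, discard the results, and only then mark the node as ready for traffic. Everything named here (`WarmupGate`, `warm_up`, the `search` callable, and the sample queries) is a hypothetical stand-in, not the API of any particular vector database.

```python
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


class WarmupGate:
    """Tracks whether a node has finished warming up and may take live traffic."""

    def __init__(self) -> None:
        self.ready = False  # a freshly started node is assumed cold

    def warm_up(
        self,
        search: Callable[[Vector], object],
        sample_queries: Iterable[Vector],
    ) -> int:
        """Run each sample query once, then mark the node ready.

        Results are discarded; the queries exist only to touch the parts of
        the index that real traffic is likely to hit. Returns the number of
        queries executed.
        """
        count = 0
        for query in sample_queries:
            search(query)
            count += 1
        self.ready = True
        return count
```

In a deployment, a health or readiness check would report `gate.ready`, so that a load balancer or orchestrator routes queries to the node only after warm-up has completed. How many sample queries are enough, and where they come from (for example, a replay of recent traffic), is workload-specific and left open here.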