Exact nearest neighbour search finds the true closest vectors by comparing a query vector with every vector in the database. This is simple and reliable, but its cost grows linearly with the number of stored vectors, which makes it difficult to use at large scale or under tight latency requirements. Approximate nearest neighbour search uses an index to avoid checking everything, trading a small amount of possible recall loss for much faster queries, lower compute cost, and better scalability.
This guide explains how exact, or flat, vector search works, why its cost is usually described as O(n), what approximation changes, how recall is measured against exact ground truth, and when exactness is still the right choice. By the end, you should be able to reason clearly about the tradeoff between perfect nearest-neighbour accuracy and practical retrieval performance in AI database systems.
What Nearest Neighbour Search Does In An AI Database
Nearest neighbour search is the retrieval step that finds stored vectors closest to a query vector. In an AI database, those vectors usually represent text chunks, images, products, users, tickets, documents, or other data that has been converted into embeddings. A query is embedded into the same vector space, and the database searches for stored vectors that are near it according to a distance or similarity measure.
The word “nearest” depends on the metric being used. Some systems use cosine similarity to compare direction, some use dot product to compare alignment and magnitude, and some use Euclidean distance to compare geometric distance. The database returns the top k closest records, where k is the number of neighbours requested by the application.
This search pattern matters because many AI applications do not retrieve by exact keyword match alone. A retrieval-augmented generation system may need passages that mean something similar to a user question. A recommendation system may need items similar to a user’s recent activity. A deduplication workflow may need records that are nearly the same but not textually identical.
Once the task is framed this way, the central question becomes practical: should the system compare the query against every vector, or should it use an index that narrows the search? That question is the difference between exact and approximate nearest neighbour search.
How Exact Flat Search Works
Exact nearest neighbour search is often called flat search, brute-force search, or exhaustive search. The idea is direct: for each query, the system scans the full vector collection, computes the distance or similarity between the query vector and every stored vector, and then returns the closest k results. There is no shortcut in the search path, and no candidate is skipped before being compared.
A simplified exact search process looks like this:
- The application sends a query vector to the database.
- The database compares that query vector with every stored vector in the target collection or filtered subset.
- Each comparison produces a similarity or distance score.
- The database keeps the best k scoring vectors.
- The matching records are returned, often with IDs, metadata, and scores.
This approach is exact because it has complete information. If every candidate has been scored with the same metric, the database can identify the true top k nearest neighbours. There is no risk that a closer vector was missed by an index traversal, partitioning decision, or compressed representation.
The simplicity of flat search is also why it is useful as a baseline. When teams evaluate an approximate index, they often compare its results with flat search results on the same dataset. Flat search provides the ground truth: the set of neighbours the system would return if it made no approximation at all.
Exactness sounds ideal, but it has a cost that grows with the collection. To understand why approximate search is so common in vector databases, it helps to look closely at what O(n) means in this setting.

Why Exact Search Has O(n) Cost
The O(n) cost of flat nearest neighbour search means the amount of work grows in direct proportion to the number of vectors being searched. If a collection has one million vectors, the database may need to score one million vectors for one query. If the collection grows to ten million vectors, the same query pattern may require ten times as many vector comparisons before the top k results can be known.
In practice, the arithmetic cost is often better described as O(n times d), where n is the number of vectors and d is the number of dimensions in each vector. A 768-dimensional embedding requires hundreds of component-level operations per comparison. A 1,536-dimensional embedding requires even more. The high-level scaling problem is still linear in n, but vector dimensionality affects the constant cost of each comparison.
Flat search can be made faster with optimized hardware, parallelism, memory layout improvements, quantization, batching, and efficient top-k selection. These optimizations matter. A well-engineered exact scan can be surprisingly fast for modest datasets or for filtered subsets that reduce the number of candidates. But optimization does not change the core growth pattern: if the system must compare against every candidate, larger candidate sets require more work.
The cost shows up in several ways. Query latency can rise as the collection grows. CPU or GPU usage can increase under query load. Memory bandwidth can become a bottleneck because the system must read many vectors. Multi-tenant workloads can become harder to isolate because one expensive query may consume shared resources.
This is where approximation becomes attractive. If the application can tolerate results that are almost always the same as exact results, but not mathematically guaranteed to be identical, an approximate index can avoid most of the full scan.
What Approximate Nearest Neighbour Search Buys
Approximate nearest neighbour search uses an index or search strategy that reduces the number of vectors compared in detail. Instead of scoring every vector, the database tries to identify a promising subset of candidates and search within that subset. The result is usually much lower latency and better scalability, with a controlled possibility that some true nearest neighbours may be missed.
Different approximate indexes use different shortcuts. Graph-based methods connect nearby vectors and navigate through the graph toward a good neighbourhood. Partition-based methods group vectors into clusters or lists and search only some of those groups. Quantization-based methods compress vectors so the system can compare smaller representations first, sometimes followed by reranking on full vectors. Many production systems combine these ideas.
The main thing approximation buys is the ability to search large vector collections without scanning the entire collection for every query. That can make the difference between a search that takes too long for an interactive application and one that responds quickly enough for chat, search, recommendations, or retrieval pipelines.
Approximation can also reduce infrastructure cost. If a system performs fewer comparisons per query, it may need less compute per request. If vectors or index representations are compressed, it may use less memory. If search latency is lower, the same infrastructure may support more concurrent users.
The tradeoff is that approximation introduces tuning decisions. Most ANN systems expose parameters that influence how much of the index is explored. Searching more widely tends to improve recall but increases latency. Searching more narrowly tends to improve speed but can miss relevant neighbours. This is not a flaw in ANN search; it is the central control knob that makes the method useful.
Because approximation is a tradeoff rather than a yes-or-no property, teams need a way to measure how much quality they are losing. That is where recall becomes the most important evaluation concept.
How Recall Is Measured
Recall measures how many of the true nearest neighbours were returned by the approximate search. The usual method is to run exact search first on a representative evaluation set, treat those exact results as ground truth, then compare the approximate results against that ground truth at the same value of k.
The most common metric is recall@k. If the exact top 10 neighbours for a query contain 10 specific records, and the approximate search returns 8 of those records in its top 10, then recall@10 for that query is 0.8, or 80 percent. The same calculation is averaged across many queries to estimate retrieval quality for the workload.
Recall@k answers a specific question: how much of the exact top-k set did the approximate index recover? It does not automatically answer every relevance question. A result can have high recall against vector ground truth while still being unhelpful to a user if the embeddings are poor, the chunks are badly modeled, or the application needs a different notion of relevance. But recall is still essential because it isolates the quality of the search index from the quality of the embedding model and data design.
Recall should usually be measured alongside latency, throughput, memory usage, index build time, update behavior, and application-level quality. A configuration with 99 percent recall may be too slow for a live chat experience. A configuration with 85 percent recall may be fine for candidate generation if a reranker or downstream model later evaluates more context. The right target depends on the product requirement, not on a universal recall number.
Good recall testing also requires representative queries. If the evaluation set contains only easy queries, the index may look better than it really is. If it ignores metadata filters, long-tail content, fresh data, or domain-specific queries, the measured recall may not match production behavior. The best tests use real or realistic queries, the same distance metric used in production, and the same filtering patterns the application will actually run.
Once recall is measured clearly, the decision becomes less abstract. You can compare exact and approximate search in terms of speed, cost, and quality, then choose the configuration that fits the application rather than assuming one approach is always superior.
Exact Search Versus ANN In Practice
The practical difference between exact and approximate search is not that one is correct and the other is careless. Exact search optimizes for certainty. Approximate search optimizes for useful results under real system constraints. Both can be the right choice depending on the size of the candidate set, the latency budget, the cost budget, and the consequences of missing a true nearest neighbour.
Exact search is easiest to reason about. There are fewer index parameters, fewer build-time tradeoffs, and fewer surprises caused by index structure. If the dataset is small enough, flat search can be the cleanest option. It is also valuable when building evaluation sets because it gives the reference results used to calculate recall.
ANN search is usually preferred when the vector collection is large, the query rate is high, or the application requires interactive latency. In these cases, scanning every vector for every query can become wasteful even if it is technically possible. ANN search turns vector retrieval into a tunable system: teams can decide how much index exploration is worth the latency cost.
Some systems combine both approaches. An ANN index may generate a candidate set quickly, and then exact scoring may rerank those candidates using full vectors. Metadata filtering may reduce the candidate set so much that exact search becomes efficient inside the filtered subset. Evaluation workflows may use exact search offline while production uses ANN online.
This mixed pattern is common because nearest neighbour search is rarely a standalone goal. It is part of a larger retrieval pipeline. The right design is the one that returns sufficiently relevant candidates at the speed and cost the application needs.
When Exactness Is Still Worth It
Exact nearest neighbour search is still worth using when the benefit of guaranteed top-k results outweighs the cost of scanning the candidate set. Approximation is not automatically better just because it is faster. If the dataset is small, the filtered subset is narrow, or the task requires an exact reference result, flat search can be simpler and more trustworthy.
Exactness is especially useful in evaluation. To know whether an ANN index is working well, you need exact ground truth for comparison. Teams often run flat search offline on a sample of queries, then use those results to measure recall, tune index parameters, and detect regressions when embeddings, data, or index settings change.
Exact search is also useful when missing the nearest neighbour has a high cost. For example, a duplicate detection system may need to find the closest known record before deciding whether two items are the same. A legal, scientific, or compliance-oriented retrieval workflow may prefer a slower exhaustive search if the collection is small enough and the cost of missing a close match is unacceptable.
Another case is heavily filtered search. Suppose a database contains millions of vectors, but a query filter narrows the candidate set to a few thousand records. A flat scan over that filtered subset may be efficient and exact. In this situation, an ANN index over the whole collection may not be necessary for that particular query path.
Exact search can also be a good early-stage choice. During prototyping, teams may not yet know their data distribution, query patterns, recall requirements, or growth curve. Starting with exact search can provide a dependable baseline. As the dataset and traffic grow, the team can introduce ANN indexing with a clearer understanding of what quality must be preserved.
These cases show that exactness is not obsolete. It is a tool with a predictable cost profile. The important decision is whether that cost profile fits the workload.

How To Choose Between Exact And Approximate Search
The best way to choose between exact and approximate search is to start with the application’s retrieval requirement rather than the index type. Ask how many vectors must be searched, how quickly results must return, how many queries will run concurrently, and what happens if the system misses one of the exact nearest neighbours.
Exact search is often a strong fit when:
- The collection or filtered candidate set is small enough to scan within the latency budget.
- The application needs guaranteed nearest-neighbour results.
- The team is creating ground truth for recall evaluation.
- The system is in an early prototype stage and simplicity matters more than scale.
- The workload runs offline, where latency is less important than completeness.
Approximate search is often a strong fit when:
- The collection is large enough that full scans are too slow or expensive.
- The application needs low-latency responses for users or downstream AI systems.
- The workload can tolerate a measured recall tradeoff.
- The system needs to support high query volume or many tenants.
- The retrieval pipeline includes reranking, filtering, or other downstream steps that can recover quality from a larger candidate set.
The most reliable decision process is empirical. Build an exact baseline, choose representative queries, measure recall@k for ANN configurations, and compare those results with latency and cost. Then tune the index until the system reaches the smallest amount of work that still satisfies the quality target.
This choice is also not permanent. As data grows, exact search may become too expensive. As filters become more selective, exact scans may become practical again for some query paths. As embedding models or chunking strategies change, recall should be remeasured. Nearest neighbour search is a living part of the AI data system, not a one-time setting.
Common Misunderstandings About Exact And Approximate Search
One common misunderstanding is that approximate search means low-quality search. In reality, ANN indexes can often return results that are very close to exact results while using much less compute. The quality depends on the data, metric, index type, tuning parameters, and how the retrieval pipeline uses the returned candidates.
Another misunderstanding is that exact search always gives the best user experience. Exact search gives the true nearest vectors according to the selected metric, but that metric may not perfectly match user relevance. If the embedding model does not represent the task well, exact vector search can faithfully return neighbours that are mathematically close but not useful. Retrieval quality depends on the whole system.
A third misunderstanding is that recall is the same as relevance. Recall@k compares ANN results with exact vector results. It does not measure whether users liked the answers, whether a generated response was grounded, or whether the retrieved documents were sufficient. Those outcomes require additional evaluation, such as human judgment, answer-quality checks, or task-specific success metrics.
Finally, some teams assume they must choose one approach everywhere. In practice, many AI database designs use exact and approximate search together. Exact search can support evaluation, reranking, filtered subsets, and small collections, while ANN search supports large-scale interactive retrieval.
Understanding these distinctions makes index selection less mysterious. The goal is not to defend one method, but to match the search method to the retrieval job.
FAQs
1. What is exact nearest neighbour search?
Exact nearest neighbour search compares a query vector with every candidate vector and returns the true closest results according to the chosen distance or similarity metric. It is often called flat, brute-force, or exhaustive search because it does not skip candidates through an approximate index.
2. Why is exact search described as O(n)?
Exact search is described as O(n) because the number of vector comparisons grows linearly with the number of vectors being searched. If the candidate set doubles, the number of comparisons roughly doubles. In real systems, vector dimensionality also matters, so the arithmetic work is often closer to O(n times d), where d is the number of dimensions.
3. What does approximate nearest neighbour search do differently?
Approximate nearest neighbour search uses an index to reduce how many vectors must be compared in detail. It may navigate a graph, search selected partitions, compare compressed representations, or combine several techniques. The goal is to return very similar results to exact search while using much less time and compute.
4. How is recall measured for ANN search?
Recall is measured by comparing approximate results with exact ground-truth results. For recall@k, the system checks how many of the exact top-k neighbours also appeared in the approximate top-k results. If exact search finds 10 true neighbours and ANN returns 8 of them in its top 10, recall@10 is 80 percent for that query.
5. Does higher recall always mean better search quality?
Higher recall usually means the approximate index is closer to exact vector search, but it does not guarantee better user-facing relevance. If the embeddings, chunking, metadata, or ranking logic are weak, exact vector neighbours may still be poor results. Recall should be evaluated alongside latency, cost, and application-level relevance.
6. When should an AI database use exact search instead of ANN?
An AI database should use exact search when the candidate set is small enough to scan, when guaranteed nearest-neighbour results are required, when building ground truth for evaluation, or when filters reduce the search space to a manageable size. Exact search is also useful during early development because it provides a simple and dependable baseline.
Takeaway
Exact nearest neighbour search is the simplest and most reliable way to find the true closest vectors, but its O(n) scan cost makes it harder to use as collections and query volumes grow. Approximate nearest neighbour search buys speed, scale, and lower infrastructure cost by searching a carefully chosen subset of the vector space, with recall@k used to measure how closely it matches exact ground truth. This guidance is most useful for teams designing AI database retrieval for search, RAG, recommendations, deduplication, or other embedding-based systems where the right answer depends on balancing retrieval quality against latency and cost.