
High-Cardinality Filter

A metadata filter on a field with many distinct values, such as user ID or timestamp. It is harder to optimise than a low-cardinality filter and can significantly reduce ANN recall.

A high-cardinality filter is a metadata filter on a field that has a very large number of distinct possible values — such as user ID, timestamp, email address, or session ID. Cardinality refers to how many unique values a field can take, and high-cardinality fields pose particular challenges for filtered vector search.
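To make "cardinality" concrete, the following minimal sketch counts the distinct values of two fields and measures the fraction of rows an equality filter keeps. The records, the `user_id` and `country` field names, and the helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical toy metadata: 9,000 rows, one distinct user_id per row,
# and a country field that cycles through only three values.
records = [
    {"user_id": f"user-{i}", "country": ["US", "DE", "JP"][i % 3]}
    for i in range(9000)
]

def cardinality(rows, field):
    """Number of distinct values the field takes in this dataset."""
    return len({row[field] for row in rows})

def selectivity(rows, field, value):
    """Fraction of rows that pass the equality filter field == value."""
    matches = sum(1 for row in rows if row[field] == value)
    return matches / len(rows)

print(cardinality(records, "country"))               # 3
print(cardinality(records, "user_id"))               # 9000
print(selectivity(records, "country", "US"))         # keeps one third of the rows
print(selectivity(records, "user_id", "user-42"))    # keeps 1 row in 9,000
```

The low-cardinality `country` filter still leaves thousands of candidates, while the high-cardinality `user_id` filter leaves a single row, which is the situation the next paragraph describes.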

The difficulty is that high-cardinality filters are often highly selective, narrowing the candidate set to a tiny fraction of the data. Combined with a graph-based vector index, this can trigger the recall cliff. So few vectors qualify that most of a node's graph neighbours fail the filter, and the search must either traverse many non-matching nodes to reach the matching ones, which costs speed, or stop before reaching them, which costs accuracy. Filtering by a single user’s records out of millions is a classic example.
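The speed-versus-accuracy trade-off can be reproduced on a deliberately tiny example. The sketch below is a toy, not any particular database's algorithm: it assumes a hypothetical ten-node path graph (`GRAPH`) in which each node's "vector" is simply its position on a number line, and a filter that admits only nodes 0 and 7. The function `filtered_search` and its `traverse_non_matching` flag are illustrative names.

```python
from heapq import heappop, heappush

# Hypothetical toy proximity graph: node i links to i-1 and i+1, and the
# distance between a node and the query is their gap on a number line.
GRAPH = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

def distance(node, query):
    return abs(node - query)

def filtered_search(query, entry, allowed, k, traverse_non_matching):
    """Best-first search returning up to k nodes that pass the filter.

    traverse_non_matching=False: only step onto nodes in `allowed`
    (cheap, but matching nodes can become unreachable).
    traverse_non_matching=True: walk through non-matching nodes and only
    exclude them from the results (better recall, more nodes visited).
    Returns (result node ids, number of nodes visited).
    """
    visited = {entry}
    frontier = [(distance(entry, query), entry)]
    found = []  # (distance, node) pairs that pass the filter
    while frontier:
        dist, node = heappop(frontier)
        # Stop once k matches are held and nothing left is closer than the k-th best.
        if len(found) >= k and dist > sorted(found)[k - 1][0]:
            break
        if node in allowed:
            found.append((dist, node))
        for nb in GRAPH[node]:
            if nb in visited:
                continue
            if not traverse_non_matching and nb not in allowed:
                continue
            visited.add(nb)
            heappush(frontier, (distance(nb, query), nb))
    found.sort()
    return [n for _, n in found[:k]], len(visited)

tenant_rows = {0, 7}  # a highly selective filter: 2 of 10 nodes qualify
print(filtered_search(8, 0, tenant_rows, 2, traverse_non_matching=False))  # ([0], 1)
print(filtered_search(8, 0, tenant_rows, 2, traverse_non_matching=True))   # ([7, 0], 10)
```

Restricting traversal to matching nodes touches one node but misses node 7, the true nearest match. Walking through non-matching nodes finds it, but only after visiting the entire graph.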

Handling high-cardinality filters efficiently is a key test of a vector database. Techniques like bitmap filtering, payload indexes, and filter-aware traversal are designed precisely to keep these queries fast without sacrificing recall. This is especially important in multi-tenant systems, where almost every query filters on a high-cardinality tenant ID.
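As a rough illustration of how a payload index and bitmap filtering fit together, here is a minimal sketch assuming a single metadata field (a hypothetical tenant name) and a Python `int` used as the bitmap. `PayloadIndex`, `bit_ids`, and `exact_filtered_knn` are hypothetical names. The sketch also assumes a brute-force scan over the allow-list, which is one possible way to serve a very selective filter and is not presented as the method of any specific database.

```python
import math
from collections import defaultdict

class PayloadIndex:
    """Hypothetical payload index: maps each value of one metadata field to
    a bitmap (a Python int) whose bit i is set when row i has that value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)

    def add(self, row_id, value):
        self.bitmaps[value] |= 1 << row_id

    def allowed(self, value):
        # .get avoids inserting an empty bitmap for unseen values.
        return self.bitmaps.get(value, 0)

def bit_ids(bitmap):
    """Yield the row ids whose bits are set, lowest first."""
    while bitmap:
        low = bitmap & -bitmap  # isolate the lowest set bit
        yield low.bit_length() - 1
        bitmap ^= low

def exact_filtered_knn(vectors, bitmap, query, k):
    """Brute-force k-NN over only the rows the bitmap allows (perfect recall)."""
    scored = sorted((math.dist(vectors[i], query), i) for i in bit_ids(bitmap))
    return [i for _, i in scored[:k]]

vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.9, 0.1), (4.0, 4.0)]
tenants = ["acme", "globex", "acme", "globex", "acme"]

index = PayloadIndex()
for row_id, tenant in enumerate(tenants):
    index.add(row_id, tenant)

acme = index.allowed("acme")
print(bin(acme))                                         # 0b10101
print(exact_filtered_knn(vectors, acme, (1.0, 0.0), 2))  # [0, 4]
```

Row 1 sits exactly on the query but belongs to another tenant, so the bitmap excludes it. Looking up the bitmap costs one dictionary access however many tenants exist. Scanning the allow-list exactly only pays off while that list stays small. For larger allow-lists, the same bitmap could instead be passed to a filter-aware graph traversal as the membership test.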