Reference

Glossary

Authoritative definitions for every term in AI databases and vector search.

C

Centroid

The geometric centre of a cluster of vectors, used in IVF indexing as the representative point around which similar vectors are grouped.
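
As a minimal sketch of the definition above, the centroid can be computed as the component-wise mean of a cluster, and a query can then be routed to its closest centroid. The function names `centroid` and `nearest_centroid` are illustrative and do not come from any particular IVF implementation.

```python
def centroid(vectors):
    """Return the component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid(query, centroids):
    """Return the index of the centroid closest to `query` (squared Euclidean distance)."""
    def dist_sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist_sq(query, centroids[i]))


cluster_a = [[0.0, 0.0], [2.0, 4.0]]
cluster_b = [[10.0, 10.0], [12.0, 10.0]]
centroids = [centroid(cluster_a), centroid(cluster_b)]
print(centroids)                                # [[1.0, 2.0], [11.0, 10.0]]
print(nearest_centroid([9.0, 9.0], centroids))  # 1
```

In an IVF-style index, only the vectors assigned to the closest centroids would then be scanned.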

Chunking

The process of splitting documents into smaller segments before embedding, a critical factor in RAG retrieval quality.
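
One simple strategy is fixed-size chunking with overlap, sketched below on whitespace-separated words. The function name `chunk_words` and its defaults are assumptions for illustration; real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split `text` into chunks of up to `chunk_size` words, where consecutive
    chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e", chunk_size=3, overlap=1))  # ['a b c', 'c d e']
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of storing some text twice.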

CLIP

A multimodal model by OpenAI that maps images and text into a shared vector space, enabling cross-modal similarity search.

Collection

A named group of vectors sharing the same schema and index configuration within a vector database, equivalent to a table in relational databases.

Collection-per-tenant

A multi-tenancy pattern where each tenant's vectors live in a dedicated collection within a shared database, balancing isolation against resource efficiency.

Context Engineering

The discipline of deciding what information to retrieve, summarise, and place in an LLM context window to maximise answer quality within token limits.

Context Window

The maximum number of tokens an LLM can process in a single request, typically covering both the prompt and the generated output, which limits how much retrieved context can be injected in RAG.

Cosine Similarity

A similarity measure equal to the cosine of the angle between two vectors, indicating semantic similarity independent of vector magnitude. Cosine distance is commonly defined as one minus this value.
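
A minimal pure-Python sketch of the computation follows: the dot product divided by the product of the two vector lengths. The function name is illustrative, and zero vectors are rejected because the angle is undefined for them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
# Same direction, different magnitude: the result is 1.0 up to rounding.
print(round(cosine_similarity([1.0, 2.0], [2.0, 4.0]), 6))
```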

Cross-encoder

A re-ranking model that jointly processes query and document to produce a precise relevance score, used after first-stage ANN retrieval.

Curse of Dimensionality

The phenomenon where distance metrics lose discrimination power as the number of dimensions increases, complicating high-dimensional similarity search.
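
The effect can be observed with a small experiment, sketched here under the assumption of uniformly random points in the unit cube (real embeddings are not uniform, so the numbers are only indicative). The hypothetical helper `relative_contrast` measures how much farther the farthest point is than the nearest one; the value shrinks as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


# The printed contrast should fall as the dimension increases.
for dim in (2, 32, 1024):
    print(dim, round(relative_contrast(dim), 3))
```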

H

Hallucination

The tendency of LLMs to generate plausible-sounding but factually incorrect information not supported by their training data or retrieved context.

Hamming Distance

A distance metric counting the number of positions at which two binary vectors differ, used in hash-based similarity search.
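
For binary codes packed into integers, the count can be sketched as an XOR followed by a bit count. This assumes non-negative Python integers as the code representation, which is an illustrative choice rather than how any specific system stores hashes.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as non-negative ints."""
    return bin(a ^ b).count("1")


print(hamming_distance(0b1011, 0b1001))  # 1
print(hamming_distance(0b1111, 0b0000))  # 4
```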

High-Cardinality Filter

A metadata filter on a field with many distinct values such as user ID or timestamp, which is harder to optimise than low-cardinality filters and can significantly impact ANN recall.

High-Dimensional Data

Data represented by vectors with hundreds or thousands of dimensions, typical of modern neural network embeddings.

HNSW

Hierarchical Navigable Small World — the dominant graph-based index algorithm for in-memory ANN search, offering high recall and fast query times.

HTAP (Hybrid Transactional/Analytical Processing)

A database architecture that handles both real-time transactional writes and analytical queries in the same system, enabling AI applications that need both operational data and analytical retrieval.

Hybrid Indexing

Combining multiple index structures — such as a vector graph index alongside an inverted keyword index — to serve different query types from the same data store.

Hybrid Search

A retrieval approach combining dense vector search with sparse keyword search such as BM25, capturing both semantic meaning and exact term matches in one query.

HyDE (Hypothetical Document Embeddings)

A RAG technique where the LLM first generates a hypothetical answer to embed, which is then used to retrieve real documents from the vector database.

M

Managed Vector Database

A vector database offered as a hosted service where the provider handles infrastructure, scaling, and maintenance, letting teams focus on application logic.

Manhattan Distance

A distance metric summing the absolute differences between vector coordinates, also known as L1 distance or taxicab distance.
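
A minimal sketch of the sum described above, with an illustrative function name:

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # 5
```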

Memory Layer

An architectural component in AI systems that stores and retrieves past context, facts, or conversation history to extend an LLM beyond its context window.

Memory Management

The strategies an AI agent uses to store, retrieve, summarise, and expire information across sessions, balancing recall quality against storage and context limits.

Memory Policy

Rules governing what an AI agent stores in memory, what it discards, how it compresses old context, and how it prioritises retrieval over time.

Metadata Filtering

Restricting vector search results using structured scalar attributes alongside semantic similarity, combining exact conditions with approximate search.

MIPS (Maximum Inner Product Search)

A variant of nearest neighbour search that finds vectors maximising the dot product with a query rather than minimising distance.
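
The difference from distance-based search shows up when vectors are not normalised. The brute-force sketch below (function names are illustrative) returns different winners for the same query under the two criteria, because a longer vector can have a larger dot product while being farther away.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mips(query, vectors):
    """Index of the vector with the largest inner product with `query`."""
    return max(range(len(vectors)), key=lambda i: dot(query, vectors[i]))


def nearest_l2(query, vectors):
    """Index of the vector with the smallest squared Euclidean distance to `query`."""
    def dist_sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vectors)), key=lambda i: dist_sq(query, vectors[i]))


vectors = [[1.0, 1.0], [3.0, 3.0]]
query = [1.0, 1.0]
print(mips(query, vectors))        # 1 (dot products are 2.0 and 6.0)
print(nearest_l2(query, vectors))  # 0 (the query coincides with the first vector)
```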

MTEB (Massive Text Embedding Benchmark)

A widely used benchmark for evaluating text embedding models across retrieval, classification, clustering, and semantic similarity tasks.

Multi-tenancy

An architecture where a single vector database instance serves multiple isolated tenants, each with their own data partitions and access controls.

Multimodal Embedding

An embedding that encodes multiple data types such as text, images, and audio in a single shared vector space for cross-modal retrieval.

Multimodal Search

Searching across different data types such as text, images, audio, and video within a single shared embedding space, so a query in one modality can retrieve results in another.

P

Partition Key

A field whose value determines which partition or shard a vector is stored in, used to isolate tenants or co-locate related data for efficient filtered search.

Payload Filtering

Filtering on structured metadata stored alongside vectors — the term commonly used in systems where metadata is stored as a JSON payload attached to each vector.

Payload Index

A secondary index built over the metadata (payload) attached to vectors, enabling fast metadata filtering during similarity search.

Persistent Memory

Storage that survives application restarts and sessions, allowing an AI system to retain knowledge and context durably over time rather than only within a single run.

pgvector

An open-source PostgreSQL extension that adds vector storage and similarity search to an existing Postgres database, popular for teams already running Postgres.

Physical Tenant Isolation

A multi-tenancy approach where each tenant gets dedicated infrastructure resources such as separate index shards or clusters, guaranteeing performance and data separation.

Post-filtering

Applying metadata conditions after vector search to filter an already-retrieved candidate set, preserving the recall of the underlying search but possibly returning fewer results than requested.

Pre-filtering

Applying metadata conditions before vector search to restrict the candidate set, trading recall for speed on highly selective filters.
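
The order of operations in post-filtering and pre-filtering can be sketched side by side. The example uses an exact brute-force scan in place of an ANN index, so it shows only the difference in result counts, not the recall or latency behaviour of a real index. The item layout (`id`, `vector`, `lang`) and function names are assumptions made for illustration.

```python
def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, items, k):
    """Exact k nearest items by squared Euclidean distance."""
    return sorted(items, key=lambda it: dist_sq(query, it["vector"]))[:k]


def post_filter_search(query, items, k, predicate):
    """Search first, then drop candidates that fail the metadata condition."""
    return [it for it in top_k(query, items, k) if predicate(it)]


def pre_filter_search(query, items, k, predicate):
    """Apply the metadata condition first, then search only the allowed items."""
    allowed = [it for it in items if predicate(it)]
    return top_k(query, allowed, k)


items = [
    {"id": 1, "vector": [0.0, 0.0], "lang": "en"},
    {"id": 2, "vector": [1.0, 0.0], "lang": "de"},
    {"id": 3, "vector": [2.0, 0.0], "lang": "de"},
    {"id": 4, "vector": [3.0, 0.0], "lang": "en"},
]
is_english = lambda it: it["lang"] == "en"

post = post_filter_search([0.0, 0.0], items, 2, is_english)
pre = pre_filter_search([0.0, 0.0], items, 2, is_english)
print([it["id"] for it in post])  # [1]
print([it["id"] for it in pre])   # [1, 4]
```

Post-filtering returned one result for `k=2` because the second candidate failed the filter, whereas pre-filtering filled both slots from the allowed subset.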

Precision

A search quality metric measuring the fraction of returned results that are genuinely relevant to the query.
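
As a small worked sketch, precision is the number of returned results that are relevant divided by the number returned. The document IDs below are made up for the example.

```python
def precision(returned, relevant):
    """Fraction of `returned` IDs that appear in the `relevant` set."""
    if not returned:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for doc_id in returned if doc_id in relevant)
    return hits / len(returned)


# Two of the four returned documents are relevant.
print(precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"}))  # 0.5
```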

Product Quantisation (PQ)

A vector compression method that splits high-dimensional vectors into subspaces and quantises each independently, achieving very high compression ratios.
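
The encode and decode steps can be sketched as follows, assuming the per-subspace codebooks already exist. In practice they are usually learned with k-means on training vectors; here they are tiny hand-written lists, and the names `pq_encode` and `pq_decode` are illustrative.

```python
def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codeword."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        codes.append(min(range(len(codebook)), key=lambda j: dist_sq(sub, codebook[j])))
    return codes


def pq_decode(codes, codebooks):
    """Approximate the original vector by concatenating the chosen codewords."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out


# Two subspaces of two dimensions each, two codewords per subspace (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[5.0, 5.0], [0.0, 0.0]],
]
codes = pq_encode([0.9, 0.1, 5.2, 4.8], codebooks)
print(codes)                        # [1, 0]
print(pq_decode(codes, codebooks))  # [1.0, 0.0, 5.0, 5.0]
```

The four floats are stored as two small integers, which is where the compression comes from; the decoded vector is only an approximation of the original.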

Prompt Engineering

The practice of designing and refining LLM input prompts to reliably produce accurate, relevant, and well-formatted outputs.

R

RAG (Retrieval-Augmented Generation)

A technique that grounds LLM responses in context retrieved from a vector database at query time, reducing hallucinations without retraining.

Re-ranking

A post-retrieval step that reorders an initial candidate set using a more accurate but computationally expensive model such as a cross-encoder.

Real-time Indexing

The ability to insert or update vectors in a database index and have them immediately queryable, without requiring a full index rebuild.

Real-time Ingestion

The continuous insertion of new vectors into a live index as data arrives, making them immediately searchable without a batch rebuild.

Recall Cliff

A sharp drop in result quality or spike in latency that occurs when a restrictive metadata filter leaves too few candidates for an ANN index to navigate effectively.

Recall@K

A search quality metric measuring what fraction of the true K nearest neighbours appear in the top-K results of an ANN search.
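
A minimal sketch of the metric, assuming the exact nearest neighbours have already been computed by brute force. The ID lists are made up for the example.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return found / len(truth)


# The ANN result misses neighbour 5 and returns 9 instead: 3 of 4 found.
print(recall_at_k([7, 2, 9, 4], [2, 7, 5, 4], k=4))  # 0.75
```

Order within the top K does not matter for this metric, only membership.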

Reciprocal Rank Fusion (RRF)

A rank-based fusion algorithm that combines ranked lists from multiple retrieval systems by summing, for each result, the reciprocal of its rank in every list, usually with a small constant added to the rank.
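
A minimal sketch of RRF is shown below. It uses the common form `1 / (k + rank)` with a smoothing constant `k`; the default of 60 is a widely used value and an assumption here, not something this glossary specifies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of IDs; each list contributes 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "c", "a"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['b', 'a', 'c']
```

Because only ranks are used, the raw scores of the two retrievers never need to be on a comparable scale.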

Relative Score Fusion

A score fusion method that normalises and weights the raw scores from each retrieval method before combining them, as an alternative to rank-based fusion.
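
One way to realise this is min-max normalisation followed by a weighted sum, sketched below. The weight `alpha`, the function names, and the choice of min-max scaling are assumptions; systems differ in how they normalise. Both inputs are assumed to be higher-is-better scores.

```python
def min_max(scores):
    """Scale a dict of scores to [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Weighted sum of normalised scores; `alpha` is the weight on the vector side."""
    v = min_max(vector_scores) if vector_scores else {}
    kw = min_max(keyword_scores) if keyword_scores else {}
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


vector_scores = {"a": 0.9, "b": 0.5}     # e.g. cosine similarities
keyword_scores = {"b": 10.0, "c": 2.0}   # e.g. BM25 scores
ranking = relative_score_fusion(vector_scores, keyword_scores, alpha=0.7)
print([doc_id for doc_id, _ in ranking])  # ['a', 'b', 'c']
```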

Replication

Maintaining copies of a vector index across multiple nodes to improve query throughput and provide fault tolerance.

Representation Learning

The field of machine learning focused on training models to automatically learn useful vector representations of raw data.

Retrieval Layer

The component in an AI application stack responsible for finding and returning relevant stored content in response to a query, typically backed by a vector database.

Retrieval Pipeline

The end-to-end system for fetching relevant content, typically comprising embedding, indexing, ANN search, filtering, and optional re-ranking stages.

Roaring Bitmaps

A compressed bitmap data structure used in vector databases to represent sets of matching vector IDs for metadata filters, enabling extremely fast set intersection during filtered search.

Row-Level Security (RLS)

A database mechanism that restricts which rows a query can access based on the requesting user, used to enforce tenant isolation in shared-table vector stores.

S

Scalar Quantisation (SQ)

A compression method that reduces vector component precision from 32-bit floats to 8-bit integers, achieving a 4× memory reduction with minimal recall loss.
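
The mapping can be sketched as a linear rescale of each component into the range 0 to 255. The bounds `lo` and `hi` are assumed to be known in advance (for example, taken from a sample of the data); the function names are illustrative.

```python
def sq_encode(vector, lo, hi):
    """Quantise floats in [lo, hi] to one unsigned byte per component."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in vector)


def sq_decode(codes, lo, hi):
    """Map byte codes back to approximate float values."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


vec = [-1.0, -0.2, 0.33, 1.0]
codes = sq_encode(vec, lo=-1.0, hi=1.0)
approx = sq_decode(codes, lo=-1.0, hi=1.0)
print(len(codes))  # 4 bytes, versus 16 bytes for four 32-bit floats
print(max(abs(a - b) for a, b in zip(vec, approx)))  # small reconstruction error
```

Values outside `[lo, hi]` are clamped, so the bounds should cover the bulk of the data.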

Score Fusion

The process of combining relevance scores from multiple retrieval methods into a single ranking, used in hybrid search to merge vector and keyword results.

Self-hosting

Running a vector database on your own infrastructure rather than a managed service, giving full control and no per-query cost but requiring operational expertise.

Semantic Caching

Caching LLM or retrieval responses keyed by query meaning rather than exact text, so semantically similar queries reuse a cached answer and avoid recomputation.
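
A minimal in-memory sketch of the idea follows. It assumes the caller supplies query embeddings from some embedding model (not shown) and treats a cosine similarity at or above a threshold as a cache hit. The class name `SemanticCache`, the linear scan, and the 0.95 threshold are illustrative choices.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        """Return the response of the most similar cached query, or None on a miss."""
        best_score, best_response = -1.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # cached answer
print(cache.get([0.0, 1.0]))    # None
```

A production cache would typically replace the linear scan with a vector index and add expiry, but the hit-or-miss decision has the same shape.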

Semantic Memory

An agent memory type storing general facts and concepts independent of when they were learned, typically backed by a vector database for similarity retrieval.

Semantic Search

Search that retrieves results based on meaning and conceptual similarity rather than exact keyword or token matching.

Semantic Similarity

A measure of how alike two pieces of content are in meaning, typically computed from the distance or similarity between their embedding vectors.

Sentence Transformer

A class of transformer models that produce fixed-length sentence embeddings optimised for semantic similarity and retrieval tasks.

Serverless Vector Database

A vector database deployment model where infrastructure scales automatically and users are charged per query or storage unit rather than for provisioned capacity.

Sharding

Distributing a vector index across multiple nodes to scale beyond single-machine capacity, with queries executed across shards in parallel.

Similarity Search

The operation of finding the stored items most similar to a query item, measured by a distance metric in vector space.

Sparse Retrieval

A retrieval approach using sparse keyword-based representations such as BM25 or TF-IDF, excelling at exact term matching and rare vocabulary.

Sparse Vector

A vector in which most values are zero, typical of keyword-based representations like TF-IDF used in lexical retrieval.
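
Because most entries are zero, a sparse vector is usually stored as only its non-zero entries. The sketch below uses a plain dict from term to weight, which is an illustrative representation, and computes a dot product by visiting only terms present in both vectors.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(weight * b[term] for term, weight in a.items() if term in b)


doc = {"vector": 2.0, "database": 1.0}
query = {"database": 1.0, "index": 3.0}
print(sparse_dot(doc, query))  # 1.0 (only "database" is shared)
```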

Stateless

The property of LLMs that they retain no memory between separate invocations — each call starts fresh unless context is explicitly provided.

Structured Filtering

Applying conditions on structured scalar fields such as numbers, dates, categories, and booleans alongside vector similarity to narrow results to relevant records.