Reference
Glossary
Authoritative definitions for every term in AI databases and vector search.
A
ACID Transactions
A set of database guarantees (Atomicity, Consistency, Isolation, Durability) ensuring that operations either fully complete or fully roll back. Pure vector databases rarely support them.
ACORN
Approximate Nearest Neighbor Constraint-Optimized Retrieval Network — a graph-based ANN algorithm that integrates metadata filtering directly into graph traversal for faster filtered search.
Adaptive Filtered Traversal
A graph traversal strategy that dynamically adjusts its path through an ANN index based on how selective the active metadata filter is, avoiding performance cliffs under restrictive conditions.
Agentic RAG
A RAG architecture where an AI agent autonomously decides when, what, and how to retrieve context across multiple retrieval steps to answer complex queries.
Agentic Retrieval
A retrieval pattern where an AI agent autonomously decides when and what to retrieve, incorporating vector search as a reasoning step rather than a single fixed query.
AI Plumbing
The repetitive non-core infrastructure work in AI development — setting up embedding pipelines, syncing data, tuning indexes — that consumes engineering time without adding product value.
ANN (Approximate Nearest Neighbour)
A class of search algorithms that find vectors approximately similar to a query, trading a small loss of accuracy for dramatically faster retrieval at scale.
B
Batch Indexing
The process of inserting or updating large volumes of vectors into a database index in bulk rather than one at a time.
BERT
A transformer-based language model by Google that produces contextual word embeddings, foundational to modern semantic search and NLP.
Bi-encoder
An embedding architecture that encodes query and document independently into vectors, enabling fast similarity search via pre-computed embeddings.
Bitmap Filtering
A filtering technique that represents matching vector IDs as compressed bitmaps, allowing extremely fast set operations to combine metadata conditions during search.
BM25
A probabilistic keyword ranking algorithm that scores documents by term frequency, inverse document frequency, and document-length normalisation, widely used in sparse lexical retrieval.
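A minimal sketch of BM25 scoring over pre-tokenised documents. The function name is hypothetical, the defaults `k1=1.5` and `b=0.75` are conventional choices rather than anything this glossary prescribes, and the smoothed IDF form used here is one common variant. It assumes a non-empty corpus of non-empty documents.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1 and is length-normalised via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["vector", "search"], ["keyword", "search", "engine"], ["cooking", "recipes"]]
scores = bm25_scores(["vector", "search"], docs)
```

The first document matches both query terms and ranks highest, while the last matches neither and scores zero.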
BYOC (Bring Your Own Cloud)
A deployment model where a managed vector database runs inside the customer's own cloud account, combining managed convenience with data residency and security control.
C
Centroid
The geometric centre of a cluster of vectors, used in IVF indexing as the representative point around which similar vectors are grouped.
Chunking
The process of splitting documents into smaller segments before embedding, a critical factor in RAG retrieval quality.
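One simple strategy is fixed-size chunks with overlap, sketched here on characters. The function name and default sizes are illustrative assumptions; real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps text that straddles a boundary visible in both neighbouring chunks, at the cost of storing some content twice.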
CLIP
A multimodal model by OpenAI that maps images and text into a shared vector space, enabling cross-modal similarity search.
Collection
A named group of vectors sharing the same schema and index configuration within a vector database, equivalent to a table in relational databases.
Collection-per-tenant
A multi-tenancy pattern where each tenant's vectors live in a dedicated collection within a shared database, balancing isolation against resource efficiency.
Context Engineering
The discipline of deciding what information to retrieve, summarise, and place in an LLM context window to maximise answer quality within token limits.
Context Window
The maximum number of tokens an LLM can process in a single input, limiting how much retrieved context can be injected in RAG.
Cosine Similarity
A similarity measure equal to the cosine of the angle between two vectors, indicating semantic similarity independent of vector magnitude.
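As a plain-Python illustration (the function name is ours, not a library API), the cosine is the dot product divided by the product of the two vector lengths:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [2, 0]))  # 1.0 (same direction, different length)
print(cosine_similarity([1, 0], [0, 3]))  # 0.0 (orthogonal)
```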
Cross-encoder
A re-ranking model that jointly processes query and document to produce a precise relevance score, used after first-stage ANN retrieval.
Curse of Dimensionality
The phenomenon where distance metrics lose discrimination power as the number of dimensions increases, complicating high-dimensional similarity search.
D
Data Tiering
Organising stored data across hot, warm, and cold storage layers based on access frequency, reducing cost by moving inactive tenant data off memory while keeping it retrievable.
Database-per-tenant
A multi-tenancy pattern giving each tenant a fully separate database instance, providing the strongest isolation at the cost of higher operational overhead.
Dense Retrieval
A retrieval approach that represents queries and documents as dense vectors and finds matches by computing similarity in embedding space.
Dense Vector
A vector in which most or all dimensions carry non-zero meaningful values, typical of neural network embedding outputs.
Developer Experience (DX)
The overall ease and quality of working with a vector database's APIs, SDKs, documentation, and tooling, a major factor in adoption and productivity.
Dimensionality
The number of elements in a vector, determining how much information it can encode and how much memory it requires to store.
DiskANN
A disk-based ANN algorithm by Microsoft that builds a graph index across data stored on SSD, enabling billion-scale search with low memory requirements.
Dot Product
A mathematical operation that multiplies corresponding vector elements and sums the results, used as a similarity metric sensitive to both direction and magnitude.
E
Edge Vector Search
Running vector similarity search on edge devices or on-premises hardware close to the data source, minimising latency by avoiding round-trips to centralised cloud infrastructure.
Embedded Vector Database
A vector database that runs in-process within the application rather than as a separate server, eliminating network overhead and simplifying deployment for local or edge use cases.
Embedding
A dense numerical vector generated by a machine learning model that encodes the semantic meaning of text, images, or other data.
Embedding Dimension
The number of values in a vector produced by an embedding model, typically ranging from 384 to 3072 depending on the model.
Embedding Pipeline
The end-to-end process of loading raw data, generating vector embeddings, and inserting them into a vector database ready for retrieval.
Episodic Memory
An agent memory type storing specific past events or interactions in sequence, allowing recall of what happened and when, as opposed to general factual knowledge.
Euclidean Distance
A distance metric measuring the straight-line distance between two points in vector space, also known as L2 distance.
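A short illustrative sketch (the function name is an assumption): square the coordinate differences, sum them, and take the square root.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```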
F
FAISS
Facebook AI Similarity Search — an open-source library by Meta for efficient dense vector indexing and similarity search, widely used as a local embedding store.
Feature Vector
A numerical representation of an object's attributes used as input to machine learning models or for similarity comparison.
Filter-aware Indexing
An indexing strategy that encodes metadata attributes into the vector index structure itself, so filters do not degrade recall or force a fallback to brute-force search.
Filtered ANN Search
ANN search where metadata conditions are applied during or alongside vector traversal, restricting results to only those matching both similarity and filter criteria.
Filtered Vector Search
Similarity search constrained by metadata conditions such as category, date, or user ID, returning only vectors that satisfy both the filter and the similarity criteria.
Fine-tuning
The process of continuing to train a pre-trained model on domain-specific data to improve its performance on a target task.
Flat Index
A brute-force vector index that computes exact nearest neighbours by comparing the query to every stored vector, guaranteeing perfect recall.
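A toy flat index might look like the sketch below. The class and method names are hypothetical and it uses Euclidean distance, so treat it as an illustration of exhaustive search rather than any product's API.

```python
import heapq
import math

class FlatIndex:
    """Exact nearest-neighbour search by scanning every stored vector."""

    def __init__(self):
        self._vectors = {}

    def add(self, vec_id, vector):
        self._vectors[vec_id] = list(vector)

    def search(self, query, k):
        # Compute the distance to every vector, then keep the k smallest.
        scored = ((math.dist(query, v), vec_id) for vec_id, v in self._vectors.items())
        return heapq.nsmallest(k, scored)

index = FlatIndex()
index.add("a", [0.0, 0.0])
index.add("b", [1.0, 0.0])
index.add("c", [5.0, 5.0])
nearest = index.search([0.9, 0.0], k=2)  # list of (distance, id) pairs
```

Query cost grows linearly with the number of stored vectors, which is the trade-off ANN indexes are designed to avoid.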
G
GPU Acceleration
Using graphics processing units to speed up vector index building and similarity search, offering large throughput gains for billion-scale workloads.
Graph Index
A vector index that organises vectors as nodes in a graph connected to their nearest neighbours, enabling fast ANN traversal.
GraphRAG
A RAG variant that combines a knowledge graph with vector retrieval, using graph relationships to provide structured, multi-hop context to the LLM.
Grounding
The practice of anchoring LLM responses to retrieved factual sources, reducing hallucinations and improving answer accuracy.
H
Hallucination
The tendency of LLMs to generate plausible-sounding but factually incorrect information not supported by their training data or retrieved context.
Hamming Distance
A distance metric counting the number of positions at which two binary vectors differ, used in hash-based similarity search.
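For illustration, a sketch over equal-length bit sequences (the function name is an assumption):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```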
High-Cardinality Filter
A metadata filter on a field with many distinct values such as user ID or timestamp, which is harder to optimise than low-cardinality filters and can significantly impact ANN recall.
High-Dimensional Data
Data represented by vectors with hundreds or thousands of dimensions, typical of modern neural network embeddings.
HNSW
Hierarchical Navigable Small World — the dominant graph-based index algorithm for in-memory ANN search, offering high recall and fast query times.
HTAP (Hybrid Transactional/Analytical Processing)
A database architecture that handles both real-time transactional writes and analytical queries in the same system, enabling AI applications that need both operational data and analytical retrieval.
Hybrid Indexing
Combining multiple index structures — such as a vector graph index alongside an inverted keyword index — to serve different query types from the same data store.
Hybrid Search
A retrieval approach combining dense vector search with sparse keyword search such as BM25, capturing both semantic meaning and exact term matches in one query.
HyDE (Hypothetical Document Embeddings)
A RAG technique where the LLM first generates a hypothetical answer to the query, and the embedding of that answer is then used to retrieve real documents from the vector database.
I
In-Graph Filtering
A filtering strategy that applies metadata constraints during graph traversal of an ANN index, avoiding the recall loss of post-filtering and the candidate restriction of pre-filtering.
Index Warm-up
The process of loading a vector index from disk into memory before serving queries, required after restarts to restore full query performance.
Inline Filtering
Applying metadata filters during vector graph traversal rather than before or after the search, maintaining recall without restricting the initial candidate pool.
Integrated Vectorization
A database capability where embedding generation is handled automatically on data ingestion, removing the need to run a separate embedding model pipeline before storing vectors.
Intent-aware Search
Search that interprets the goal or intent behind a query rather than matching keywords or vectors alone, often combining retrieval with reasoning.
Inverted Index
A data structure mapping each term to the list of documents containing it, the foundation of keyword search and the sparse half of hybrid retrieval.
IVF (Inverted File Index)
A cluster-based ANN index that partitions vectors into Voronoi cells and searches only the nearest clusters at query time.
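The idea can be sketched with a toy index whose centroids are supplied up front. In practice they would usually be learned, for example with k-means. The class name, the method names, and the `nprobe` parameter name are illustrative assumptions.

```python
import math

class TinyIVF:
    """Toy inverted-file index: one list of vectors per centroid (cell)."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.cells = [[] for _ in centroids]

    def _nearest_cells(self, vector, n):
        # Rank cells by the distance from the vector to each centroid.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vector, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vector):
        cell = self._nearest_cells(vector, 1)[0]
        self.cells[cell].append((vec_id, list(vector)))

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe cells whose centroids are closest to the query.
        candidates = [item for cell in self._nearest_cells(query, nprobe)
                      for item in self.cells[cell]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]

ivf = TinyIVF(centroids=[[0, 0], [10, 10]])
for vec_id, vec in [("a", [1, 1]), ("b", [0, 2]), ("c", [9, 9]), ("d", [11, 10])]:
    ivf.add(vec_id, vec)
print(ivf.search([8, 8], k=2, nprobe=1))  # ['c', 'd']
```

Raising `nprobe` scans more cells, which improves recall at the cost of more distance computations.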
K
k-NN (k-Nearest Neighbours)
A search operation that returns the k vectors in a database most similar to a given query vector, ranked by distance.
Knowledge Graph
A database that stores explicit structured relationships between entities, complementary to vector databases which store implicit semantic relationships.
L
LangChain
An open-source framework for building LLM-powered applications that provides abstractions for chaining prompts, tools, memory, and retrieval components.
Latent Space
The high-dimensional mathematical space in which embedding vectors reside, where positions and distances encode semantic relationships.
LlamaIndex
A data framework for LLM applications that simplifies ingesting, indexing, and querying data sources for use in RAG and agent pipelines.
LLM (Large Language Model)
A large-scale neural network trained on text data, capable of understanding and generating natural language across a wide range of tasks.
Logic Layer
The component in an AI search system that combines semantic retrieval with structured business rules, routing logic, and metadata conditions to produce intent-aware results.
Logical Tenant Isolation
A multi-tenancy approach where tenants share the same infrastructure but their data is separated by namespaces, filters, or access controls at the application layer.
Long-term Memory (LTM)
External persistent storage that gives LLMs access to information beyond their context window, typically implemented using a vector database as the retrieval layer.
LSH (Locality Sensitive Hashing)
A hashing technique that maps similar vectors to the same hash buckets with high probability, enabling sub-linear approximate similarity search.
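One well-known LSH family for angular similarity uses random hyperplanes: each bit records which side of a hyperplane a vector falls on. The sketch below is a single-table illustration with hypothetical names. Production systems typically use several tables to raise recall.

```python
import random

class HyperplaneLSH:
    """Single-table LSH using the sign of projections onto random hyperplanes."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}

    def signature(self, vector):
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, vector)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, vec_id, vector):
        self.buckets.setdefault(self.signature(vector), []).append(vec_id)

    def candidates(self, query):
        # Only vectors in the query's bucket are considered as candidates.
        return self.buckets.get(self.signature(query), [])

lsh = HyperplaneLSH(dim=3)
lsh.add("a", [1.0, 2.0, 3.0])
print(lsh.candidates([2.0, 4.0, 6.0]))  # ['a'] (same direction, so same bucket)
```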
M
Managed Vector Database
A vector database offered as a hosted service where the provider handles infrastructure, scaling, and maintenance, letting teams focus on application logic.
Manhattan Distance
A distance metric summing the absolute differences between vector coordinates, also known as L1 distance or taxicab distance.
Memory Layer
An architectural component in AI systems that stores and retrieves past context, facts, or conversation history to extend an LLM beyond its context window.
Memory Management
The strategies an AI agent uses to store, retrieve, summarise, and expire information across sessions, balancing recall quality against storage and context limits.
Memory Policy
Rules governing what an AI agent stores in memory, what it discards, how it compresses old context, and how it prioritises retrieval over time.
Metadata Filtering
Restricting vector search results using structured scalar attributes alongside semantic similarity, combining exact conditions with approximate search.
MIPS (Maximum Inner Product Search)
A variant of nearest neighbour search that finds vectors maximising the dot product with a query rather than minimising distance.
MTEB (Massive Text Embedding Benchmark)
A widely used benchmark for evaluating text embedding models across retrieval, classification, clustering, and semantic similarity tasks.
Multi-tenancy
An architecture where a single vector database instance serves multiple isolated tenants, each with their own data partitions and access controls.
Multimodal Embedding
An embedding that encodes multiple data types such as text, images, and audio in a single shared vector space for cross-modal retrieval.
Multimodal Search
Searching across different data types such as text, images, audio, and video within a single shared embedding space, so a query in one modality can retrieve results in another.
N
Named Vectors
A database feature that stores multiple distinct vector representations per object, each generated by a different model, enabling multi-purpose search on the same data.
Namespace
An isolated partition within a vector database that separates collections of vectors, commonly used for multi-tenancy and logical data separation.
Namespace-per-tenant
A multi-tenancy pattern where each tenant occupies a separate namespace inside a single shared index, offering lightweight logical isolation at scale.
Noisy Neighbour
A multi-tenancy performance problem where one tenant's heavy workload degrades query latency for other tenants sharing the same index or infrastructure.
P
Partition Key
A field whose value determines which partition or shard a vector is stored in, used to isolate tenants or co-locate related data for efficient filtered search.
Payload Filtering
Filtering on structured metadata stored alongside vectors — the term commonly used in systems where metadata is stored as a JSON payload attached to each vector.
Payload Index
A secondary index built over the metadata (payload) attached to vectors, enabling fast metadata filtering during similarity search.
Persistent Memory
Storage that survives application restarts and sessions, allowing an AI system to retain knowledge and context durably over time rather than only within a single run.
pgvector
An open-source PostgreSQL extension that adds vector storage and similarity search to an existing Postgres database, popular for teams already running Postgres.
Physical Tenant Isolation
A multi-tenancy approach where each tenant gets dedicated infrastructure resources such as separate index shards or clusters, guaranteeing performance and data separation.
Post-filtering
Applying metadata conditions after vector search to filter an already-retrieved candidate set. It is simple to implement but can return fewer than K results, and so lose recall, when the filter is selective.
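A minimal sketch of post-filtering, with an exact sort standing in for the ANN stage. The function name and the `fetch` over-fetch parameter are assumptions made for illustration.

```python
import math

def post_filtered_search(query, items, predicate, k, fetch=None):
    """items: list of (id, vector, metadata). Search first, then filter."""
    fetch = fetch or k
    # Stage 1: similarity search with no knowledge of the filter.
    nearest = sorted(items, key=lambda it: math.dist(query, it[1]))[:fetch]
    # Stage 2: drop candidates that fail the metadata condition.
    return [it[0] for it in nearest if predicate(it[2])][:k]

items = [
    ("a", [0, 0], {"lang": "en"}),
    ("b", [1, 0], {"lang": "de"}),
    ("c", [2, 0], {"lang": "de"}),
    ("d", [3, 0], {"lang": "en"}),
]

def is_english(meta):
    return meta["lang"] == "en"

print(post_filtered_search([0, 0], items, is_english, k=2))           # ['a']
print(post_filtered_search([0, 0], items, is_english, k=2, fetch=4))  # ['a', 'd']
```

The first call asks for two results but gets one, because the filter removed half of the retrieved candidates. Over-fetching, as in the second call, is a common mitigation.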
Pre-filtering
Applying metadata conditions before vector search to restrict the candidate set, trading recall for speed on highly selective filters.
Precision
A search quality metric measuring the fraction of returned results that are genuinely relevant to the query.
Product Quantisation (PQ)
A vector compression method that splits high-dimensional vectors into subspaces and quantises each independently, achieving very high compression ratios.
Prompt Engineering
The practice of designing and refining LLM input prompts to reliably produce accurate, relevant, and well-formatted outputs.
Q
Quantisation
A compression technique that reduces the numerical precision of stored vectors to decrease memory footprint while preserving approximate similarity.
Query Rewriting
Transforming a user query before embedding it for retrieval, often using an LLM to expand, clarify, or decompose the original query.
R
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in context retrieved from a vector database at query time, reducing hallucinations without retraining.
Re-ranking
A post-retrieval step that reorders an initial candidate set using a more accurate but computationally expensive model such as a cross-encoder.
Real-time Indexing
The ability to insert or update vectors in a database index and have them immediately queryable, without requiring a full index rebuild.
Real-time Ingestion
The continuous insertion of new vectors into a live index as data arrives, making them immediately searchable without a batch rebuild.
Recall Cliff
A sharp drop in result quality or spike in latency that occurs when a restrictive metadata filter leaves too few candidates for an ANN index to navigate effectively.
Recall@K
A search quality metric measuring what fraction of the true K nearest neighbours appear in the top-K results of an ANN search.
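As a sketch, given the IDs returned by an ANN search and the exact IDs from a brute-force search (the function and argument names are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours present in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one result")
    hits = len(truth.intersection(approx_ids[:k]))
    return hits / len(truth)

print(recall_at_k(["a", "b", "x", "d"], ["a", "b", "c", "d"], k=4))  # 0.75
```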
Reciprocal Rank Fusion (RRF)
A score fusion algorithm that combines ranked lists from multiple retrieval systems by summing, for each result, the reciprocal of its rank in each list, usually offset by a small smoothing constant.
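A compact sketch of RRF follows. The function name is hypothetical, and the smoothing constant `k=60` is a commonly used default rather than a value this glossary specifies.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every result it contains.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['d1', 'd3', 'd2', 'd4']
```

Only ranks are used, so the raw scores of the two retrievers never need to be on a comparable scale.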
Relative Score Fusion
A score fusion method that normalises and weights the raw scores from each retrieval method before combining them, as an alternative to rank-based fusion.
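One way to realise this is min-max normalisation followed by a weighted sum, sketched below. The function name, the `alpha` weight, and the choice of min-max scaling are assumptions, and each input is expected to be a non-empty mapping from ID to raw score.

```python
def relative_score_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Fuse two {id: raw_score} dicts; alpha weights the vector side."""
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc: 1.0 for doc in scores}
        # Rescale raw scores to the range [0, 1].
        return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

    v, kw = normalise(vector_scores), normalise(keyword_scores)
    fused = {
        # A document missing from one result set contributes 0 for that side.
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(v) | set(kw)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranking = relative_score_fusion(
    {"d1": 0.9, "d2": 0.5, "d3": 0.1},   # e.g. cosine similarities
    {"d2": 12.0, "d3": 4.0},             # e.g. BM25 scores
)
print([doc for doc, _ in ranking])  # ['d2', 'd1', 'd3']
```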
Replication
Maintaining copies of a vector index across multiple nodes to improve query throughput and provide fault tolerance.
Representation Learning
The field of machine learning focused on training models to automatically learn useful vector representations of raw data.
Retrieval Layer
The component in an AI application stack responsible for finding and returning relevant stored content in response to a query, typically backed by a vector database.
Retrieval Pipeline
The end-to-end system for fetching relevant content, typically comprising embedding, indexing, ANN search, filtering, and optional re-ranking stages.
Roaring Bitmaps
A compressed bitmap data structure used in vector databases to represent sets of matching vector IDs for metadata filters, enabling extremely fast set intersection during filtered search.
Row-Level Security (RLS)
A database mechanism that restricts which rows a query can access based on the requesting user, used to enforce tenant isolation in shared-table vector stores.
S
Scalar Quantisation (SQ)
A compression method that reduces vector component precision from 32-bit floats to 8-bit integers, achieving a 4× memory reduction with minimal recall loss.
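A rough sketch of the idea, mapping each component onto 256 evenly spaced levels within a fixed range. The function names and the global `lo`/`hi` range are assumptions. Real systems often calibrate the range from the data, sometimes per dimension.

```python
def sq_encode(vector, lo, hi):
    """Quantise each float in [lo, hi] to one unsigned byte (0..255)."""
    scale = (hi - lo) / 255
    # Values outside the range are clamped to the nearest end.
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)

def sq_decode(codes, lo, hi):
    """Reconstruct approximate floats from the byte codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]

vec = [-1.0, 0.0, 0.5, 1.0]
codes = sq_encode(vec, lo=-1.0, hi=1.0)   # 4 bytes instead of 4 x 32-bit floats
approx = sq_decode(codes, lo=-1.0, hi=1.0)
```

Each reconstructed value is off by at most about half a quantisation step, which is why similarity rankings are largely preserved.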
Score Fusion
The process of combining relevance scores from multiple retrieval methods into a single ranking, used in hybrid search to merge vector and keyword results.
Self-hosting
Running a vector database on your own infrastructure rather than a managed service, giving full control and no per-query cost but requiring operational expertise.
Semantic Caching
Caching LLM or retrieval responses keyed by query meaning rather than exact text, so semantically similar queries reuse a cached answer and avoid recomputation.
Semantic Memory
An agent memory type storing general facts and concepts independent of when they were learned, typically backed by a vector database for similarity retrieval.
Semantic Search
Search that retrieves results based on meaning and conceptual similarity rather than exact keyword or token matching.
Semantic Similarity
A measure of how alike two pieces of content are in meaning, computed as the distance between their embedding vectors.
Sentence Transformer
A class of transformer models that produce fixed-length sentence embeddings optimised for semantic similarity and retrieval tasks.
Serverless Vector Database
A vector database deployment model where infrastructure scales automatically and users are charged per query or storage unit rather than for provisioned capacity.
Sharding
Distributing a vector index across multiple nodes to scale beyond single-machine capacity, with queries executed across shards in parallel.
Similarity Search
The operation of finding the stored items most similar to a query item, measured by a distance metric in vector space.
Sparse Retrieval
A retrieval approach using sparse keyword-based representations such as BM25 or TF-IDF, excelling at exact term matching and rare vocabulary.
Sparse Vector
A vector in which most values are zero, typical of keyword-based representations like TF-IDF used in lexical retrieval.
Stateless
The property of LLMs whereby they retain no memory between separate invocations: each call starts fresh unless context is explicitly provided.
Structured Filtering
Applying conditions on structured scalar fields such as numbers, dates, categories, and booleans alongside vector similarity to narrow results to relevant records.
T
Tenant Isolation
The separation of one customer's data and query workload from another in a multi-tenant vector database, achieved logically via namespaces or physically via dedicated resources.
Tenant Offloading
Moving an inactive tenant's data from active memory to cheaper cold storage and reactivating it on demand, part of a formal tenant lifecycle of hot, cold, and frozen states.
TF-IDF
A statistical measure weighting terms by how frequently they appear in a document relative to how common they are across all documents.
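A small sketch using one common formulation, relative term frequency multiplied by the natural log of inverse document frequency. Other weighting variants exist, the function name is hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        counts = Counter(d)
        weights.append({
            term: (count / len(d)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["vector", "database", "vector"], ["relational", "database"]]
weights = tf_idf(docs)
print(weights[0]["database"])  # 0.0 (appears in every document)
```

A term that occurs in every document gets zero weight, while a term concentrated in one document is weighted up.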
Token
The basic unit of text processed by language models, typically a word or subword piece produced by a tokeniser.
Tokenisation
The process of splitting text into tokens before it is processed by a language model or embedding model.
Top-K
The K most similar results returned by a vector search query, ranked by their similarity score to the query vector.
Two-Stage Retrieval
A retrieval architecture using a fast first stage (ANN search) to produce candidates, followed by a slower, more accurate second stage (re-ranking).
V
Vector
An ordered array of numbers representing a data point in high-dimensional space, encoding its semantic or spatial properties.
Vector Database
A database management system purpose-built to store, index, and query high-dimensional vectors at scale using similarity search.
Vector Index
A data structure that organises stored vectors geometrically to enable fast approximate or exact similarity search without scanning every entry.
Vector Space
The high-dimensional mathematical space in which embedding vectors reside, where geometric proximity encodes semantic similarity.
Vector Store
A lightweight term for the storage layer in a vector database or RAG system, sometimes used interchangeably with vector database.
Vectorizer
A module or integration that automatically converts raw data into embeddings during ingestion, eliminating the need to manage embedding generation separately.
Voronoi Cell
A region of space containing all points closer to a particular centroid than any other, used in IVF indexing to partition the vector space.
Z
Zero-Ops
A property of fully managed services that require no infrastructure provisioning, scaling, or maintenance from the user, reducing operational burden to near zero.
Zero-shot
The ability of a model to perform a task it was not explicitly trained on, relying on generalisation from pre-training rather than task-specific examples.