Chunking

Chunking is the process of splitting source documents into smaller pieces before turning them into embeddings. It is a foundational and often underestimated step in any retrieval-augmented generation (RAG) pipeline, and the chunking strategy frequently has more impact on retrieval quality than the choice of embedding model itself.

Chunking is necessary for two reasons. First, embedding models have a maximum input length, so long documents must be divided to fit. Second, and more importantly, embedding a whole document into a single vector blurs its meaning: a page covering five topics produces a muddy average that matches no specific query well. Smaller, focused chunks each carry a clear semantic signal, so a query about one topic can match the right chunk precisely.
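The "muddy average" effect can be illustrated with a toy calculation. The sketch below is not how a real embedding model works: it assumes, purely for illustration, that each of five unrelated topics is its own axis in a 5-dimensional space and that a whole-page embedding behaves like the mean of its topic vectors. The names cosine, topic_vectors, and page_vector are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy assumption: five unrelated topics, each represented by its own axis.
NUM_TOPICS = 5
topic_vectors = [
    [1.0 if i == t else 0.0 for i in range(NUM_TOPICS)]
    for t in range(NUM_TOPICS)
]

# Toy assumption: a page covering all five topics embeds to their mean.
page_vector = [sum(column) / NUM_TOPICS for column in zip(*topic_vectors)]

# A query about topic 2 only.
query = topic_vectors[2]

focused_score = cosine(query, topic_vectors[2])  # chunk about topic 2 alone
page_score = cosine(query, page_vector)          # whole page as one vector

print(f"focused chunk: {focused_score:.3f}")
print(f"whole page:    {page_score:.3f}")
```

Under these assumptions the focused chunk scores 1.0 against the query, while the whole-page vector scores 1/sqrt(5), roughly 0.447, and it scores the same against a query on any of the five topics. Real embeddings are far less tidy, but the direction of the effect is the point.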

Common strategies include fixed-size chunks with overlap, sentence- or paragraph-aware splitting, and semantic chunking that breaks where the topic shifts. The key trade-off is granularity: small chunks give precise matches but little surrounding context, while large chunks provide rich context but dilute the signal. Overlap between adjacent chunks helps ensure that information spanning a boundary is not lost.
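Two of these strategies can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not a reference implementation: sizes are measured in characters rather than tokens, sentence boundaries are detected with a naive regular expression, and the function names chunk_fixed and chunk_sentences and their default parameters are invented for this example.

```python
import re


def chunk_fixed(text, chunk_size=200, overlap=50):
    """Fixed-size chunks (in characters) where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    # Naive boundary rule: split on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_fixed("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
print(chunk_sentences("One two. Three four. Five six.", max_chars=20))
# ['One two. Three four.', 'Five six.']
```

In the first call, every pair of neighbouring chunks shares two characters, so a phrase that straddles a boundary still appears intact in at least one chunk. The cost is redundancy: larger overlap means more chunks to embed and store. The sentence-aware version never cuts mid-sentence, at the price of uneven chunk lengths.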