Token

A token is the basic unit of text that language models and many embedding models operate on. Rather than processing text letter by letter or strictly word by word, models break it into tokens, typically subword pieces a few characters long (a common rule of thumb for English is roughly four characters per token, though this varies by tokeniser). A common word may be a single token, while a long or unusual word may be split into several.
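To make the splitting behaviour concrete, here is a minimal sketch of subword tokenisation using greedy longest-match against a fixed vocabulary. The vocabulary `VOCAB` and the function `split_word` are hypothetical and exist only for illustration; production tokenisers (for example BPE or WordPiece variants) learn vocabularies of tens of thousands of pieces from data and differ in detail.

```python
# Hypothetical toy vocabulary (assumption): real tokenisers learn theirs from data.
VOCAB = {"the", "search", "token", "is", "ation", "un", "believ", "able"}


def split_word(word: str, vocab: set[str]) -> list[str]:
    """Greedily split one word into the longest vocabulary pieces available.

    Any character not covered by a vocabulary piece becomes its own token.
    """
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until something matches.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # A single character is always accepted as a fallback.
            if piece in vocab or end == start + 1:
                tokens.append(piece)
                start = end
                break
    return tokens


print(split_word("search", VOCAB))        # a common word: one token
print(split_word("tokenisation", VOCAB))  # a longer word: several pieces
```

With this toy vocabulary, "search" stays whole while "tokenisation" is split into three pieces, which mirrors the single-token versus multi-token distinction described above.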

Token counts matter throughout a vector search system. Embedding models impose a maximum number of input tokens, beyond which text is truncated (or, in some implementations, rejected), which influences how documents must be chunked. Language models measure their context window in tokens, capping how much retrieved material can be supplied alongside the rest of the prompt. Because API pricing is usually per token, token counts also directly drive cost.
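The three constraints above (input limits, context budgets, and per-token pricing) can be sketched in a few lines. Everything here is an illustrative assumption: `count_tokens` is a stand-in that counts whitespace-separated words rather than real tokens, and the limits and price passed in are hypothetical values, not figures for any particular model or API. In practice, the counter would be replaced by the tokeniser of the model in use.

```python
def count_tokens(text: str) -> int:
    # Stand-in counter (assumption): one "token" per whitespace-separated word.
    # Swap in the actual tokeniser of the target model for real counts.
    return len(text.split())


def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens stand-in tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def fit_to_context(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in their given (ranked) order until the token budget runs out."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected


def estimate_cost(n_tokens: int, price_per_million: float) -> float:
    """Cost of n_tokens at a hypothetical price per million tokens."""
    return n_tokens / 1_000_000 * price_per_million


document = "a b c d e f g h i j"
chunks = chunk_by_tokens(document, max_tokens=4)  # embedding input limit
context = fit_to_context(chunks, budget=9)        # language model context budget
total = sum(count_tokens(c) for c in context)
print(chunks, context, estimate_cost(total, price_per_million=0.5))
```

Stopping at the first chunk that does not fit keeps the ranked order intact; an alternative design would skip oversized chunks and continue with smaller ones further down the ranking.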

Tokenisation is also language-dependent: text in some languages requires more tokens per word than English, making multilingual applications more expensive and their chunking more delicate. Understanding tokens helps explain many practical constraints in building retrieval systems, from how big a chunk should be to how much context fits in a prompt and what a query will cost.
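One contributing factor can be illustrated without any tokeniser at all. Tokenisers that operate on UTF-8 bytes start from more bytes per character for non-Latin scripts, and learned merges only partly compensate for this. The sketch below measures that raw byte overhead only; it is not a token count, and the sample words are arbitrary greetings chosen for illustration.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in text."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


# Arbitrary sample greetings (assumption: chosen only to show different scripts).
samples = {
    "English": "hello",
    "Russian": "привет",
    "Japanese": "こんにちは",
}

for language, word in samples.items():
    print(language, utf8_bytes_per_char(word))
```

Latin letters take one byte each, Cyrillic letters two, and Japanese kana three, so a byte-based tokeniser has more raw material to compress for the latter two before any subword merges apply.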