LLM (Large Language Model)

A large language model, or LLM, is a neural network, typically with billions of parameters, trained on vast amounts of text to predict the next token (a word or word fragment) in a sequence. From this simple objective emerges a broad ability to understand and generate language, answer questions, summarise, translate, write code, and follow instructions.

In a vector search context, the LLM is the generation component that works alongside retrieval. In a retrieval-augmented generation pipeline, the vector database finds relevant context and the LLM reads that context to produce a coherent, grounded answer. The two play distinct roles: the embedding model converts text into vectors for retrieval, while the LLM consumes retrieved text and writes the response.
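The division of labour described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the function names and sample documents are all hypothetical. The resulting prompt is what would be handed to an LLM for generation; no real model is called here.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Retrieval step: rank documents by similarity to the query vector."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Generation input: the retrieved text is placed in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )

# Hypothetical sample data for illustration only.
documents = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
]
query = "How long is the refund window?"

context = retrieve(query, documents)   # role of the embedding model + vector database
prompt = build_prompt(query, context)  # text the LLM would read and answer from
print(prompt)
```

In a real system the embedding function would be a trained model and the ranking would be done by the vector database, but the shape of the flow is the same: embed, retrieve, then let the LLM write from the retrieved text.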

LLMs have a fundamental limitation that vector databases address: they know only what was in their training data, which is fixed, possibly outdated, and excludes private information. By retrieving current, proprietary, or authoritative content at query time and supplying it as context, RAG lets an LLM answer accurately about things it never saw in training, without the cost of retraining the model.
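To illustrate why no retraining is needed, the sketch below adds a new document to an index and shows that the context retrieved for the same question changes immediately. It is a simplified assumption-laden example: `DocumentIndex` is a hypothetical in-memory class that scores by word overlap (Jaccard similarity) rather than learned embeddings, and the release notes are invented sample text.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class DocumentIndex:
    """Hypothetical stand-in for a vector database, scored by word overlap."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        # Updating knowledge is just an insert; no model weights change.
        self.docs.append(text)

    def search(self, query):
        q = tokens(query)

        def score(doc):
            d = tokens(doc)
            union = q | d
            return len(q & d) / len(union) if union else 0.0

        # Return the best-matching document, or None if the index is empty.
        return max(self.docs, key=score, default=None)

index = DocumentIndex()
index.add("Release 1.0 shipped in January with basic search.")

query = "What changed in release 2.0?"
before = index.search(query)  # only outdated content is available

# New information arrives after the model was trained: index it, do not retrain.
index.add("Release 2.0 shipped in June and added hybrid search.")
after = index.search(query)   # the same question now retrieves current content

print("before:", before)
print("after: ", after)
```

The LLM itself is untouched between the two searches; only the retrieved context it would be given has changed.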