A Large Language Model (LLM) is a type of artificial intelligence trained on enormous collections of text, such as books, articles, and websites, so it can understand, generate, and reason about human language. The “large” in the name refers both to the size of the training data and to the number of internal parameters, often billions or even hundreds of billions, that the model adjusts during training. Modern LLMs are examples of what are called foundation models: general-purpose systems that can be adapted to many downstream language tasks without being rebuilt from scratch.
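To make the “billions of parameters” figure concrete, the sketch below uses a common back-of-the-envelope approximation for GPT-style decoder-only transformers: roughly 12 × (number of layers) × (model width)² parameters. It assumes the feed-forward layer is four times as wide as the model dimension and ignores embeddings, biases, and normalization layers. The function name is hypothetical and only illustrates the arithmetic.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Per layer (assumption: feed-forward width = 4 * d_model):
      - attention: four d_model x d_model projections (query, key, value,
        output) = 4 * d_model**2
      - feed-forward: two d_model x (4 * d_model) matrices = 8 * d_model**2
    Embeddings, biases and layer norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


# GPT-3's published configuration: 96 layers, model width 12288.
estimate = approx_transformer_params(96, 12288)
print(f"{estimate:,}")  # 173,946,175,488 (about 174 billion)
```

Plugging in GPT-3's published configuration gives about 174 billion, close to the 175 billion parameters reported for that model. The estimate grows with the square of the model width, which is one reason parameter counts climb so quickly as models are scaled up.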
How a Large Language Model works
Most LLMs are built on the transformer architecture, introduced in the 2017 paper “Attention Is All You Need.” A transformer processes a sequence of tokens (chunks of text, such as whole words or word fragments) and uses a mechanism called self-attention to weigh how much each token should draw on the others. In the decoder-only transformers used by most LLMs, each position can attend only to itself and earlier tokens, which is what allows the model to be trained to predict the next one. During training, the model repeatedly guesses the next token in a passage, compares its guess to the actual token, and updates its parameters to reduce the error. After training on enough text, the model captures patterns of grammar, factual associations, reasoning styles, and even programming syntax.
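The attention and training steps described above can be sketched in plain Python. This is a minimal single-head illustration, not how any production model is implemented: real transformers derive queries, keys, and values from learned projection matrices, use many attention heads, and run batched matrix operations on accelerators. For simplicity, the same toy vectors are passed in as queries, keys, and values, and all function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(
    queries: list[list[float]],
    keys: list[list[float]],
    values: list[list[float]],
) -> list[list[float]]:
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions 0..i, so the model never
    looks at the tokens it is being trained to predict.
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(qj * kj for qj, kj in zip(q, keys[t])) / math.sqrt(d_k)
            for t in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * values[t][j] for t, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def next_token_loss(predicted_probs: list[float], actual_token_id: int) -> float:
    """Cross-entropy for one position: small when the true next token was likely."""
    return -math.log(predicted_probs[actual_token_id])


# Three toy token vectors used directly as queries, keys and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
print(out[0])  # [1.0, 0.0]: the first token can only attend to itself

# If the model gave the true next token (id 1) probability 0.7:
print(next_token_loss([0.1, 0.7, 0.2], 1))
```

Because of the causal mask, the first token's output equals its own value vector, while later tokens mix information from everything before them. `next_token_loss` is the per-position cross-entropy that training drives down by adjusting parameters: it shrinks as the probability assigned to the actual next token rises.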
At inference time, the LLM generates text one token at a time. At each step it computes a probability distribution over possible next tokens, conditioned on the prompt, any system instructions, and the tokens it has already produced, and then either samples from that distribution or selects the most likely token. A simple example: given the prompt “The capital of France is,” the model assigns high probability to “Paris” and outputs it. The same mechanism, scaled up and trained on more diverse data, lets a single model write essays, translate languages, explain code, and hold a conversation.
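The difference between selecting and sampling can be shown with a toy next-token distribution. The logits (raw scores) and the four-token vocabulary below are invented for illustration; a real model scores every token in a vocabulary of tens of thousands of entries or more. The function names are hypothetical and do not correspond to any particular library.

```python
import math
import random
from typing import Optional


def to_probabilities(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Softmax over logits; temperature < 1 sharpens, > 1 flattens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_pick(logits: dict[str, float]) -> str:
    """Greedy decoding: always choose the single highest-scoring next token."""
    return max(logits, key=logits.get)


def sample_pick(
    logits: dict[str, float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Sampling: draw the next token at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = to_probabilities(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical logits after the prompt "The capital of France is"
logits = {" Paris": 9.0, " Lyon": 4.0, " a": 3.5, " the": 3.0}

print(greedy_pick(logits))  # " Paris"
print(to_probabilities(logits))
print(sample_pick(logits, temperature=1.0, rng=random.Random(0)))
```

Greedy decoding is deterministic and returns “ Paris” every time. Sampling usually returns it as well with these scores, but can occasionally pick a lower-probability token, which is what gives sampled outputs their variety. Lowering the temperature below 1 pushes sampling toward greedy behavior; raising it makes unlikely tokens more common.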
Why it matters
LLMs are the engine behind most modern conversational AI, from customer support chatbots to coding assistants and AI features in search engines. They let software interact with people in natural language, automate drafting and summarization, and give non-technical users access to capabilities that previously required specialists. For businesses, LLMs reduce the cost of producing and analyzing text; for researchers, they provide a flexible substrate for studying language and reasoning. They also raise important questions about accuracy, bias, and copyright, because outputs reflect the data the model was trained on, and about energy use, because training and running large models requires substantial computing power.
Key types and related concepts
- Base (pretrained) models: Raw models trained on broad text corpora, useful as a starting point for further fine-tuning.
- Instruct or chat-tuned models: Base models further trained with examples of instructions and dialogues so they follow user requests more reliably.
- Open-weight vs. proprietary LLMs: Open-weight models (e.g., Meta's Llama family, Mistral 7B) release their parameters publicly; proprietary models (e.g., OpenAI's GPT series, Anthropic's Claude) keep their parameters private and are typically accessed through APIs or hosted applications.
- Multimodal models: LLMs extended to also process images, audio, or video alongside text.
- Small Language Models (SLMs): Compact models designed to run locally on devices or in private environments at lower cost than full-size LLMs.
An LLM is ultimately a statistical model of language, but because it has been scaled to billions of parameters and trained on a sizable fraction of the public web, it behaves like a remarkably versatile assistant. Understanding what an LLM is, and what it is not, is the first step toward using these tools effectively and critically.