A foundation model is a large machine learning model trained on massive, diverse datasets, typically with self-supervised learning. After this broad pre-training, the same model can be adapted (for example, by fine-tuning) to a wide variety of downstream tasks, from answering questions and translating languages to generating images and analyzing proteins. The term was introduced in 2021 by researchers at Stanford's Center for Research on Foundation Models (CRFM) to describe what they saw as a new paradigm in AI.
How foundation models work
Foundation models are typically built on a neural network architecture, most often the transformer. They are trained on hundreds of billions to trillions of tokens of text, or comparably large collections of images and other data, drawn from the open web, books, code repositories, and licensed corpora. Training usually relies on self-supervised learning: the model learns to predict masked or upcoming parts of its own input, so the training signal comes from the data itself and manually labeled examples are not needed at scale. The result is a model that captures broad statistical patterns in language, code, images, or other modalities rather than being specialized for any single task.
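The self-supervised setup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: tokens are whitespace-split words, the function names are hypothetical, and a bigram counter stands in for the transformer. It shows the key point that both next-token and masked-token targets are read off the input itself, with no human labeling.

```python
from collections import Counter, defaultdict


def next_token_pairs(tokens, context_size):
    """Turn one unlabeled sequence into (context, target) training pairs.

    Each target is simply the token that follows its context, so the
    "labels" come from the data itself (causal language modeling).
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


def masked_example(tokens, positions, mask="[MASK]"):
    """Hide the tokens at `positions`; the hidden tokens become the targets
    (masked language modeling, as in BERT-style pre-training)."""
    corrupted = [mask if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return corrupted, targets


def train_bigram(pairs):
    """Toy stand-in for a neural network (an assumption for illustration):
    count which token follows the last token of each context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context[-1]][target] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Usage: one raw sentence yields several supervised examples for free.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens, context_size=2)
model = train_bigram(pairs)
```

A real model replaces the counter with a network that outputs a probability for every token in its vocabulary, but the way training examples are manufactured from unlabeled text is the same in spirit.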
Once pre-training is complete, the model becomes a foundation: developers adapt it to specific applications through techniques such as fine-tuning, prompt engineering, or retrieval-augmented generation. The same base model can therefore power a customer support chatbot, a medical record summarizer, and a code assistant, each built on top of shared capabilities rather than trained from scratch.
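Retrieval-augmented generation, one of the adaptation techniques mentioned above, can be sketched as two steps: find the documents most relevant to a query, then place them in the prompt sent to the unchanged base model. The word-count similarity below is a deliberately crude assumption (production systems typically use learned embeddings and a vector index), all function names are hypothetical, and the actual model call is left out because it depends on whichever model API is used.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (ties keep input order)."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(instruction, query, documents, k=1):
    """Assemble a retrieval-augmented prompt for an unspecified base model.

    The base model's weights are untouched; only its input changes.
    """
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


# Usage: a hypothetical support-bot document store.
support_docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Passwords must be at least 12 characters long.",
]
prompt = build_prompt("You are a support assistant.", "When are refunds processed?", support_docs)
```

Reusing the same prompt builder with a different instruction and document store (for example, clinical notes instead of a support FAQ) is one way a single base model can serve several applications without retraining.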
Why it matters
Foundation models have reshaped the economics of AI because a single pre-trained model can serve many downstream applications, sharply reducing the cost and labeled data needed to build each new one. Widely used examples include large language models for text, diffusion models for image generation, and multimodal models that process text, images, and audio together. At the same time, because many applications inherit the same base model, its flaws and biases propagate to all of them, and the concentration of capability in a few large models raises important questions about evaluation, safety, and governance.
Key types of foundation models
- Large language models (LLMs): text-based models such as the GPT family, Claude, and Llama, trained on massive text corpora to generate and reason about language.
- Diffusion models: image-generation models such as Stable Diffusion, which learn to reverse a gradual noise-adding process so they can synthesize images from random noise, typically guided by a text prompt.
- Multimodal models: systems such as CLIP and GPT-4V that jointly process text, images, audio, or video within a single foundation.
- Domain-specific foundation models: models pre-trained on scientific literature, protein sequences, or code, then adapted for specialized tasks such as drug discovery or software engineering.
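The noise-adding process behind the diffusion entry above can be sketched as follows, assuming the formulation popularized by denoising diffusion probabilistic models (DDPM), where a noisy sample at step t is drawn in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ᾱ_t the running product of (1 − β_s). The linear schedule from 1e-4 to 0.02 over 1,000 steps follows the original DDPM setup; Stable Diffusion runs a similar process on a compressed latent representation, so treat this as an illustration of the idea rather than of any specific model. The neural denoiser that predicts ε from x_t and t is hypothetical and omitted; only the training target and loss are shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly (DDPM-style)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0, t, a_bars, rng):
    """Sample x_t directly from clean data x0 (t is a 0-based step index):
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = a_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def denoising_loss(predicted_eps, true_eps):
    """Mean squared error between predicted and true noise: the training signal."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)


# Usage (the denoiser network is hypothetical and not shown):
# xt, eps = add_noise(x0, t, a_bars, random.Random(0))
# loss = denoising_loss(denoiser(xt, t), eps)
```

Because ᾱ_t shrinks toward zero as t grows, late steps are almost pure noise; generation runs the learned denoiser backward from such noise toward a clean image.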
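Models like CLIP, mentioned in the multimodal entry above, are pre-trained with a contrastive objective: within a batch of matching image-caption pairs, each image embedding should be most similar to its own caption's embedding, and vice versa. A minimal standard-library sketch of that symmetric loss follows. It assumes the embeddings come from separate image and text encoders (not shown) and are non-zero vectors, and it fixes the temperature at 0.07, which CLIP uses as the initial value of a learned parameter.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: image i should match text i and no other."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, scaled by 1/temperature
    logits = [
        [sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs
    ]
    # Image-to-text direction: each row is a classification over captions.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column is a classification over images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (image_to_text + text_to_image) / 2
```

With well-separated matched pairs the loss approaches zero; when every embedding is identical it equals the log of the batch size, since the model cannot tell the pairs apart.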
By replacing the old paradigm of training a narrow model for each new problem with a single adaptable base, foundation models have become the default starting point for modern AI development.