What is the difference between a foundation model and a large language model?

All large language models are foundation models, but not all foundation models are LLMs. A foundation model is any large model trained on broad data that can be adapted to many tasks; the category also covers image, audio, and protein models. A large language model is a foundation model built specifically to process and generate text.

Who coined the term foundation model?

The term was introduced in 2021 by Stanford's Center for Research on Foundation Models (CRFM) in the paper "On the Opportunities and Risks of Foundation Models", led by Rishi Bommasani and colleagues. The paper used the term to describe the shift from task-specific AI systems to general-purpose models that are reused across applications.

How are foundation models trained?

Most foundation models are pre-trained with self-supervised learning on huge unlabeled datasets, typically using the transformer architecture. The model learns by predicting missing or next parts of its input, such as the next word in a sentence, which lets it scale to internet-sized corpora without manual labeling.
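To make the "no manual labeling" point concrete, here is a minimal sketch of how next-word training pairs can be cut directly from raw text. The function name `next_word_pairs`, the `context_size` parameter, and the whitespace tokenization are illustrative assumptions; real systems use subword tokenizers and far longer contexts.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs; the text supplies its own labels."""
    words = text.split()  # assumption: naive whitespace tokenization
    pairs = []
    for i in range(1, len(words)):
        # The context is up to `context_size` words before position i.
        context = words[max(0, i - context_size):i]
        # The target is simply the word that actually comes next.
        pairs.append((context, words[i]))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every pair comes from the sentence itself, which is why this kind of objective can be applied to very large corpora without annotators.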

Can foundation models be fine-tuned?

Yes. After pre-training, foundation models are commonly adapted through fine-tuning, instruction tuning, or reinforcement learning from human feedback (RLHF), all of which update the model's weights. Prompt engineering is a lighter alternative that steers the model through its input without changing any weights. These steps specialize a general-purpose model for particular tasks, domains, or safety requirements.
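As a rough illustration of adapting a general model to one task, the sketch below trains only a small task-specific head on top of a frozen base. This is a simplified stand-in (often called head-only tuning or linear probing), not full fine-tuning, which would also update the base. The `frozen_base` function, its keyword features, and the four training examples are hypothetical.

```python
import math


def frozen_base(text):
    """Hypothetical stand-in for a pre-trained model: text -> fixed feature vector."""
    words = text.lower().split()
    return [
        sum(w in {"great", "good", "love"} for w in words),  # positive cue count
        sum(w in {"bad", "poor", "hate"} for w in words),    # negative cue count
        1.0,                                                  # bias feature
    ]


def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head with SGD; the base is never updated."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            features = frozen_base(text)
            z = sum(w * f for w, f in zip(weights, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            # Gradient of the log loss with respect to each head weight.
            for j, f in enumerate(features):
                weights[j] -= lr * (pred - label) * f
    return weights


def predict(weights, text):
    """Probability that `text` belongs to the positive class."""
    z = sum(w * f for w, f in zip(weights, frozen_base(text)))
    return 1.0 / (1.0 + math.exp(-z))


examples = [
    ("great product, love it", 1),
    ("good value", 1),
    ("bad quality, hate it", 0),
    ("poor support", 0),
]
head = train_head(examples)
print(predict(head, "good and great"))
print(predict(head, "bad and poor"))
```

Only three numbers are learned here; the general-purpose part stays fixed, which is the reason adaptation tends to need far less data than pre-training.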

What is a Foundation Model? Definition & Guide

A foundation model is a large machine learning model trained on massive, diverse datasets using methods such as self-supervised learning. After this broad pre-training, the same model can be adapted, for example through fine-tuning, to perform a wide variety of downstream tasks, from answering questions and translating languages to generating images and analyzing proteins. The concept was formalized in 2021 by Stanford's Center for Research on Foundation Models (CRFM), which coined the term to describe a new paradigm in AI: one broadly trained model serving as the base for many applications.

How Foundation Models work

Foundation models are typically built using a neural network architecture, most often the transformer, and trained on very large datasets, often hundreds of billions to trillions of text tokens or billions of images, drawn from the open web, books, code repositories, and licensed corpora. Training usually relies on self-supervised learning, where the model predicts missing or next pieces of its own input, removing the need for manually labeled examples at scale. The result is a model with broad statistical knowledge about language, code, images, or other modalities, encoding general patterns rather than the solution to any single task.
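The "missing pieces" form of this objective can be sketched as random masking: some tokens are hidden, and the hidden originals become the prediction targets. The `[MASK]` placeholder, the `mask_rate` value, and the function name are assumptions chosen for illustration, not a specific model's recipe.

```python
import random


def mask_tokens(tokens, mask_rate=0.3, seed=0):
    """Hide a random subset of tokens; return the masked input and the hidden targets."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            inputs.append("[MASK]")
            targets[i] = tok  # the model would be trained to recover this token
        else:
            inputs.append(tok)
    return inputs, targets


tokens = "foundation models learn general patterns from unlabeled data".split()
inputs, targets = mask_tokens(tokens)
print(inputs)
print(targets)
```

As with next-word prediction, the targets are taken from the input itself, so no human labeling step is involved.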

Once pre-training is complete, the model becomes a foundation: developers adapt it to specific applications through techniques such as fine-tuning, prompt engineering, or retrieval-augmented generation. The same base model can therefore power a customer support chatbot, a medical record summarizer, and a code assistant, each built on top of shared capabilities rather than trained from scratch.
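Retrieval-augmented generation can be outlined in a few lines: look up the most relevant text, then place it in the prompt sent to the base model. The sketch below stops at building the prompt and does not call any model. The word-overlap scoring is a deliberately crude stand-in for the embedding-based search that real systems typically use, and the documents and function names are hypothetical.

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble a prompt that grounds the model's answer in the retrieved text."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        "Use the context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

Swapping the document set changes what the application knows without retraining the base model, which is one reason the same base can back very different products.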

Why it matters

Foundation models have reshaped AI economics because a single pre-trained model can serve many downstream uses, dramatically reducing the cost and data required to build new applications. They power widely used systems such as large language models for text, diffusion models for image generation, and multimodal models that process text, images, and audio together. At the same time, their scale concentrates capabilities, risks, and biases: a flaw in a widely used base model can carry over into every application built on it, which raises important questions about evaluation, safety, and governance.

Key types of Foundation Models

Large language models (LLMs): text-based models such as the GPT family, Claude, and Llama, trained on massive text corpora to generate and reason about language.
Diffusion models: image-generation models such as Stable Diffusion, trained to reverse a noise-adding process and synthesize images from text prompts.
Multimodal models: systems such as CLIP (images and text) and GPT-4V that jointly process more than one modality, such as text, images, audio, or video, within a single model.
Domain-specific foundations: models pre-trained on scientific literature, protein sequences, or code, then adapted for specialized tasks like drug discovery or software engineering.
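The "reverse a noise-adding process" idea in the diffusion entry above can be sketched with the forward noising step used in DDPM-style training. This is a toy version on a four-value vector under assumed settings, not the Stable Diffusion pipeline; `alpha_bar` stands for the cumulative noise-schedule value at some timestep, and its value here is arbitrary.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(x0, noise)
    ]
    return xt, noise


def recover_x0(xt, noise, alpha_bar):
    """Invert the forward step, given the exact noise that was added."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(xt, noise)
    ]


rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.1]  # stand-in for image pixel values
xt, noise = add_noise(x0, alpha_bar=0.5, rng=rng)
restored = recover_x0(xt, noise, alpha_bar=0.5)
print(xt)
print(restored)
```

During training the model sees only the noised `xt` (plus the timestep and any text conditioning) and learns to predict `noise`; a good prediction lets the clean signal be recovered, as `recover_x0` does here with the true noise.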

By replacing the old paradigm of training a narrow model for each new problem with a single adaptable base, foundation models have become the default starting point for modern AI development.
