What is Temperature in AI?

Temperature is a sampling hyperparameter that controls how random or predictable an AI model's outputs are. Learn how it works, why it matters, and how to set it.

Temperature in AI is a hyperparameter that controls the randomness of a model's output by reshaping the probability distribution the model uses to choose its next token, word, or pixel. It is most commonly discussed in the context of large language models (LLMs) and other generative models, where it acts as a dial between predictability and creativity. Crank it down and the model tends to pick the most likely option every time; push it up and it is willing to take chances on less likely ones.

How temperature works

Before generating each token, a model computes a raw score, called a logit, for every possibility in its vocabulary. These logits are converted into probabilities through the softmax function, and that is where temperature enters. Each logit is divided by the temperature value T before softmax is applied.
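Written out, the scaling step described above takes the following form. The symbols are chosen here for illustration and are not from any particular library: z_i is the logit for token i, V is the vocabulary size, and p_i(T) is the resulting probability.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0
```

The formula is only defined for T > 0, which is why a temperature of exactly zero has to be handled as a special case by whatever software does the sampling.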

When T = 1, the distribution is unchanged. When T < 1, the distribution sharpens: already-likely tokens become even more likely, so sampling stays close to the model's "best guess." When T > 1, the distribution flattens and lower-probability tokens get a bigger share, so outputs become more diverse. For example, suppose that at T = 1 a model gives the next word "the" a probability of 60% and "a" a probability of 20%. At temperature 0.2 it would output "the" almost every time. At temperature 1.2, "a" would come up somewhat more often than its baseline rate of one in five, with the exact figure depending on how the remaining 20% is spread across other tokens.
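To make the effect concrete, here is a minimal Python sketch that rescales a toy distribution at a few temperatures. It assumes a three-token vocabulary in which the 20% not assigned to "the" or "a" goes to a single placeholder token; the function name `apply_temperature` is made up for this example. Because the log of a probability equals the logit up to a constant that softmax ignores, the sketch can start from probabilities rather than raw logits.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a distribution as if its logits had been divided by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # log(p) matches the logit up to an additive constant, which softmax ignores.
    scaled = [math.log(p) / temperature for p in probs]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed toy distribution: "the" = 0.6, "a" = 0.2, one placeholder token = 0.2.
tokens = ["the", "a", "<other>"]
base = [0.6, 0.2, 0.2]

for t in (0.2, 1.0, 1.2):
    rescaled = apply_temperature(base, t)
    summary = ", ".join(f"{tok}={p:.3f}" for tok, p in zip(tokens, rescaled))
    print(f"T={t}: {summary}")
```

Under these assumptions, "the" rises to about 0.99 at T = 0.2 and falls to about 0.56 at T = 1.2, while "a" moves from 0.20 to about 0.22. A different split of the leftover probability would give different numbers.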

Why it matters

Temperature is one of the simplest levers for shaping model behavior without retraining. Low temperatures are favored for tasks that demand precision, such as code generation, factual question answering, and structured data extraction, where errors are costly. A low setting makes outputs more consistent, but it does not by itself make them correct. Higher temperatures are useful for brainstorming, storytelling, and dialogue, where novelty and variety matter more than exactness.

It is also commonly tuned alongside prompts. Most LLM APIs, including those from OpenAI, Anthropic, and Google, expose temperature as a tunable parameter, often next to related controls like top-p (nucleus sampling) and, in some APIs, top-k. The allowed range and the default differ by provider, so check the documentation for the model you are using. Because it directly affects user experience, it is one of the first settings developers adjust when moving a model from a demo into production.

Key temperature ranges and when to use them

  • 0.0: Greedy decoding. Dividing by zero is undefined, so implementations conventionally treat this setting as "always pick the highest-probability token." It is the most repeatable option and is useful for code or math, though hosted APIs may still show small run-to-run differences.
  • 0.0–0.3: Low and focused. Good for translation, summarization, classification, and fact-based answering.
  • 0.4–0.7: Balanced. A common default for general-purpose chat assistants.
  • 0.7–1.0: More varied. Useful for creative writing, marketing copy, and ideation.
  • Above 1.0: Highly random. Outputs may become incoherent; rarely used outside research or experimental art.
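The ranges above can be explored with a small sampler. The following is a minimal sketch, not any vendor's implementation: `sample_token` is a hypothetical helper, the logits are made up, and treating temperature 0 as argmax is the convention assumed here.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick an index from `logits`; temperature 0 is treated as greedy argmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Dividing by zero is undefined, so fall back to the top-scoring index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
rng = random.Random(0)          # seeded so runs are repeatable

for t in (0.0, 0.3, 0.7, 1.5):
    picks = [sample_token(logits, t, rng) for _ in range(1000)]
    counts = [picks.count(i) for i in range(len(logits))]
    print(f"T={t}: {counts}")
```

At 0.0 every draw lands on the first token. As the temperature rises, the counts spread across the other three, which is the pattern the ranges describe.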

Temperature is best understood as a knob, not a verdict. Pair it with top-p or top-k sampling, and adjust based on the specific task, model, and audience, since the same value can feel very different across applications.
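As an illustration of pairing temperature with top-p, here is a minimal sketch of nucleus sampling applied after temperature scaling. The function name `sample_top_p` and its default values are assumptions for this example. Applying temperature before the top-p cutoff is one common ordering, and real inference stacks may order or implement these steps differently.

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Temperature-scale the logits, keep the smallest set of tokens whose
    probability mass reaches `top_p`, then sample within that set."""
    if temperature <= 0 or not 0 < top_p <= 1:
        raise ValueError("need temperature > 0 and 0 < top_p <= 1")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Walk tokens from most to least likely until the cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample within the nucleus; random.choices renormalizes the kept weights.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
rng = random.Random(0)
print([sample_top_p(logits, temperature=0.7, top_p=0.9, rng=rng) for _ in range(10)])
```

With these made-up logits, the two most likely tokens already cover a little over 90% of the probability mass at temperature 0.7, so the other two are never sampled. Raising the temperature flattens the distribution and lets more tokens into the nucleus, which is one reason the two settings are best tuned together rather than in isolation.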
