What is a Token?

A token is a small chunk of text — a word, part of a word, or a single character — that a language model reads and produces as its basic unit of input and output. Text is broken into tokens through a process called tokenization, which lets the model handle language as a sequence of discrete, numbered pieces rather than raw characters.

A token is the smallest piece of text a language model actually works with. When you send a prompt to a model like GPT, Claude, or Llama, your text is first split into a sequence of tokens (typically whole words, common subwords, or single characters), and each token is then converted into an integer ID the model can process. The model generates output the same way, predicting and emitting one token at a time until it produces an end-of-sequence token or reaches an output limit.
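
The split-then-number step can be sketched in a few lines of Python. This is a toy illustration, not how any particular model's tokenizer is implemented: the vocabulary TOY_VOCAB, its IDs, the "<unk>" fallback entry, and the greedy longest-match strategy are all assumptions made for the example.

```python
# Hypothetical toy vocabulary: token string -> integer ID (not from any real model).
TOY_VOCAB = {"un": 0, "believ": 1, "able": 2, "ness": 3, "the": 4, " ": 5, "<unk>": 6}


def tokenize(text, vocab):
    """Split text by greedy longest match against the vocabulary.

    Characters that match no vocabulary entry become single-character tokens.
    """
    tokens, start = [], 0
    while start < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), start, -1):
            if text[start:end] in vocab:
                break
        else:
            end = start + 1  # no match at all: emit one raw character
        tokens.append(text[start:end])
        start = end
    return tokens


def to_ids(tokens, vocab):
    """Map each token to its integer ID, using the <unk> ID for unknown tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


tokens = tokenize("the unbelievableness", TOY_VOCAB)
print(tokens)                     # ['the', ' ', 'un', 'believ', 'able', 'ness']
print(to_ids(tokens, TOY_VOCAB))  # [4, 5, 0, 1, 2, 3]
```

Real tokenizers use far larger vocabularies and often operate on bytes rather than characters, but the shape of the pipeline (text in, list of integers out) is the same.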

How tokens work

Tokens are produced by a tokenizer, a separate component that sits between your text and the model. The most common schemes are byte-pair encoding (BPE) and WordPiece. Both start from individual characters (or bytes) and repeatedly merge adjacent pairs into longer units: BPE merges the most frequent pair, while WordPiece picks the merge that most improves the likelihood of the training data. The result is a fixed vocabulary, often 30,000 to 200,000 entries, that balances short common words with reusable subword pieces. A frequent word like "the" usually becomes a single token, while a rare or made-up word like "unbelievableness" is split into several, for example "un", "believ", "able", "ness".
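
The BPE merge loop described above can be sketched as follows. This is a minimal teaching version under stated assumptions: the function name learn_bpe and the four-word corpus are made up for illustration, ties between equally frequent pairs are broken by first-seen order, and production tokenizers add details (byte-level input, pre-tokenization, special tokens) that are omitted here.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merges from a list of words (toy corpus)."""
    # Represent each distinct word as a tuple of symbols, starting from characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        merges.append(best)
        # Rewrite the corpus with every occurrence of the best pair merged.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges, corpus


# Hypothetical corpus: word frequencies chosen only to make the merges easy to follow.
words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges, corpus = learn_bpe(words, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

After the first two merges the suffix "est" has become a single symbol shared by "newest" and "widest", which is the kind of reusable subword piece the vocabulary is meant to capture.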

Because English averages around four characters per token, a rough rule of thumb is that 100 tokens ≈ 75 English words, though this varies by tokenizer and language. Pricing, context limits, and generation speed are all measured in tokens, not words or characters. By the same rule of thumb, a model with a 200,000-token context window can hold roughly 150,000 English words in a single prompt, about the length of a long novel.
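
The rule of thumb translates directly into a quick estimator. The function names below are hypothetical, and the defaults (0.75 words per token, 4 characters per token) are just the approximations quoted above; for an exact count you would run the target model's own tokenizer.

```python
def estimate_tokens_from_words(word_count, words_per_token=0.75):
    """Estimate token count from an English word count (rule of thumb only)."""
    return round(word_count / words_per_token)


def estimate_tokens_from_chars(text, chars_per_token=4.0):
    """Estimate token count from character length (rule of thumb only)."""
    return round(len(text) / chars_per_token)


print(estimate_tokens_from_words(75))       # 100
print(estimate_tokens_from_words(150_000))  # 200000
```

These estimates are only suitable for rough budgeting; non-English text and code often tokenize into noticeably more tokens per word.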

Why it matters

Tokens determine three things every user cares about: cost, capacity, and behavior. API providers typically price usage per million tokens, so a prompt that tokenizes inefficiently costs more than it should. Context windows, the maximum amount of text a model can consider at once, are counted in tokens, which is why documents longer than the window must be chunked before being fed in. Behavior is affected too: a tokenizer that splits a word differently can change how a model reasons about it, and some languages tokenize into far more pieces per word than English, which inflates costs and shortens effective context for non-English users.
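
Both the cost and the capacity points come down to simple arithmetic on token counts. The sketch below assumes separate input and output prices per million tokens and a naive fixed-size chunking strategy; the function names and the prices in the usage lines are placeholders, not any provider's real rates.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the cost of one request given per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def chunk_tokens(token_ids, max_tokens):
    """Split a list of token IDs into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [token_ids[i:i + max_tokens]
            for i in range(0, len(token_ids), max_tokens)]


# Placeholder prices: 2.00 per million input tokens, 8.00 per million output tokens.
print(estimate_cost(1_000_000, 500_000, 2.0, 8.0))  # 6.0
print(chunk_tokens(list(range(10)), 4))             # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Fixed-size chunking is the simplest option; in practice chunks are often aligned to sentence or section boundaries so that each piece remains readable on its own.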

Key token concepts

  • Tokenization: the algorithm that splits text into tokens, usually via BPE, WordPiece, or Unigram.
  • Vocabulary: the fixed list of tokens a model knows, with a unique integer ID for each entry.
  • Special tokens: reserved symbols such as <BOS>, <EOS>, and padding markers that signal boundaries and structure rather than content.
  • Context window: the maximum number of tokens a model can process in a single request, including both input and generated output.
  • Token limits: hard caps imposed by providers on how many tokens a request may contain, often split into input and output limits.
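
To make the vocabulary and special-token concepts concrete, here is a toy encoder and decoder. Everything in it is an assumption for illustration: the seven-entry vocabulary, the whitespace splitting (real tokenizers use subwords), and the "<PAD>" and "<UNK>" names, which stand in for the padding marker and for an unknown-word fallback.

```python
# Hypothetical vocabulary: special tokens first, then a few ordinary words.
SPECIAL_TOKENS = ["<PAD>", "<BOS>", "<EOS>", "<UNK>"]
WORDS = ["the", "cat", "sat"]
VOCAB = {token: idx for idx, token in enumerate(SPECIAL_TOKENS + WORDS)}
ID_TO_TOKEN = {idx: token for token, idx in VOCAB.items()}

# Tokens that mark structure rather than content; dropped when decoding.
STRUCTURAL = {"<PAD>", "<BOS>", "<EOS>"}


def encode(text):
    """Wrap whitespace-split words in <BOS>/<EOS> and map them to integer IDs."""
    ids = [VOCAB["<BOS>"]]
    ids += [VOCAB.get(word, VOCAB["<UNK>"]) for word in text.lower().split()]
    ids.append(VOCAB["<EOS>"])
    return ids


def decode(ids):
    """Map IDs back to text, skipping structural special tokens."""
    tokens = [ID_TO_TOKEN[i] for i in ids]
    return " ".join(tok for tok in tokens if tok not in STRUCTURAL)


print(encode("the cat sat"))    # [1, 4, 5, 6, 2]
print(encode("the dog sat"))    # [1, 4, 3, 6, 2]
print(decode([1, 4, 5, 6, 2]))  # the cat sat
```

Note that the special tokens count toward the total just like ordinary ones, so a three-word sentence here occupies five tokens of the context window.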

For a deeper look at byte-pair encoding, Andrej Karpathy's minbpe repository is a practical starting point, and the paper "Neural Machine Translation of Rare Words with Subword Units" (Sennrich, Haddow, and Birch, 2016) introduced the approach most modern tokenizers still build on.

Frequently Asked Questions

How many tokens are in a word?
It depends on the tokenizer, but English words are usually one or two tokens. A common short word like "the" is typically a single token, while longer or less common words are split into subword pieces. For example, "unbelievableness" might become four tokens. On average, English text runs about 0.75 words per token (roughly 1.3 tokens per word), or about 100 tokens per 75 words.
Why do AI models use tokens instead of words?
Words create problems for models: vocabularies balloon, rare words are unseen during training, and similar forms like "run," "running," and "ran" are treated as unrelated. Subword tokens give the model a fixed, manageable vocabulary while still letting it represent any word, including ones it has never seen, by combining familiar pieces.
Do tokens count toward the context window?
Yes. The context window is the total number of tokens the model can process in a single request, and it includes both the input you send and the output the model generates. If a model has a 100,000-token context window, your prompt and the model's reply together must fit within that budget.
Are tokens the same across different AI models?
No. Each model family uses its own tokenizer and vocabulary, so the same sentence can produce different token counts on different models. A prompt that fits comfortably in one model's context window may exceed another's, which is worth checking when switching between providers.
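
The last two answers can be illustrated together with a small sketch. The two "tokenizers" below are deliberately crude stand-ins (one token per character versus one token per whitespace-separated word), and the 16-token context window is a made-up number; the point is only that the same prompt can fit one budget and overflow another.

```python
def count_tokens_char_level(text):
    """Toy tokenizer A: every character is one token."""
    return len(text)


def count_tokens_word_level(text):
    """Toy tokenizer B: every whitespace-separated word is one token."""
    return len(text.split())


def fits_context(prompt_tokens, max_output_tokens, context_window):
    """Check that the prompt plus the reserved output fits in the context window."""
    return prompt_tokens + max_output_tokens <= context_window


prompt = "tokens are not words"
window, reserved_output = 16, 8  # hypothetical limits for illustration

for name, counter in [("char-level", count_tokens_char_level),
                      ("word-level", count_tokens_word_level)]:
    n = counter(prompt)
    print(name, n, fits_context(n, reserved_output, window))
# char-level 20 False
# word-level 4 True
```

When switching providers, the equivalent check is to count tokens with the new model's own tokenizer before assuming an existing prompt still fits.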