What is a Token in AI and Language Models?

Token in AI: the basic unit a language model reads and writes. Learn how tokenization works, why tokens matter for cost and context, and how they shape model behavior.

HyperStore · Published on 2026-06-20

#context window #language models #LLM #NLP basics #tokenization #tokens

A token is the smallest piece of text a language model actually works with. When you send a prompt to a model like GPT, Claude, or Llama, your text is first split into a sequence of tokens (typically whole words, common subwords, or single characters), and each token is then mapped to an integer ID the model can process. The model generates output the same way, predicting and emitting one token at a time until it produces an end-of-sequence token or reaches an output limit.
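The generation loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular model is implemented: `generate`, `toy_next_token`, and `EOS_ID` are hypothetical names, and the "model" is a stand-in rule that simply counts upward.

```python
EOS_ID = 0  # hypothetical ID for the end-of-sequence token

def toy_next_token(ids):
    """Stand-in for a real model: emit last ID + 1, then stop after 5."""
    last = ids[-1]
    return last + 1 if last < 5 else EOS_ID

def generate(next_token, prompt_ids, eos_id, max_new_tokens):
    """Emit one token at a time until EOS or the output limit is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token_id = next_token(ids)   # the model sees everything so far
        if token_id == eos_id:       # the model "decides to stop"
            break
        ids.append(token_id)
    return ids[len(prompt_ids):]     # only the newly generated tokens

print(generate(toy_next_token, [1, 2], EOS_ID, max_new_tokens=10))
```

The two exit conditions, an end-of-sequence token or a cap on new tokens, are the reason a response can end naturally or be cut off mid-sentence.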

How tokens work

Tokens are produced by a tokenizer, a separate component that sits between your text and the model. The most common schemes are byte-pair encoding (BPE) and WordPiece. Both start with individual characters (or bytes) and repeatedly merge adjacent pairs into longer units: BPE picks the most frequent pair, while WordPiece picks the pair that most improves a likelihood score. The result is a fixed vocabulary, often 30,000 to 200,000 entries, that balances short common words with reusable subword pieces. A frequent word like "the" usually becomes a single token, while a rare or made-up word like "unbelievableness" is split into several, for example "un", "believ", "able", "ness" (the exact split depends on the tokenizer).
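The merge loop behind BPE is easier to follow in code. The sketch below is a simplified, character-level toy version written for illustration; the function names are hypothetical, and production tokenizers add byte-level handling, pre-tokenization rules, and much larger corpora.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Start from single characters and apply `num_merges` greedy merges."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words

merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(words)
```

On this toy corpus, two merges are enough for the frequent word "low" to become a single symbol, while "lower" and "lowest" remain split into "low" plus leftover characters.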

Because English averages around four characters per token, a rough rule of thumb is that 100 tokens ≈ 75 English words, though this varies by tokenizer and language. Pricing, context limits, and generation speed are all measured in tokens, not words or characters. By that rule of thumb, a model with a 200,000-token context window can hold roughly 150,000 English words in a single request, about the length of a long novel.
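The rule of thumb above turns into simple arithmetic. The helpers below are a rough sketch with hypothetical names. They only apply the two English-language ratios from the text, so a real tokenizer's count will differ, sometimes substantially.

```python
CHARS_PER_TOKEN = 4         # rough English average
WORDS_PER_100_TOKENS = 75   # rough English rule of thumb

def estimate_tokens_from_chars(text: str) -> int:
    """Estimate token count from character length."""
    return round(len(text) / CHARS_PER_TOKEN)

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count from a word count."""
    return round(word_count * 100 / WORDS_PER_100_TOKENS)

def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate how many English words fit in a token budget."""
    return round(token_count * WORDS_PER_100_TOKENS / 100)

print(estimate_tokens_from_words(75))        # 100
print(estimate_words_from_tokens(200_000))   # 150000
```

Estimates like these are useful for budgeting. For billing or hard limits, count with the tokenizer that the target model actually uses.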

Why it matters

Tokens determine three things every user cares about: cost, capacity, and behavior. API providers charge per million tokens, so a prompt that tokenizes into more pieces costs more for the same text. Context windows, the maximum amount of text a model can consider at once, are counted in tokens, which is why documents that exceed the window must be chunked before being fed in. Behavior is affected too: a tokenizer that splits a word differently can change how a model reasons about it, and some languages tokenize into far more pieces per word than English, which inflates costs and shortens effective context for non-English users.
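Two of these effects, per-token cost and chunking to fit a window, can be sketched directly. The function names are hypothetical, and the prices in the usage lines are made-up placeholders rather than any provider's rates. Separate input and output prices are an assumption here.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request when prices are quoted per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

def chunk_tokens(token_ids, max_tokens, overlap=0):
    """Split a token sequence into windows of at most `max_tokens`,
    optionally repeating `overlap` tokens between neighbouring chunks."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):  # last window reached the end
            break
    return chunks

# Placeholder prices: 3.00 per million input tokens, 15.00 per million output.
print(request_cost(10_000, 2_000, 3.0, 15.0))
print(chunk_tokens(list(range(10)), max_tokens=4, overlap=1))
```

A small overlap between chunks is a common way to avoid cutting a sentence or idea in half at a boundary, at the price of paying for the repeated tokens.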

Key token concepts

Tokenization: the algorithm that splits text into tokens, usually via BPE, WordPiece, or Unigram.
Vocabulary: the fixed list of tokens a model knows, with a unique integer ID for each entry.
Special tokens: reserved symbols such as <BOS>, <EOS>, and padding markers that signal boundaries and structure rather than content.
Context window: the maximum number of tokens a model can process in a single request, including both input and generated output.
Token limits: hard caps imposed by providers on how many tokens a request may contain, often split into input and output limits.
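The concepts in this list fit together in a few lines of code. The sketch below uses a hypothetical nine-entry vocabulary and an assumed "<UNK>" fallback token for unknown pieces. Real vocabularies are far larger, and their special-token names vary by model.

```python
# Hypothetical toy vocabulary: each token has a unique integer ID.
VOCAB = {
    "<BOS>": 0, "<EOS>": 1, "<PAD>": 2, "<UNK>": 3,
    "the": 4, "un": 5, "believ": 6, "able": 7, "ness": 8,
}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}
STRUCTURAL = {"<BOS>", "<EOS>", "<PAD>"}  # mark boundaries, not content

def encode(tokens, pad_to=None):
    """Map tokens to IDs, wrap them in <BOS>/<EOS>, and optionally pad."""
    ids = [VOCAB["<BOS>"]]
    ids += [VOCAB.get(t, VOCAB["<UNK>"]) for t in tokens]
    ids.append(VOCAB["<EOS>"])
    if pad_to is not None and len(ids) < pad_to:
        ids += [VOCAB["<PAD>"]] * (pad_to - len(ids))
    return ids

def decode(ids):
    """Map IDs back to tokens, dropping the structural special tokens."""
    return [ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] not in STRUCTURAL]

def fits_context(input_tokens, max_output_tokens, context_window):
    """Input and generated output share one context window."""
    return input_tokens + max_output_tokens <= context_window

ids = encode(["un", "believ", "able", "ness"], pad_to=8)
print(ids)
print(decode(ids))
print(fits_context(195_000, 8_000, context_window=200_000))
```

The last check shows why a long prompt can fail even when it is shorter than the window: the tokens reserved for the reply count against the same budget.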

For a deeper look at byte-pair encoding, Andrej Karpathy's minbpe repository is a practical starting point, and the original paper, "Neural Machine Translation of Rare Words with Subword Units" (Sennrich, Haddow, and Birch, 2016), introduced the approach most modern tokenizers still build on.
