What is a Token (in LLMs)?
The smallest unit an LLM processes — not a word, but a fragment. Tokens determine API cost and context limits.
A token is not the same as a word: it's usually a fragment of a word, sometimes a whole short word, sometimes just part of a longer one.
Examples
"Tokenization is fun!" → 5 tokens:
["Token", "ization", " is", " fun", "!"]
For non-English languages with diacritics or non-Latin scripts (Vietnamese, Thai, Chinese, Arabic), expressing the same meaning usually takes more tokens, often 1.5-3× more than the English equivalent.
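You can reproduce splits like this yourself. A minimal sketch with OpenAI's tiktoken library (cl100k_base is one common encoding; exact splits and counts differ between tokenizers, and the Vietnamese sentence is just an illustrative sample):

```python
import tiktoken

# One common OpenAI encoding; other models split text differently.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Tokenization is fun!")
print([enc.decode([t]) for t in tokens])  # the individual token fragments

# Rough cross-language comparison: same greeting, different token counts.
for text in ["Hello, how are you?", "Xin chào, bạn khỏe không?"]:
    print(len(enc.encode(text)), text)
```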
Why tokens matter
1. They drive cost
Most LLM APIs price per token:
- Claude Sonnet: ~$3 / 1M input tokens, $15 / 1M output
- GPT-4o: ~$2.50 / 1M input, $10 / 1M output
A 1000-word prompt is roughly 1300 tokens. Costs add up fast across many calls.
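A back-of-the-envelope helper makes this concrete; a sketch with the Claude Sonnet prices above hardcoded as defaults (prices change, so treat them as placeholders):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_per_million: float = 3.00,    # input $/1M tokens (placeholder)
                  out_per_million: float = 15.00,  # output $/1M tokens (placeholder)
                  ) -> float:
    """Rough dollar cost of one call at per-million-token prices."""
    return (input_tokens / 1e6) * in_per_million + (output_tokens / 1e6) * out_per_million

per_call = estimate_cost(1300, 500)               # ~1000-word prompt, ~375-word reply
print(f"${per_call:.4f} per call")                # ≈ $0.0114
print(f"${per_call * 10_000:.2f} per 10k calls")  # ≈ $114
```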
2. They cap your context window
Every model caps the total tokens per call (input + output combined):
- GPT-4o: 128k tokens (~96k English words)
- Claude Sonnet 4: 200k tokens (up to 1M in beta)
- Gemini 2.5 Pro: 1M tokens
Over the limit → the request fails. To fit, you'll need chunking, summarization, or retrieval (RAG).
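One practical guard is to count before you send and leave headroom for the reply. A sketch using tiktoken (the 128k limit matches GPT-4o above; the 4k output reserve is an assumption to tune per use case):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str,
                    context_limit: int = 128_000,  # GPT-4o-sized window
                    output_reserve: int = 4_000,   # headroom for the reply (assumed)
                    ) -> bool:
    """True if the prompt plus reserved output fits in the window."""
    return len(enc.encode(prompt)) + output_reserve <= context_limit

print(fits_in_context("Tokenization is fun! " * 10_000))  # ~50k tokens → True
```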
3. They determine speed
More output tokens → slower responses, since models generate roughly one token at a time. Streaming doesn't speed up generation; it delivers tokens as they're produced, so the user sees output immediately.
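As an illustration, streaming with the Anthropic Python SDK looks roughly like this (requires an ANTHROPIC_API_KEY; the model id is one current option, substitute your own):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Tokens print as they are generated instead of after the whole reply finishes.
with client.messages.stream(
    model="claude-sonnet-4-20250514",  # assumed model id
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain tokens in two sentences."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()
```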
How to count tokens
- OpenAI: the tiktoken library, or the web tokenizer at https://platform.openai.com/tokenizer
- Anthropic: the count_tokens API endpoint (sketch after this list)
- Quick rule of thumb: 1 token ≈ 0.75 English words
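A sketch of both counting routes (tiktoken runs offline but only matches OpenAI encodings; the Anthropic count_tokens call goes over the API, and the model id is an assumption):

```python
import tiktoken
import anthropic

# Offline, OpenAI-style count:
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode("Tokenization is fun!")))

# Anthropic's server-side count (recent SDK versions):
client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",  # assumed model id
    messages=[{"role": "user", "content": "Tokenization is fun!"}],
)
print(count.input_tokens)
```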
How to save tokens
- Trim prompt fluff
- Use prompt caching if available (see the sketch after this list)
- Ask for brevity (“Reply in at most 3 sentences”)
- Use a smaller model for simple tasks (Claude Haiku instead of Sonnet)
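To illustrate the caching item above: with Anthropic's prompt caching you mark a large, stable prefix with cache_control so repeat calls reuse it at a discount. A sketch (the model id and file path are placeholders; caching only kicks in above a minimum prefix size):

```python
import anthropic

client = anthropic.Anthropic()
reference_doc = open("style_guide.txt").read()  # placeholder: any large, stable prefix

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id
    max_tokens=256,
    system=[{
        "type": "text",
        "text": reference_doc,
        # Cached on first use; later calls reread it at a reduced token rate.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Reply in at most 3 sentences: summarize the doc."}],
)
print(response.content[0].text)
```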