
What is a Token (in LLMs)?

The smallest unit an LLM processes — not a word, but a fragment. Tokens determine API cost and context limits.

Updated: May 2, 2026

A token is the smallest unit an LLM processes. A token is NOT the same as a word: sometimes it is a whole short word, sometimes just a fragment of one, and punctuation or leading spaces can be tokens of their own.

Examples

"Tokenization is fun!" → 5 tokens:

["Token", "ization", " is", " fun", "!"]

For languages with diacritics or non-Latin scripts (Vietnamese, Thai, Chinese, Arabic), the same content takes more tokens: often 1.5–3× as many as English text of equivalent meaning.

Why tokens matter

1. They drive cost

Most LLM APIs price per token:

  • Claude Sonnet: ~$3 / 1M input tokens, $15 / 1M output
  • GPT-4o: ~$2.5 / 1M input, $10 / 1M output

A 1000-word prompt is roughly 1300 tokens. Costs add up fast across many calls.
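The arithmetic can be sketched as a few lines of Python. The ~1.3 tokens-per-word ratio is a rough rule of thumb for English, and the default prices mirror the example figures above; both are estimates, not guarantees:

```python
# Rough cost estimate for one API call. The tokens-per-word ratio
# (~1.3 for English) is a heuristic, not an exact count.

def estimate_cost(input_words: int, output_words: int,
                  in_price_per_m: float = 3.00,    # ~$3 / 1M input tokens
                  out_price_per_m: float = 15.00   # ~$15 / 1M output tokens
                  ) -> float:
    tokens_per_word = 1.3
    in_tokens = input_words * tokens_per_word
    out_tokens = output_words * tokens_per_word
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000

# A 1000-word prompt with a 300-word reply, at Sonnet-like prices:
print(f"${estimate_cost(1000, 300):.4f} per call")  # about $0.0098
```

Under a cent per call, but a pipeline making 100k such calls a day would spend roughly $975 daily, which is why trimming prompts pays off.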

2. They cap your context window

Every model has a maximum number of tokens per call (input + output combined):

  • GPT-4o: 128k tokens (~96k English words)
  • Claude 4.7: 200k - 1M tokens
  • Gemini 2.5: 2M tokens

Go over the limit and the request fails. To work with more material than fits, you'll need retrieval (RAG) or summarization.
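Since input and output share one window, a pre-flight check has to budget both. A minimal sketch, assuming the rough "~4 characters per token" heuristic for English (the function names here are illustrative, not any SDK's API):

```python
# Sketch: check whether a prompt plus the expected output fits a
# context window. estimate_tokens uses the rough "~4 characters per
# token" heuristic; exact counts need the model's own tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    # Input and output share the same window, so budget both.
    return estimate_tokens(prompt) + max_output_tokens <= context_window

print(fits_context("Summarize this memo: ...", max_output_tokens=1024))  # True
```

When this returns False, that is the point to reach for RAG (retrieve only the relevant passages) or to summarize the input first.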

3. They determine speed

More tokens mean a slower response: output is generated one token at a time, so latency grows with output length. Streaming softens the wait by showing tokens as they are produced.

How to count tokens
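Exact counts come from the model's own tokenizer: OpenAI publishes the tiktoken library, and Anthropic offers a token-counting endpoint in its API. When neither is at hand, a common heuristic for English is about 4 characters (roughly 0.75 words) per token. A minimal sketch of that heuristic:

```python
# Heuristic token estimate for English text: ~4 characters per token.
# For exact counts use the model's tokenizer (e.g. OpenAI's tiktoken
# library); this only approximates.

def rough_token_count(text: str) -> int:
    return max(1, round(len(text) / 4))

sample = "Tokenization is fun!"
print(rough_token_count(sample))  # 20 characters -> about 5 tokens
```

Note it lands on 5 for the example sentence above, matching the real split, but the heuristic drifts badly for code, non-English text, or long runs of punctuation.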

How to save tokens

  • Trim prompt fluff
  • Use prompt caching if available
  • Ask for brevity (“Reply in at most 3 sentences”)
  • Use a smaller model for simple tasks (Claude Haiku instead of Sonnet)
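The first two tips above can be sketched as code. The request dict mimics the common chat-completion shape but is illustrative only, and POLITE_FLUFF / trim_fluff are hypothetical helpers, not any library's API:

```python
import re

# Sketch: strip filler phrases and cap output length before sending a
# request. Filler adds tokens without changing the task.

POLITE_FLUFF = ("please ", "kindly ", "i was wondering if you could ")

def trim_fluff(prompt: str) -> str:
    for phrase in POLITE_FLUFF:
        # Remove each filler phrase case-insensitively.
        prompt = re.sub(re.escape(phrase), "", prompt, flags=re.IGNORECASE)
    return prompt.strip()

request = {
    "model": "claude-haiku",   # smaller model for a simple task
    "max_tokens": 150,         # hard cap on output spend
    "messages": [{"role": "user",
                  "content": trim_fluff("Please kindly summarize this memo.")}],
}
print(request["messages"][0]["content"])  # summarize this memo.
```

The max_tokens cap enforces the "ask for brevity" tip mechanically: the model cannot spend more output tokens than you budget.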

Tags

#token #llm #basics