Tokens as AI Currency
Tokens are the fundamental unit of exchange in LLM systems — understanding them as a budget changes how you design AI workflows.
Tokens are the atomic unit LLMs operate on — for English text, roughly 0.75 words (about four characters) per token. Every prompt you send and every response you receive is metered in tokens. They map directly to cost, latency, and the hard limit of what a model can “see” at once (context window).
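A minimal sketch of token-as-budget arithmetic, using the common ~4-characters-per-token heuristic for English (exact counts require the model's own tokenizer, e.g. a library like tiktoken). The per-1k prices are hypothetical placeholders, not any provider's real rates — note only that output is billed at a higher rate than input:

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(input_text: str, expected_output_tokens: int,
                  input_price_per_1k: float = 0.001,    # placeholder rate
                  output_price_per_1k: float = 0.003) -> float:  # placeholder rate
    """Estimated dollar cost; output tokens cost more than input tokens."""
    input_tokens = estimate_tokens(input_text)
    return (input_tokens * input_price_per_1k
            + expected_output_tokens * output_price_per_1k) / 1000

prompt = "Summarize the following document in three bullet points: ..."
estimate_tokens(prompt)           # ~15 tokens by the heuristic
estimate_cost(prompt, 200)        # dominated by the 200 output tokens
```

Even with placeholder rates, the shape of the result is the point: the expected output, not the prompt, dominates the bill.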
Thinking of tokens as a currency budget shifts how you make architectural decisions.
Key concepts:
- Context window — the total tokens available per request (input + output). Once full, older context is lost or must be compressed
- Input vs output cost — input tokens are cheaper than output tokens; generation is the expensive operation
- Token density — how much useful information fits per token (structured prompts > verbose prose)
- Context as working memory — what’s in the window is what the model “knows” for that request; nothing else
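The concepts above can be sketched as a budgeting function: treat the context window as working memory, reserve room for the model's output, and drop the oldest turns once the input budget is exceeded. The window size, reservation, and 4-characters-per-token estimate here are illustrative assumptions, not any particular model's limits:

```python
def estimate_tokens(text: str) -> int:
    """Same ~4-characters-per-token heuristic; real counts need a tokenizer."""
    return max(1, len(text) // 4)

def fit_to_window(messages: list[str], window: int = 8192,
                  reserved_output: int = 1024) -> list[str]:
    """Keep the most recent messages that fit in window - reserved_output."""
    budget = window - reserved_output
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):   # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                    # older context is dropped here
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order
```

Dropping whole turns is the crudest policy; summarizing the evicted turns into a compact digest spends a few tokens to preserve earlier decisions.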
Why it matters:
- Long vibe coding sessions burn context fast — the model loses earlier decisions as the window fills
- Multi-agent systems need bounded units of work or costs compound per step
- Retrieval-augmented generation exists largely as a token-efficiency strategy: fetch only what’s needed
See also: