Tokens as AI Currency

Tokens are the fundamental unit of exchange in LLM systems — understanding them as a budget changes how you design AI workflows.

Tokens are the atomic unit LLMs operate on — one token is roughly 0.75 words of English (about four characters). Every prompt you send and every response you receive is metered in tokens. They map directly to cost, latency, and the hard limit on what a model can “see” at once (the context window).
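The model's own tokenizer (e.g. OpenAI's tiktoken) gives exact counts; for quick budget planning, the characters-per-token heuristic above is often enough. A minimal sketch — the function name and the divisor of 4 are illustrative, not an official API:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token heuristic.

    Real counts require the model's actual tokenizer; this is only
    for back-of-the-envelope budget planning.
    """
    return max(1, len(text) // 4)

prompt = "Summarize the design doc in three bullet points."
print(estimate_tokens(prompt))  # a rough count, not an exact one
```

Note the heuristic skews for code, non-English text, and unusual punctuation, where tokenizers emit more tokens per character.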

Thinking of tokens as a currency budget shifts how you make architectural decisions.

Key concepts:

  • Context window — the total tokens available per request (input + output). Once full, older context is lost or must be compressed
  • Input vs output cost — input tokens are cheaper than output tokens; generation is the expensive operation
  • Token density — how much useful information fits per token (structured prompts > verbose prose)
  • Context as working memory — what’s in the window is what the model “knows” for that request; nothing else

Why it matters:

  • Long vibe coding sessions burn context fast — the model loses earlier decisions as the window fills
  • Multi-agent systems need bounded units of work or costs compound per step
  • Retrieval-augmented generation exists largely as a token-efficiency strategy: fetch only what’s needed
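The window-filling problem above is typically handled by trimming history to a token budget, newest-first, so recent turns survive. A minimal sketch — the function names and the word-count stand-in for a tokenizer are assumptions for illustration:

```python
def trim_to_budget(messages, budget, count_tokens):
    """Keep the most recent messages that fit within `budget` tokens.

    Walks the history newest-first so recent turns survive; older
    turns fall out of the window, mirroring how earlier decisions
    are lost as a long session fills the context.
    """
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["turn one ...", "turn two ...", "turn three ..."]
# Word count stands in for a real tokenizer here.
recent = trim_to_budget(history, budget=8, count_tokens=lambda m: len(m.split()))
```

Production systems usually summarize the dropped turns instead of discarding them outright, but the budget logic is the same.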
