Input tokens vs output tokens

Short answer

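Input tokens are the tokens in your prompt (system prompt, user message, and any retrieved context). Output tokens are the tokens the model generates in its response. Providers almost always charge more for output than for input because generation is more expensive to serve. The whole prompt can be processed in parallel, while output tokens are produced one at a time, and each one needs its own forward pass.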

The input/output split is the single most important concept for understanding LLM pricing. A typical frontier model charges roughly 5–10× more per million tokens for output than for input:

GPT-5.5: $5.00 in / $30.00 out per 1M (6× output premium).
Claude Opus 4.8: $5.00 in / $25.00 out per 1M (5× output premium).
Gemini 2.5 Pro: $1.25 in / $10.00 out per 1M (8× output premium).

This means workloads that produce long responses (long-form writing, agent loops, code generation) should be modeled with output volume in mind, not just input.
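To see why output volume dominates, here is a minimal cost sketch that uses the per-1M prices listed above. The function names and the 2,000-in / 1,000-out request size are illustrative assumptions, not any provider's billing API.

```python
# Per-1M-token list prices (USD) copied from the figures above; check each
# provider's pricing page before relying on them.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, billing input and output tokens at separate rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the request's cost that comes from output tokens."""
    price_in, price_out = PRICES[model]
    out_cost = output_tokens * price_out
    return out_cost / (input_tokens * price_in + out_cost)


if __name__ == "__main__":
    # Hypothetical request: 2,000-token prompt, 1,000-token response.
    for model in PRICES:
        cost = request_cost(model, 2_000, 1_000)
        share = output_share(model, 2_000, 1_000)
        print(f"{model}: ${cost:.4f} per request, {share:.0%} of it from output")
```

In this example, output makes up only a third of the tokens but 75% of the GPT-5.5 cost and 80% of the Gemini 2.5 Pro cost.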

Related terms

Context window

The context window is the maximum number of tokens (input + output) a model can process in a single request. GPT-5.5 and…
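Input and output share the same window, so a long prompt leaves less room for the response. The following minimal sketch shows that arithmetic. It assumes a hypothetical 128,000-token window (not the figure for any particular model) and an optional separate output cap, which is also hypothetical.

```python
from typing import Optional


def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def max_output_budget(
    input_tokens: int, context_window: int, output_cap: Optional[int] = None
) -> int:
    """Largest output budget left after the prompt.

    output_cap is a hypothetical per-request output limit; pass None if the
    model has no separate cap.
    """
    remaining = max(0, context_window - input_tokens)
    return remaining if output_cap is None else min(remaining, output_cap)


if __name__ == "__main__":
    WINDOW = 128_000  # hypothetical window size, not any specific model's
    print(max_output_budget(120_000, WINDOW))         # 8000
    print(max_output_budget(120_000, WINDOW, 4_096))  # 4096
    print(fits_in_context(120_000, 10_000, WINDOW))   # False
```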

Cached input

Cached input is the discounted price you pay for tokens that the API provider has already seen recently in a previous request…
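Caching changes only the input side of the bill. The following minimal sketch splits a prompt into cached and uncached tokens. It uses the $5.00 input rate from the price list above and a hypothetical $0.50 cached rate (a 90% discount). Real cache discounts and any cache-write charges vary by provider.

```python
def input_cost(
    input_tokens: int, cached_tokens: int, price_in: float, cached_price_in: float
) -> float:
    """USD cost of a prompt where some tokens are billed at the cached rate.

    Prices are per 1M tokens. Output tokens are not affected by caching and
    are billed separately at the normal output rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * price_in + cached_tokens * cached_price_in) / 1_000_000


if __name__ == "__main__":
    # $5.00 input rate from the price list; $0.50 cached rate is a hypothetical 90% discount.
    full = input_cost(100_000, 0, 5.00, 0.50)
    cached = input_cost(100_000, 80_000, 5.00, 0.50)
    print(f"no cache: ${full:.2f}, 80% cached: ${cached:.2f}")
```

With 80% of a 100,000-token prompt served from cache, the input cost drops from $0.50 to $0.14 under these assumed rates.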