Input tokens vs output tokens
Short answer
Input tokens are the tokens in your prompt (system prompt + user message + retrieved context). Output tokens are the tokens the model generates. Providers almost always charge more for output than for input because generation is more compute-intensive: output tokens are produced one at a time, while the prompt is processed in parallel.
The input/output split is the single most important concept for understanding LLM pricing. A typical frontier model prices output at roughly 5–10× the input rate per million tokens:
- GPT-5.5: $5.00 in / $30.00 out per 1M (6× output premium).
- Claude Opus 4.8: $5.00 in / $25.00 out per 1M (5× output premium).
- Gemini 2.5 Pro: $1.25 in / $10.00 out per 1M (8× output premium).
Because of this premium, workloads that produce long responses (long-form writing, agent loops, code generation) should be costed on expected output volume, not just prompt size.
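The arithmetic behind that advice can be sketched in a few lines. This is a minimal illustration, not a billing tool: the `Pricing` class, the `request_cost` helper, and the two workloads with their token counts are all hypothetical, and the rates are the $5.00 in / $25.00 out per 1M figures from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token rates in USD (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost, output_cost


# Rates taken from the list above: $5.00 in / $25.00 out per 1M tokens.
pricing = Pricing(input_per_million=5.00, output_per_million=25.00)

# Assumed token counts, chosen only to contrast an input-heavy
# request with an output-heavy one.
workloads = {
    "rag_lookup": (8_000, 300),           # large prompt, short answer
    "long_form_writing": (1_000, 4_000),  # short prompt, long answer
}

for name, (tokens_in, tokens_out) in workloads.items():
    input_cost, output_cost = request_cost(pricing, tokens_in, tokens_out)
    total = input_cost + output_cost
    print(f"{name}: ${total:.4f} total, {output_cost / total:.0%} from output")
```

With these assumed numbers, output accounts for about 16% of the cost of the input-heavy lookup and about 95% of the cost of the long-form request, even though the long-form prompt is eight times smaller.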