Thinking tokens

Short answer

Thinking tokens are the hidden reasoning steps a reasoning model generates internally before its visible reply. On most providers they are billed as output tokens even though the user never sees them directly.

Reasoning models (OpenAI's o-series / GPT-5.5 Thinking, Anthropic's extended thinking, Gemini Thinking, DeepSeek R1) break their work into two phases: internal chain-of-thought and the final user-facing response. The internal chain-of-thought is paid for but hidden.

This can drastically change the economics of reasoning workloads. A 200-word visible answer can be billed like a 1,500-word generation once its thinking tokens are counted. Always check whether the provider bills thinking as output (most do) and include it in your cost model.
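To make the arithmetic concrete, here is a minimal sketch of a cost estimate that counts thinking tokens as output. It assumes the provider bills thinking at the same rate as visible output. The function names, the token counts, and the price of 10 USD per million output tokens are illustrative assumptions, not any provider's real figures.

```python
def billed_output_tokens(visible_tokens: int, thinking_tokens: int) -> int:
    """Output tokens billed when thinking is charged as output (assumed)."""
    return visible_tokens + thinking_tokens


def output_cost_usd(visible_tokens: int, thinking_tokens: int,
                    usd_per_million_output: float) -> float:
    """Output-side cost in USD for a single response."""
    billed = billed_output_tokens(visible_tokens, thinking_tokens)
    return billed * usd_per_million_output / 1_000_000


def cost_multiplier(visible_tokens: int, thinking_tokens: int) -> float:
    """How many times more is billed than the visible answer alone."""
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    return billed_output_tokens(visible_tokens, thinking_tokens) / visible_tokens


if __name__ == "__main__":
    # Hypothetical numbers: a short visible answer with a long hidden trace.
    visible, thinking = 300, 1950
    price = 10.0  # hypothetical USD per million output tokens

    print(f"billed output tokens: {billed_output_tokens(visible, thinking)}")
    print(f"output cost: ${output_cost_usd(visible, thinking, price):.4f}")
    print(f"multiplier vs visible only: {cost_multiplier(visible, thinking):.1f}x")
```

With these made-up numbers, the response is billed for 2,250 output tokens, 7.5 times the visible answer. That is the same ratio as the 200-word versus 1,500-word example above. Substitute the token counts your provider reports and its published output price to get a real estimate.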

Related terms