AI pricing glossary

Short, plain definitions for the vocabulary you'll meet on AI pricing pages — what tokens are, how caching works, what context window means, and why output costs more than input.

Each entry links to its full explainer with provider-specific numbers.

Cached input

Cached input is the discounted price you pay for tokens the API provider has already processed in a recent request. On GPT-5.5 it costs $0.50 per 1M token…

Prompt caching

Prompt caching lets an API provider reuse a previously seen input prefix at a discounted rate instead of reprocessing every token. GPT-5.5 cached input is $0.50…
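
As a rough illustration of how a cached prefix changes an input bill, the sketch below splits a prompt into cached and uncached tokens and prices each part separately. The $0.50 per 1M cached rate is the GPT-5.5 figure quoted above; the uncached rate is left as a parameter, and the function name `input_cost` and the $5.00 value in the usage line are hypothetical placeholders, not published prices.

```python
def input_cost(total_input_tokens, cached_tokens,
               uncached_price_per_m, cached_price_per_m=0.50):
    """Return the input cost in dollars for one request.

    cached_price_per_m defaults to the $0.50 per 1M figure from the entry
    above; uncached_price_per_m is an assumption supplied by the caller.
    """
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached_tokens = total_input_tokens - cached_tokens
    dollars_times_million = (uncached_tokens * uncached_price_per_m
                             + cached_tokens * cached_price_per_m)
    return dollars_times_million / 1_000_000


# Hypothetical request: 100k input tokens, 80k of them served from cache,
# with a placeholder uncached rate of $5.00 per 1M tokens.
example = input_cost(100_000, 80_000, uncached_price_per_m=5.00)
```

The larger the shared prefix (system prompt, long documents reused across calls), the more of the bill moves to the cached rate.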

Batch API

A Batch API lets you submit a large set of requests together at a discount (typically 50% off) in exchange for slower turnaround — usually within 24 hours inste…

Context window

The context window is the maximum number of tokens (input + output) a model can process in a single request. GPT-5.5 and Claude Opus 4.8 both have 1M-token cont…
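
Because input and output share the same window, a request can be checked against the limit before it is sent. This is a minimal sketch, assuming the 1M-token window quoted above; the function names and the token counts in the usage lines are illustrative only.

```python
def fits_in_context(input_tokens, max_output_tokens, context_window=1_000_000):
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def remaining_output_budget(input_tokens, context_window=1_000_000):
    """Tokens left for output once the prompt is counted (never negative)."""
    return max(context_window - input_tokens, 0)


# Illustrative numbers: a 900k-token prompt leaves 100k tokens for output.
ok = fits_in_context(900_000, 100_000)
too_big = fits_in_context(900_000, 128_000)
left = remaining_output_budget(900_000)
```

In practice the usable output is also capped by the model's separate max output tokens limit, so the smaller of the two numbers applies.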

Max output tokens

Max output tokens is the upper limit on how many tokens a model can generate in a single response. It's usually much smaller than the context window — GPT-5.5 c…

OpenRouter

OpenRouter is an aggregator that gives you access to many model providers through a single API key, with automatic fallback. Pricing usually matches the provider's official list …

AWS Bedrock

AWS Bedrock is Amazon's enterprise channel for accessing models from Anthropic, Meta, Mistral, Cohere, and others. Pricing varies by model, region, and commitme…

Azure OpenAI

Azure OpenAI delivers OpenAI models (GPT-5.5, GPT-5.4, embeddings, etc.) through Microsoft's enterprise cloud — same token prices as OpenAI direct, but with Azu…

Input tokens vs output tokens

Input tokens are the tokens in your prompt (system + user message + retrieved context). Output tokens are the tokens the model generates. Providers almost alway…

Thinking tokens

Thinking tokens are the hidden reasoning steps a reasoning model generates internally before its visible reply. On most providers they are billed as output toke…
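
To see why hidden reasoning matters for the bill, the sketch below adds thinking tokens to visible tokens before pricing. It assumes a provider that bills thinking tokens at the output rate; the function name and the $10.00 per 1M output price are hypothetical placeholders, not figures from any price list.

```python
def billed_output_cost(visible_tokens, thinking_tokens, output_price_per_m):
    """Return the output cost in dollars when thinking tokens are billed
    at the same rate as visible output tokens (an assumption)."""
    billed_tokens = visible_tokens + thinking_tokens
    return billed_tokens * output_price_per_m / 1_000_000


# Hypothetical reply: 500 visible tokens preceded by 4,500 thinking tokens,
# at a placeholder output price of $10.00 per 1M tokens.
with_thinking = billed_output_cost(500, 4_500, output_price_per_m=10.00)
without_thinking = billed_output_cost(500, 0, output_price_per_m=10.00)
```

In this made-up case the reply is billed for ten times the tokens the user actually sees, which is why a short answer from a reasoning model can cost more than its length suggests.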