Cached input
Cached input is the discounted rate charged for input tokens that the API provider has already processed in a recent request. On GPT-5.5 it costs $0.50 per 1M tokens versus $5.00 for fresh input, a 90% saving.
When you send a long system prompt or a large document to an LLM API, the provider normally charges full price for every token on every request. With prompt caching, the provider stores the processed prefix and can reuse it for a window of minutes or hours. A later request that starts with the same prefix within that window pays the cached-input price for the matching tokens and the fresh-input price only for whatever follows.
For workloads that re-send the same instructions over and over (RAG, agents with long system prompts, multi-turn conversations), cached input is often the single biggest lever on your bill.
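To make the billing difference concrete, here is a minimal sketch of the arithmetic using the GPT-5.5 prices quoted above. The token counts, the request count, and the `input_cost` helper are hypothetical. The sketch also assumes every request after the first hits the cache and that there is no surcharge for writing to it, which may not hold for every provider.

```python
def input_cost(fresh_tokens: int, cached_tokens: int,
               fresh_per_m: float = 5.00, cached_per_m: float = 0.50) -> float:
    """Input cost in dollars; prices are dollars per 1M tokens."""
    return (fresh_tokens * fresh_per_m + cached_tokens * cached_per_m) / 1_000_000


PREFIX = 10_000    # hypothetical shared system prompt, in tokens
QUESTION = 500     # hypothetical per-request user text, in tokens
REQUESTS = 1_000   # hypothetical number of requests

# No caching: every token of every request is billed at the fresh price.
uncached_total = REQUESTS * input_cost(PREFIX + QUESTION, 0)

# Caching: the first request is all fresh; each later request pays the
# cached price for the prefix and the fresh price for the new question.
cached_total = (input_cost(PREFIX + QUESTION, 0)
                + (REQUESTS - 1) * input_cost(QUESTION, PREFIX))

print(f"without caching: ${uncached_total:.2f}")
print(f"with caching:    ${cached_total:.2f}")
```

Under these assumptions the bill drops from about $52.50 to about $7.5. The overall saving is smaller than 90% because the per-request question is still billed as fresh input.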
Provider-by-provider cache pricing (2026-06-17)
- OpenAI GPT-5.5 — cached input $0.50 / 1M vs $5.00 fresh (90% off).
- Anthropic Claude Sonnet 4.6 — cached read $0.30 / 1M vs $3.00 fresh. Anthropic separately lists 5-minute and 1-hour cache write tiers.
- Google Gemini 2.5 Pro — cached input $0.125 / 1M vs $1.25 fresh (≤200K context) with a separate storage fee.
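The listed prices can be compared with a short calculation. The sketch below computes each discount and a blended input price for an assumed cache hit rate. The `hit_rate` parameter and the 80% figure are illustrative assumptions, and the calculation ignores Anthropic's cache write tiers and Gemini's storage fee.

```python
# (fresh, cached) input prices in dollars per 1M tokens, from the list above.
PRICES = {
    "GPT-5.5": (5.00, 0.50),
    "Claude Sonnet 4.6": (3.00, 0.30),
    "Gemini 2.5 Pro": (1.25, 0.125),
}


def blended_price(fresh: float, cached: float, hit_rate: float) -> float:
    """Average input price per 1M tokens when `hit_rate` of tokens are cached."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * fresh


for model, (fresh, cached) in PRICES.items():
    discount = 1.0 - cached / fresh
    print(f"{model}: {discount:.0%} off, "
          f"${blended_price(fresh, cached, 0.8):.3f} per 1M at an 80% hit rate")
```

The headline discount is the same across the three providers, so the blended price depends mainly on the fresh price and on how often requests actually hit the cache.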
When caching helps and when it doesn't
Caching only applies when the prefix matches: the start of the new request must be identical to one the provider saw recently. If every request is unique from the first token (one-shot prompts, per-document summarization), the cache doesn't help and you pay full price. Caching shines on multi-turn chat, agent runs that share a tool list, and repeated queries against the same knowledge base.
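The prefix rule can be illustrated with a toy comparison of two requests. This is a sketch only: whitespace-separated words stand in for real tokens, `shared_prefix_len` is a hypothetical helper, and real providers apply their own matching rules on top of this basic idea.

```python
from itertools import takewhile


def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading tokens that are identical in both requests."""
    matching = takewhile(lambda pair: pair[0] == pair[1], zip(previous, current))
    return sum(1 for _ in matching)


# Whitespace-split words as a stand-in for real tokens (an assumption).
system = "You are a support agent . Answer from the policy manual .".split()
question_1 = "How do refunds work ?".split()
question_2 = "What is the warranty period ?".split()

# Stable instructions first: the whole system prompt is a shared prefix.
stable_first = shared_prefix_len(system + question_1, system + question_2)

# Varying content first (here a timestamp): the requests differ at token 0,
# so nothing after it counts as a shared prefix.
variable_first = shared_prefix_len(
    ["2026-06-17T10:00"] + system + question_1,
    ["2026-06-17T10:05"] + system + question_2,
)

print(stable_first, "of", len(system), "system-prompt tokens shared")
print(variable_first, "tokens shared when a timestamp comes first")
```

The design point is ordering: content that stays the same across requests belongs at the start, and anything that changes per request belongs at the end.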
Related terms
Prompt caching
Prompt caching lets an API provider reuse a previously seen input prefix at a discounted rate instead of reprocessing ev…
Batch API
A Batch API lets you submit a large set of requests together at a discount (typically 50% off) in exchange for slower tu…
OpenRouter
OpenRouter is an aggregator that routes a single API key to many model providers, with automatic fallback. Pricing usual…