Cached input

Short answer

Cached input is the discounted price you pay for input tokens that the API provider has already processed in a recent request. On GPT-5.5 it costs $0.50 per 1M tokens versus $5.00 per 1M for fresh input, a 90% saving.

When you send a long system prompt or a large document to an LLM API, the provider normally charges full price for every input token. With prompt caching, the provider stores the prefix and reuses it for a window of minutes or hours. The next request that starts with the same prefix pays the cached-input price for those prefix tokens; any new tokens after the prefix are still billed at the fresh-input price.
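To make the billing split concrete, here is a minimal sketch of the input cost of a single request. The function name `input_cost` and the token counts are illustrative assumptions; the prices are the GPT-5.5 figures quoted in the short answer, and cache-write or storage fees are not modeled.

```python
# Illustrative sketch: input cost of one request with a cached prefix.
# Prices are USD per 1M tokens (GPT-5.5 figures quoted in this article).
FRESH_PRICE = 5.00
CACHED_PRICE = 0.50


def input_cost(cached_tokens: int, fresh_tokens: int,
               cached_price: float = CACHED_PRICE,
               fresh_price: float = FRESH_PRICE) -> float:
    """Return the input cost in USD for a single request."""
    return (cached_tokens * cached_price + fresh_tokens * fresh_price) / 1_000_000


# Hypothetical request: a 10,000-token prefix plus a 500-token new question.
with_cache = input_cost(cached_tokens=10_000, fresh_tokens=500)
without_cache = input_cost(cached_tokens=0, fresh_tokens=10_500)
print(f"with cache:    ${with_cache:.4f}")
print(f"without cache: ${without_cache:.4f}")
```

Under these assumed token counts the request costs about $0.0075 with a cache hit and about $0.0525 without one.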

For workloads that re-send the same instructions over and over (RAG, agents with long system prompts, multi-turn conversations), cached input is often the single biggest lever on the input side of your bill.
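How big the lever is depends on what share of your input tokens are served from cache. The sketch below computes a blended price per 1M input tokens for a given cached fraction. The helper name `blended_price` and the example fractions are assumptions for illustration; the default prices are the GPT-5.5 figures quoted earlier, and cache-write or storage fees are ignored.

```python
# Illustrative sketch: effective input price as a function of cache usage.
# Prices are USD per 1M input tokens; write and storage fees are not modeled.

def blended_price(cached_fraction: float,
                  fresh_price: float = 5.00,
                  cached_price: float = 0.50) -> float:
    """Average price per 1M input tokens when `cached_fraction` of them hit the cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return cached_fraction * cached_price + (1.0 - cached_fraction) * fresh_price


# Hypothetical workloads with different shares of cached input tokens.
for fraction in (0.0, 0.5, 0.8, 0.95):
    print(f"{fraction:.0%} cached -> ${blended_price(fraction):.2f} per 1M input tokens")
```

With no cache hits the blended price equals the fresh price; as the cached share approaches 100% it approaches the cached price.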

Provider-by-provider cache pricing (as of 2026-06-17)

  • OpenAI GPT-5.5: cached input $0.50 / 1M vs $5.00 fresh (90% off).
  • Anthropic Claude Sonnet 4.6: cached read $0.30 / 1M vs $3.00 fresh (90% off). Anthropic lists cache writes separately, in 5-minute and 1-hour tiers.
  • Google Gemini 2.5 Pro: cached input $0.125 / 1M vs $1.25 fresh (90% off) for prompts up to 200K tokens, plus a separate storage fee.
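The discount implied by each pair of list prices can be checked with a few lines of arithmetic. This is a sketch only: the dictionary layout and the `discount_percent` helper are assumptions for illustration, and it compares read prices alone, leaving out Anthropic's cache-write tiers and Google's storage fee.

```python
# Illustrative sketch: discount implied by the list prices quoted above.
# Prices are USD per 1M input tokens; cache-write tiers and storage fees
# are intentionally left out.
PRICES = {
    "OpenAI GPT-5.5": {"fresh": 5.00, "cached": 0.50},
    "Anthropic Claude Sonnet 4.6": {"fresh": 3.00, "cached": 0.30},
    "Google Gemini 2.5 Pro (<=200K)": {"fresh": 1.25, "cached": 0.125},
}


def discount_percent(fresh: float, cached: float) -> float:
    """Percentage saved on a cached token relative to a fresh one."""
    return round((1 - cached / fresh) * 100, 1)


for model, p in PRICES.items():
    print(f"{model}: {discount_percent(p['fresh'], p['cached'])}% off")
```

All three listed models work out to the same 90% discount on cached reads, so the practical differences lie in the write and storage charges, which this sketch does not cover.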

When caching helps and when it doesn't

Caching applies only when the start of a request (the prefix) matches a prefix the provider has recently cached. If your request is unique every time (one-shot prompts, per-document summarization), the cache doesn't help and you pay full price. Caching shines on multi-turn chat, agent runs that share a tool list, and repeated queries against the same knowledge base.
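The prefix rule can be illustrated with a simplified matcher. This is a sketch under loud assumptions: it compares whitespace-separated words, whereas real providers compare tokens and may apply minimum lengths or other rules not modeled here. The function `shared_prefix_len`, the "Acme" prompt, and the ticket number are all hypothetical.

```python
# Simplified illustration of prefix matching. Real providers compare tokens,
# not whitespace-separated words; this sketch only shows the principle.

def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading items that are identical in both sequences."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


# Hypothetical system prompt shared by every request.
system = "You are a support agent for Acme. Answer briefly.".split()
turn_1 = system + "How do I reset my password?".split()
turn_2 = system + "Where is my invoice?".split()
# Putting per-request text first breaks the match at the very first word.
turn_3 = "Ticket 4821:".split() + system + "Where is my invoice?".split()

print(shared_prefix_len(turn_1, turn_2))  # the whole system prompt is shared
print(shared_prefix_len(turn_1, turn_3))  # nothing is shared
```

The second and third requests contain the same system prompt, but only the one that starts with it shares a prefix with the first request. In this simplified model, that is why stable content such as instructions or tool lists goes at the front and per-request content at the end.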

Related terms