Batch API

Short answer

A Batch API lets you submit a large set of requests together at a discount (typically 50% off) in exchange for slower turnaround — usually within 24 hours instead of seconds.

Both OpenAI and Anthropic offer batch endpoints that trade latency for price. The typical pattern: upload a JSONL file of requests, get results back within 24 hours, and pay roughly half the synchronous price.

  • OpenAI Batch: up to 24-hour turnaround, ~50% discount on input and output.
  • OpenAI Flex: same price as Batch but with lower latency (a few hours) and possible occasional 429s.
  • Anthropic Message Batches API: ~50% discount, 24-hour SLA.

Batch is great for bulk evaluation, dataset generation, and offline labeling — anything where you don't need a real-time response.

Related terms