Prompt Caching Optimizer FAQs

Question 1

How do I choose between a 5-minute and 1-hour TTL?

Accepted Answer

Use the 5-minute TTL for most interactive sessions to minimize write costs. Use the 1-hour TTL only when a prompt is reused more than 10 times per hour to justify the higher 2x write cost.

Question 2

Which models support native prompt caching?

Accepted Answer

This skill supports Anthropic's Claude 3.5, 3.7, 4, and 4.5 models (Opus, Sonnet, Haiku), as well as OpenAI's GPT-5 and o3 series which feature automatic caching.

Question 3

How much can I save using the prompt-caching skill?

Accepted Answer

Cache reads typically offer a 90% discount compared to base input prices. While there is a small surcharge for the initial cache write (1.25x to 2x), applications usually break even after just 2 to 8 repeated requests.

Question 4

What is prompt caching in LLM development?

Accepted Answer

Prompt caching is a technique where LLM providers store frequently used prompt prefixes—such as long system instructions or background context—allowing you to reuse them without paying full price for input tokens on every request.

Question 5

What is the minimum token count required for caching?

Accepted Answer

For Anthropic Claude models, a minimum prefix size of 1,024 tokens is required to trigger caching. OpenAI caches prefixes automatically in 128-token increments.

Prompt Caching Optimizer

Key Features

Use Cases

Prompt Caching Optimizer

Key Features

Use Cases