Area 2

Tokens, pricing
& cost management

Always reaching for the biggest model can cost several times more than necessary. A few structural habits change everything.

Input vs output tokens

You pay separately — and output costs more.

Tokens sent in (prompt) and tokens generated out (completion) are billed apart, and output is typically priced several times higher than input. For many apps, output length is the bigger cost lever.

Model (per million tokens)InputOutput
Claude Haiku 4.5 fast / cheap~$1~$5
Claude Sonnet 4.6 balanced~$3~$15
Claude Opus tier most capable~$5~$25

Illustrative rates from the source material; verify against current provider pricing.

Prompt caching

Reuse the stable parts for up to 90% off.

Reusing a stable prefix — system prompt, tool definitions, large documents — lets the provider cache it. On Anthropic, cache reads cost about 10% of the base input rate. Structure prompts with static content first, variable content last.

Input token cost with caching

Relative to base input rate (Anthropic)
Base input
100%
Cache write
1.25–2×
Cache read
~10%
📌

Static first

Put your system prompt, tool definitions and big documents at the start of every request.

🔀

Variable last

Keep the per-request bits at the end so the cached prefix stays identical and hits the cache.

Cost optimization strategies

Stack the savings.

Four levers compound. Caching + batch alone can cut costs 90%+ — e.g. Sonnet 4.6 effectively dropping from $3/$15 toward $1.50/$7.50 with batch, far lower with caching.

🎚️

Model routing

Use the cheapest model that clears your quality bar for each task.

📦

Batch APIs

~50% discount for async, non-urgent jobs.

💾

Prompt caching

Reuse stable prefixes for up to ~90% off input.

✂️

Cap output

Limit output and thinking tokens — the biggest lever for many apps.

Caching + batch can cut costs 90%+.
The difference between a thoughtful pipeline and a naive one is often a multiple, not a margin.