Claude platform features: prompt caching, extended thinking, and citations
A production reference for three Claude features that meaningfully change how you build — manual cache breakpoints, auditable reasoning, and per-claim source tracking.
Three Claude platform features meaningfully change how you build production applications. This reference covers the details that matter when you’re past the prototyping stage.
Prompt caching
Prompt caching stores the computational work from previous requests so follow-up calls can reuse it, making them faster and cheaper. By default a cache entry lives for five minutes, refreshed each time it is hit; a one-hour TTL is available as an option.
It is not automatic
Caching requires manual cache breakpoints set via `cache_control`, which is only available on the longhand text-block form. The shorthand string form has nowhere to put the field.
```
# Longhand form — required for caching
{
  "type": "text",
  "text": "Your long system prompt or document here...",
  "cache_control": {"type": "ephemeral"}
}
```

Content before the breakpoint gets cached; content after does not. You can place up to four breakpoints total.
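As a concrete sketch of where the breakpoint goes, the payload below keeps the stable system prompt cacheable and the variable question after it (the helper name and prompt text are illustrative, not part of the SDK):

```python
# Sketch: assemble a Messages API payload whose system prompt is cacheable.
# The helper name and prompt text are illustrative.

def build_request(static_system: str, user_question: str) -> dict:
    """Build kwargs for client.messages.create with a cache breakpoint
    on the stable system prompt."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": static_system,
                # Everything up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # The variable part goes after the breakpoint, in messages.
        "messages": [{"role": "user", "content": user_question}],
    }

req = build_request("You are a contracts analyst...", "Summarize clause 4.")
```

Because the user turn sits after the breakpoint, every request with the same system prompt can hit the cache regardless of the question.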
What to cache
| Content type | Why it’s a good candidate |
|---|---|
| System prompts | Rarely change between requests |
| Tool definitions | Stable across conversations |
| Long documents | Large token savings when reused |
| Conversation history up to a point | Subsequent messages append after the breakpoint |
Behind the scenes, Claude processes components in a fixed order: tools first, then system prompt, then messages. Understanding this order helps you place breakpoints where they’ll maximize reuse.
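Given that fixed order, one breakpoint after the tool definitions and another after the system prompt extends the reusable prefix through both stable sections. A sketch, with a made-up tool definition:

```python
# Sketch: two breakpoints exploiting the fixed tools -> system -> messages
# order. The tool definition here is a made-up example.

tools = [
    {
        "name": "lookup_clause",
        "description": "Look up a clause in the contract database.",
        "input_schema": {
            "type": "object",
            "properties": {"clause_id": {"type": "string"}},
            "required": ["clause_id"],
        },
        # Breakpoint 1: caches all tool definitions up to here.
        "cache_control": {"type": "ephemeral"},
    }
]

system = [
    {
        "type": "text",
        "text": "You are a contracts analyst. Follow the firm style guide...",
        # Breakpoint 2: extends the cached prefix through the system prompt.
        "cache_control": {"type": "ephemeral"},
    }
]

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "tools": tools,
    "system": system,
    "messages": [{"role": "user", "content": "Check clause 7."}],
}
```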
Cache brittleness
The cache is matched on an exact prefix: changing even a single character invalidates it. Adding "please" to a cached system prompt forces reprocessing of everything up to the breakpoint. Structure your prompts so the cached portion remains identical across requests.
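A cheap way to enforce that discipline is to check that the cacheable portion serializes identically across requests. A sketch (the helper and payload shape are illustrative):

```python
# Sketch: verify that two requests differing only in the user turn share
# a byte-identical cacheable prefix. Helper names are illustrative.
import json

SYSTEM = [{
    "type": "text",
    "text": "Long, stable instructions...",
    "cache_control": {"type": "ephemeral"},
}]

def make(question: str) -> dict:
    """Build a request whose only variable part is the user turn."""
    return {"system": SYSTEM,
            "messages": [{"role": "user", "content": question}]}

def cached_prefix(req: dict) -> str:
    """Serialize only the parts before the breakpoint (here: the system
    blocks) so prefixes can be compared across requests."""
    return json.dumps(req["system"], sort_keys=True)

a, b = make("Question one?"), make("Please answer question two.")
# Different questions, identical cacheable prefix: request b hits the cache.
assert cached_prefix(a) == cached_prefix(b)
```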
Minimum threshold
The prefix being cached must total at least 1,024 tokens on most models (Haiku models require 2,048). Individual blocks do not need to meet this threshold; everything up to the breakpoint counts toward the sum.
Extended thinking
Extended thinking gives Claude transparent reasoning space before answering. For Claude 4.6+, pass the thinking parameter with a token budget:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 2000
    },
    messages=[{"role": "user", "content": "Explain this complex proof"}]
)
```

The thinking output is returned as visible content blocks in the response, creating an auditable reasoning trail. For safety-sensitive content, thinking can come back redacted: the model still reasons internally, but sensitive intermediate steps are encrypted rather than exposed.
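With thinking enabled, the response content interleaves thinking and text blocks. A sketch of separating them, using an illustrative response-shaped dict:

```python
# Sketch: split a Messages API response into reasoning and answer text.
# The sample dict mimics the response shape; all values are illustrative.

sample_content = [
    {"type": "thinking",
     "thinking": "First, restate the theorem...",
     "signature": "EpYDCkgIBB..."},  # illustrative truncated signature
    {"type": "text",
     "text": "The proof proceeds in three steps."},
]

def split_blocks(content: list[dict]) -> tuple[list[str], list[str]]:
    """Return (reasoning traces, answer text) from response content blocks."""
    reasoning = [b["thinking"] for b in content if b["type"] == "thinking"]
    answer = [b["text"] for b in content if b["type"] == "text"]
    return reasoning, answer

reasoning, answer = split_blocks(sample_content)
```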
Cryptographic signatures
Extended thinking responses include a cryptographic signature on each thinking block. If the reasoning output is modified, the signature no longer validates when the block is passed back to the API. This matters for compliance and audit scenarios where you need to prove that the model's reasoning has not been altered after the fact.
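A practical consequence: in multi-turn flows (especially tool use), thinking blocks must be passed back verbatim, signature included, or verification fails. A sketch with illustrative values:

```python
# Sketch: preserve signed thinking blocks verbatim when continuing a turn.
# Content shapes mimic the API; IDs and signatures are illustrative.

def extend_messages(messages: list[dict], assistant_content: list[dict],
                    tool_result: dict) -> list[dict]:
    """Append the assistant turn (thinking blocks untouched) plus the
    tool result, producing the messages list for the next request."""
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": [tool_result]},
    ]

prior = [{"role": "user", "content": "What is 17% of 4,210?"}]
assistant_turn = [
    {"type": "thinking", "thinking": "I should use the calculator tool.",
     "signature": "Eo8BCkY..."},  # illustrative truncated signature
    {"type": "tool_use", "id": "toolu_01", "name": "calculator",
     "input": {"expression": "4210 * 0.17"}},
]
result = {"type": "tool_result", "tool_use_id": "toolu_01", "content": "715.7"}

next_messages = extend_messages(prior, assistant_turn, result)
```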
Citations
Citations give you per-claim source tracking. When Claude references a source document, the response includes precise offsets:
```json
{
  "type": "text",
  "text": "The defendant argued that the contract was void.",
  "citations": [
    {
      "type": "char_location",
      "cited_text": "the contract was void",
      "document_index": 0,
      "document_title": "legal_brief.pdf",
      "start_char_index": 142,
      "end_char_index": 163
    }
  ]
}
```

This is fundamentally different from asking Claude to "cite your sources" in plain text. Citations are structured data attached to specific claims: you can build UIs that highlight exactly which sentences come from which parts of which documents. For legal, medical, and financial applications, this transforms Claude from a useful tool into an auditable system.
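The offsets are precise enough to drive highlighting directly. A sketch that maps citation metadata back to source spans (the helper and the padded stand-in document are mine):

```python
# Sketch: map citation offsets back to exact spans in the source documents.
# The block reuses the response shape shown above; the document text is a
# stand-in padded so the offsets line up.

block = {
    "type": "text",
    "text": "The defendant argued that the contract was void.",
    "citations": [{
        "type": "char_location",
        "cited_text": "the contract was void",
        "document_index": 0,
        "document_title": "legal_brief.pdf",
        "start_char_index": 142,
        "end_char_index": 163,
    }],
}

# Stand-in source text: padded so characters 142..163 hold the cited span.
source_doc = " " * 142 + "the contract was void. The court disagreed."

def highlights(block: dict, documents: list[str]) -> list[tuple[str, str]]:
    """Return (document title, exact cited span) pairs for one text block."""
    return [
        (c["document_title"],
         documents[c["document_index"]][c["start_char_index"]:c["end_char_index"]])
        for c in block.get("citations", [])
    ]

spans = highlights(block, [source_doc])
```

Each span can then be wrapped in a highlight element in the UI; `cited_text` doubles as a sanity check against the slice.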
Code execution
Claude can execute Python code in a Docker-isolated environment as a server-side tool. Combined with the Files API, the workflow is:
- Pre-upload input files via the Files API
- Claude writes and executes Python to process them
- Download generated outputs via the Files API
```python
# Upload a file (in the Python SDK the Files API lives under the beta
# namespace and takes no "purpose" argument)
file_response = client.beta.files.upload(
    file=open("data.csv", "rb"),
)

# Claude can now reference and process it; beta identifiers below reflect
# the Files API and code-execution betas (check current docs)
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["code-execution-2025-05-22", "files-api-2025-04-14"],
    tools=[{"type": "code_execution_20250819", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this CSV."},
            {"type": "container_upload", "file_id": file_response.id},
        ],
    }]
)
```

Putting them together
These features compound in production.
| Feature | Provides | Best paired with |
|---|---|---|
| Prompt caching | Lower cost and latency for repeated context | Citations (cached documents get cited) |
| Extended thinking | Auditable, signed reasoning | Citations (reasoning can link to sources) |
| Citations | Per-claim source traceability | Code execution (execute analysis, cite the outputs) |
| Code execution | Deterministic computation in prompts | Files API (upload inputs, download outputs) |
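A single request can realize two of those pairings at once: a large document that is both cached and citation-enabled. A sketch (the builder and document text are illustrative; the document block shape follows the Citations API):

```python
# Sketch: one request that caches a large document AND enables citations
# on it, so repeated questions are cheap and every claim is attributable.
# The builder name and texts are illustrative.

def document_request(doc_text: str, title: str, question: str) -> dict:
    """Payload pairing prompt caching with citations: the document is both
    cacheable (stable prefix) and citation-enabled (per-claim offsets)."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "text", "media_type": "text/plain",
                               "data": doc_text},
                    "title": title,
                    "citations": {"enabled": True},
                    # Cache breakpoint: the large document is the stable part.
                    "cache_control": {"type": "ephemeral"},
                },
                # The variable question comes after the breakpoint.
                {"type": "text", "text": question},
            ],
        }],
    }

req = document_request("Full text of the brief...", "legal_brief.pdf",
                       "Was the contract void?")
```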
Takeaways
Prompt caching is explicit
Caching requires `cache_control` breakpoints on longhand text blocks, with up to four breakpoints per request.
Cache keys are brittle
A one-character change invalidates cached content, so stable context must be isolated from variable request text.
Extended thinking supports auditability
Visible reasoning and cryptographic signatures make it possible to inspect and verify reasoning traces in sensitive workflows.
Citations are structured source links
Citation metadata attaches offsets to claims, which supports source-highlighting interfaces instead of plain-text citation guesses.
Code execution adds deterministic computation
The Files API plus server-side Python execution lets Claude process inputs and return generated artifacts inside a controlled environment.