platform engineering · intermediate

Claude platform features: prompt caching, extended thinking, and citations

A production reference for three Claude features that meaningfully change how you build — manual cache breakpoints, auditable reasoning, and per-claim source tracking.

Tags: api, citations, claude, extended-thinking, production, prompt-caching

Prompt caching, extended thinking, and citations each change how you build production applications on Claude. This reference covers the details that matter once you are past the prototyping stage, then shows how the three pair with code execution and the Files API.

Prompt caching

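Prompt caching stores the computational work from previous requests so follow-up calls can reuse it, which makes them faster and cheaper. By default a cache entry lives for five minutes and is refreshed each time it is read; a one-hour lifetime is available as an option at a higher cache-write cost.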

It is not automatic

Caching requires explicit cache breakpoints, set with a `cache_control` field on a content block written in the longhand form. The shorthand form, where the content is a plain string, has nowhere to carry that field.

cache-breakpoint.py python
# Longhand form, required for caching
system = [
    {
        "type": "text",
        "text": "Your long system prompt or document here...",
        # Breakpoint: marks the end of the cacheable prefix
        "cache_control": {"type": "ephemeral"},
    }
]

Everything up to and including the block that carries the breakpoint gets cached; content after it does not. You can place up to four breakpoints per request.
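As a minimal sketch of breakpoint placement, the helper below builds a `system` list in the longhand form and marks only the stable block for caching. The names `build_system_blocks` and `count_breakpoints` and the sample strings are illustrative assumptions; only the block shape and the `cache_control` field come from the example above.

```python
MAX_BREAKPOINTS = 4  # per-request limit described in the text


def build_system_blocks(stable_text, variable_text=None):
    """Return longhand system blocks with one breakpoint on the stable part."""
    blocks = [
        {
            "type": "text",
            "text": stable_text,
            # Breakpoint: the prefix ending at this block is eligible for caching.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if variable_text:
        # Sits after the breakpoint, so it is processed fresh on every request.
        blocks.append({"type": "text", "text": variable_text})
    return blocks


def count_breakpoints(blocks):
    """Count the blocks that carry a cache_control marker."""
    return sum(1 for block in blocks if "cache_control" in block)


system = build_system_blocks("Long, stable instructions...", "Request-specific note")
assert count_breakpoints(system) <= MAX_BREAKPOINTS
```

Keeping the variable text in its own trailing block is one way to make sure per-request details never land inside the cached prefix.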

What to cache

Best candidates for caching
| Content type | Why it’s a good candidate |
| --- | --- |
| System prompts | Rarely change between requests |
| Tool definitions | Stable across conversations |
| Long documents | Large token savings when reused |
| Conversation history up to a point | Subsequent messages append after the breakpoint |

Behind the scenes, Claude processes components in a fixed order: tools first, then system prompt, then messages. Understanding this order helps you place breakpoints where they’ll maximize reuse.
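To make the ordering concrete, here is a rough mental model in code: it flattens a request into the tools, system, messages order described above and reports which sections a breakpoint covers. The helper names and the sample request are hypothetical, and this is not how the service is implemented internally; it only mirrors the stated order.

```python
def ordered_prefix(request):
    """Flatten a request into (section, block) pairs in processing order."""
    ordered = []
    for section in ("tools", "system"):
        for block in request.get(section, []):
            ordered.append((section, block))
    for message in request.get("messages", []):
        content = message["content"]
        if isinstance(content, str):
            # Shorthand string content cannot carry a breakpoint.
            content = [{"type": "text", "text": content}]
        for block in content:
            ordered.append(("messages", block))
    return ordered


def cached_sections(request):
    """Return the sections that fall inside the prefix ending at the last breakpoint."""
    ordered = ordered_prefix(request)
    last = max(
        (pos for pos, (_, block) in enumerate(ordered) if "cache_control" in block),
        default=-1,
    )
    seen = []
    for section, _ in ordered[: last + 1]:
        if section not in seen:
            seen.append(section)
    return seen


request = {
    "tools": [{"name": "lookup", "description": "...", "input_schema": {"type": "object"}}],
    "system": [{"type": "text", "text": "Stable instructions", "cache_control": {"type": "ephemeral"}}],
    "messages": [{"role": "user", "content": "Variable question"}],
}
print(cached_sections(request))  # ['tools', 'system']
```

In this model a breakpoint on the system prompt also covers the tool definitions, because they come earlier in the order.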

Cache brittleness

The cache matches on exact content: changing even a single character anywhere in the cached prefix invalidates it. Adding “please” to a cached prompt forces reprocessing of everything up to the breakpoint, while changes made after the breakpoint leave the cache intact. Structure your prompts so the cached portion remains identical across requests.
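The effect can be illustrated with a small sketch that fingerprints the cacheable prefix. The real cache key is internal to the service; SHA-256 over the serialized blocks is only a stand-in chosen for illustration, and `prefix_fingerprint` and `make_blocks` are hypothetical helpers.

```python
import hashlib
import json


def prefix_fingerprint(blocks):
    """Hash the blocks up to and including the last breakpoint.

    Stand-in for the service's internal cache key, which is not public.
    """
    last = max(
        (i for i, block in enumerate(blocks) if "cache_control" in block),
        default=-1,
    )
    payload = json.dumps(blocks[: last + 1], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_blocks(instructions, question):
    """Stable instructions carry the breakpoint; the question follows it."""
    return [
        {"type": "text", "text": instructions, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": question},
    ]


base = prefix_fingerprint(make_blocks("Summarize the report.", "What changed in Q3?"))
same_prefix = prefix_fingerprint(make_blocks("Summarize the report.", "What changed in Q4?"))
edited_prefix = prefix_fingerprint(make_blocks("Please summarize the report.", "What changed in Q3?"))

print(base == same_prefix)    # True: only text after the breakpoint changed
print(base == edited_prefix)  # False: the cached prefix itself changed
```

A check like this in a test suite could catch accidental edits to a prompt that is meant to stay stable.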

Minimum threshold

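The cached prefix must total at least 1,024 tokens, and some models set a higher minimum. Individual blocks do not need to meet this threshold on their own; what counts is the combined length of everything up to the breakpoint. A shorter prefix is still processed normally, but it is not cached.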
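A rough pre-flight check for the threshold might look like the sketch below. It assumes about four characters per token, which is only a heuristic and not the model's tokenizer, so treat the result as a hint; the helper names and the default of 1,024 are illustrative.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough estimate; assumes about four characters per token."""
    return len(text) // chars_per_token


def prefix_meets_minimum(blocks, min_tokens=1024):
    """Check whether the blocks up to the last breakpoint reach the minimum."""
    last = max(
        (i for i, block in enumerate(blocks) if "cache_control" in block),
        default=-1,
    )
    total = sum(estimate_tokens(block["text"]) for block in blocks[: last + 1])
    return total >= min_tokens


short_a = {"type": "text", "text": "a" * 2400}  # about 600 estimated tokens
short_b = {"type": "text", "text": "b" * 2400, "cache_control": {"type": "ephemeral"}}

print(prefix_meets_minimum([short_b]))           # False: about 600 tokens alone
print(prefix_meets_minimum([short_a, short_b]))  # True: about 1,200 tokens combined
```

For exact numbers, a token-counting call against the API would be more reliable than any character-based estimate.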

Extended thinking

Extended thinking gives Claude space to reason before answering and returns that reasoning in the response. The example below enables it through the `thinking` parameter with an explicit token budget:

extended-thinking.py python
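import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 2000,  # tokens reserved for reasoning
    },
    messages=[{"role": "user", "content": "Explain this complex proof"}],
)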

The thinking output is returned in the response as separate `thinking` content blocks ahead of the answer, creating an auditable reasoning trail. Some reasoning may instead arrive as `redacted_thinking` blocks: the model still reasoned internally, but those intermediate steps are not exposed in readable form.
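A minimal sketch of reading that trail: the function below separates reasoning from answer text in a response's content list. It works on plain dictionaries so it runs without an API call; the Python SDK returns typed objects, so in practice you would read attributes such as `block.type` instead. The function name and the sample data are made up for illustration.

```python
def split_response(content):
    """Separate reasoning blocks from answer text in a content list."""
    reasoning, answer, redacted = [], [], 0
    for block in content:
        if block["type"] == "thinking":
            reasoning.append(block["thinking"])
        elif block["type"] == "redacted_thinking":
            redacted += 1  # payload is opaque, so only count it
        elif block["type"] == "text":
            answer.append(block["text"])
    return {
        "reasoning": reasoning,
        "answer": "".join(answer),
        "redacted_blocks": redacted,
    }


# Hand-written sample shaped like a response; the values are placeholders.
sample = [
    {"type": "thinking", "thinking": "Check the base case first.", "signature": "placeholder"},
    {"type": "text", "text": "The proof proceeds by induction."},
]
result = split_response(sample)
print(result["answer"])  # The proof proceeds by induction.
```

Logging the `reasoning` list separately from the answer keeps the audit trail out of anything shown to end users.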

Cryptographic signatures

Each thinking block carries a cryptographic signature. The signature is checked when you send thinking blocks back to the API in a later turn, so a block whose reasoning has been edited no longer validates. For compliance and audit scenarios, store thinking blocks exactly as received if you need to show that the model’s reasoning has not been altered after the fact.
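One way to avoid breaking a signature by accident is to treat reasoning blocks as immutable when replaying a conversation. The sketch below appends an assistant turn verbatim and runs a local equality check before sending. It does not validate the signature itself, which is opaque to client code here; all helper names and the sample data are hypothetical.

```python
import copy


def thinking_blocks(content):
    """Return only the reasoning blocks from a content list."""
    return [b for b in content if b["type"] in ("thinking", "redacted_thinking")]


def append_assistant_turn(messages, assistant_content):
    """Return a new history with the assistant turn appended exactly as received."""
    # Deep-copy so later edits to the caller's data cannot touch the stored blocks.
    return messages + [{"role": "assistant", "content": copy.deepcopy(assistant_content)}]


def reasoning_untouched(received, about_to_send):
    """Local sanity check: reasoning blocks are identical to what was received."""
    return thinking_blocks(received) == thinking_blocks(about_to_send)


# Hand-written sample; the signature value is a placeholder, not a real one.
received = [
    {"type": "thinking", "thinking": "Step 1, then step 2.", "signature": "placeholder-signature"},
    {"type": "text", "text": "Here is the answer."},
]
history = append_assistant_turn([{"role": "user", "content": "Question"}], received)

edited = copy.deepcopy(received)
edited[0]["thinking"] = "Step 1 only."

print(reasoning_untouched(received, history[-1]["content"]))  # True
print(reasoning_untouched(received, edited))                  # False
```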

Citations

Citations give you per-claim source tracking. When Claude references a source document, the response includes precise offsets:

citation-response.json json
{
    "type": "text",
    "text": "The defendant argued that the contract was void.",
    "citations": [
        {
            "type": "char_location",
            "cited_text": "the contract was void",
            "document_index": 0,
            "document_title": "legal_brief.pdf",
            "start_char_index": 142,
            "end_char_index": 163
        }
    ]
}

This is fundamentally different from asking Claude to “cite your sources” in plain text. Citations are structured data attached to specific claims — you can build UIs that highlight exactly which sentences come from which parts of which documents. For legal, medical, and financial applications, this transforms Claude from a useful tool into an auditable system.
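As a sketch of how a highlighting UI might consume this structure, the function below maps each citation back to its source span and checks that the offsets agree with `cited_text`. It assumes `end_char_index` is exclusive, which matches the example above (163 minus 142 equals the 21 characters of the quote) and Python slicing. The source text is made up and padded so the quote sits at those offsets.

```python
def resolve_citations(block, documents):
    """Map each citation to its source span and check that the offsets line up."""
    spans = []
    for citation in block.get("citations", []):
        if citation["type"] != "char_location":
            continue  # other location types use different fields
        source = documents[citation["document_index"]]
        start = citation["start_char_index"]
        end = citation["end_char_index"]  # assumed exclusive
        spans.append({
            "title": citation["document_title"],
            "start": start,
            "end": end,
            "matches": source[start:end] == citation["cited_text"],
        })
    return spans


# Made-up source text, padded so the quote starts at character 142.
document = "." * 142 + "the contract was void" + " and unenforceable."
block = {
    "type": "text",
    "text": "The defendant argued that the contract was void.",
    "citations": [{
        "type": "char_location",
        "cited_text": "the contract was void",
        "document_index": 0,
        "document_title": "legal_brief.pdf",
        "start_char_index": 142,
        "end_char_index": 163,
    }],
}
spans = resolve_citations(block, [document])
print(spans[0]["matches"])  # True
```

A `matches` value of `False` would suggest the stored document differs from the one sent with the request, which is worth surfacing before rendering highlights.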

Code execution

Claude can execute Python code in a Docker-isolated environment as a server-side tool. Combined with the Files API, the workflow is:

  1. Pre-upload input files via the Files API
  2. Claude writes and executes Python to process them
  3. Download generated outputs via the Files API
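
code-execution-files.py python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Upload a file through the Files API
with open("data.csv", "rb") as f:
    file_response = client.beta.files.upload(file=f)

# Attach the upload so the code execution container can read it
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # Beta flags and the tool version are date-stamped; check the current docs
    betas=["code-execution-2025-08-25", "files-api-2025-04-14"],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this CSV"},
            {"type": "container_upload", "file_id": file_response.id},
        ],
    }],
)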

Putting them together

These features compound in production.

Claude platform features in combination
| Feature | Provides | Best paired with |
| --- | --- | --- |
| Prompt caching | Lower cost and latency for repeated context | Citations (cached documents get cited) |
| Extended thinking | Auditable, signed reasoning | Citations (reasoning can link to sources) |
| Citations | Per-claim source traceability | Code execution (execute analysis, cite the outputs) |
| Code execution | Deterministic computation in prompts | Files API (upload inputs, download outputs) |
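The first pairing in the table can be sketched as a single document block that is both cacheable and citable. The helper names, sample text, and title below are illustrative assumptions; the block only builds the request payload and makes no API call.

```python
def cached_citable_document(text, title):
    """Build a plain-text document block with citations on and a breakpoint."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
        # Breakpoint: the document is reused across questions without reprocessing.
        "cache_control": {"type": "ephemeral"},
    }


def build_messages(document_text, title, question):
    """Put the stable document first and the variable question after it."""
    return [{
        "role": "user",
        "content": [
            cached_citable_document(document_text, title),
            {"type": "text", "text": question},
        ],
    }]


messages = build_messages("Full text of the brief...", "legal_brief", "What did the defendant argue?")
print(messages[0]["content"][0]["citations"])  # {'enabled': True}
```

Because the question sits after the breakpoint, each new question about the same document can reuse the cached prefix while still returning citations into it.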

Takeaways

Prompt caching is explicit

Caching requires `cache_control` breakpoints on longhand text blocks, with up to four breakpoints per request.

Cache keys are brittle

A one-character change invalidates cached content, so stable context must be isolated from variable request text.

Extended thinking supports auditability

Visible reasoning and cryptographic signatures make it possible to inspect and verify reasoning traces in sensitive workflows.

Citations are structured source links

Citation metadata attaches offsets to claims, which supports source-highlighting interfaces instead of plain-text citation guesses.

Code execution adds deterministic computation

The Files API plus server-side Python execution lets Claude process inputs and return generated artifacts inside a controlled environment.