case study · intermediate ·

Giving agents hands: a CLI that lets coding agents explore any API

When your platform has 2,000+ API endpoints across microservices and gateway layers, the surface is constantly changing. Documentation goes stale, and new use cases reshape input and output schemas. A spec-driven CLI turns API exploration into agent-ready discovery.

api-exploration · cli · coding-agents · developer-experience · openapi

The Problem: APIs at Scale

Large enterprise platforms accumulate APIs the way cities accumulate roads. Start with a few clean endpoints, add microservices and gateway layers over time, and before long you have 2,000+ operations spanning 100+ domains — accounts, tickets, conversations, deployments, permissions, analytics, AI services. Each with its own authentication requirements, pagination patterns, and data shapes.

That surface doesn’t stay still. Endpoints are added, renamed, split behind gateways, or routed to new backing services. Documentation goes stale the moment teams ship new use cases, and those use cases often change request and response shapes in subtle ways — one payload adds nested filters, another returns a new envelope, a third changes how pagination tokens are emitted.

For a human developer, this is manageable. You open the documentation, search for what you need, try a few requests in Postman, read the response, iterate. The process is slow but navigable.

For a coding agent, this API surface is a wall.

Human path

Traditional API exploration:

  • Open API docs in a browser
  • Scroll, search, click through pages
  • Copy endpoint URLs into Postman
  • Manually construct request bodies
  • Read response, adjust, retry
  • Build mental model over hours or days

Slow, but works. The human brain fills in gaps, infers patterns, and adapts.

Agent problem

What a coding agent faces:

  • Can’t open a browser or click through docs
  • Can’t scroll or visually scan
  • Needs exact endpoint paths, methods, and schemas
  • Must construct requests programmatically
  • Needs structured, parseable responses
  • Has no way to discover what’s available

The agent is powerful but blind. It can execute, but it can’t explore.


Why Agents Struggle with APIs

This isn’t just about documentation format. There are structural reasons why traditional API tooling fails coding agents:

Why Traditional API Tools Fail Agents

| Challenge | Why It Breaks Agents |
| --- | --- |
| Interactive interfaces | Postman, Swagger UI, and API consoles require mouse clicks, form fills, and visual navigation. Agents work through text. |
| No incremental discovery | Most API docs are a flat list. There’s no hierarchical path from “what domains exist” to “what can I do in this domain” to “what does this endpoint accept.” |
| Unstructured outputs | Pretty-printed HTML responses, mixed content types, and inconsistent error formats make reliable parsing fragile. |
| Permission opacity | Agents can’t easily tell which endpoints they’re allowed to call. Trial-and-error against auth barriers wastes tokens and time. |
| Pagination complexity | Cursor-based pagination requires manual state management: follow the cursor, accumulate results, handle edge cases. Most agents give up after the first page. |
| RPC-style patterns | Many enterprise APIs use POST for everything: reads, writes, searches. The REST mental model (GET = read, POST = write) breaks down. |

The Core Insight: CLIs as Agent Interfaces

The answer was hiding in plain sight. CLIs are the original text-in, text-out interface. They’re composable, scriptable, and non-interactive by default. Every coding agent already knows how to run shell commands.

But most CLIs are designed for human ergonomics — colorful output, interactive prompts, confirmation dialogs. The insight wasn’t “build a CLI.” It was: build a CLI where the OpenAPI specification IS the interface.

This means:

  • Zero hard-coded routes. Every endpoint comes from the spec. Add an endpoint to the spec, and it’s immediately available in the CLI. No code changes.
  • The CLI is always complete. If the API has 1,000+ endpoints, the CLI has 1,000+ commands. No manual mapping, no forgotten endpoints, no drift between docs and reality.
  • Discovery is built in. The same spec that defines endpoints also provides their schemas, parameters, and descriptions — all queryable from the terminal.

my-cli api: discovery workflow

1 · DISCOVER
$ my-cli api domains
accounts     12 endpoints
works         8 endpoints
analytics    14 endpoints

2 · FILTER
$ my-cli api list --domain works
GET   /works.list     List work items
POST  /works.create   Create a work item
GET   /works.get      Get work item by ID

3 · INSPECT
$ my-cli api /works.create --dry-run
Method  POST
URL     https://api.example.com/works.create
Body    {}

4 · EXECUTE
$ my-cli api /works.create -F title="Fix login bug" -F type=issue --json
HTTP 201 · id: ISS-123 · title: Fix login bug
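
To make the spec-driven idea concrete, here is a minimal Python sketch of how a domain index and a filtered endpoint list could be derived from an OpenAPI-style document. It is not the CLI's actual implementation: the sample spec, the function names, and the rule that a domain is the part of the path before the first dot are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, trimmed-down OpenAPI document; a real spec would be loaded from a file or URL.
SPEC = {
    "paths": {
        "/works.list": {"get": {"summary": "List work items"}},
        "/works.create": {"post": {"summary": "Create a work item"}},
        "/accounts.list": {"get": {"summary": "List accounts"}},
    }
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec):
    """Yield (domain, METHOD, path, summary) for every operation in the spec."""
    for path, item in spec.get("paths", {}).items():
        # Assumption: the domain is the first path segment, up to the first dot.
        domain = path.lstrip("/").split(".", 1)[0]
        for method, operation in item.items():
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                yield domain, method.upper(), path, operation.get("summary", "")


def domain_counts(spec):
    """Return {domain: number of operations}, the data behind a domain listing."""
    counts = defaultdict(int)
    for domain, *_ in list_operations(spec):
        counts[domain] += 1
    return dict(counts)


def operations_in(spec, domain):
    """Return (METHOD, path, summary) tuples for a single domain."""
    return [(m, p, s) for d, m, p, s in list_operations(spec) if d == domain]


if __name__ == "__main__":
    print(domain_counts(SPEC))
    for method, path, summary in operations_in(SPEC, "works"):
        print(f"{method:5} {path:15} {summary}")
```

Because nothing in the sketch names a specific endpoint, adding a path to the spec changes the output without touching the code, which is the property the bullets above describe.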

Design Decisions That Enable Agency

Every design choice was evaluated through a dual lens: does this work for a human developer at a terminal, AND does this work for a coding agent running commands non-interactively? Seven decisions stood out:

Design Decisions and Their Agent Impact

| Decision | Human Benefit | Agent Benefit |
| --- | --- | --- |
| Spec-driven commands | Always up-to-date with API changes | Self-discovery: the agent can list all available operations |
| Structured output (--json) | Clean data for scripting | Reliable parsing via a {status, body, headers} envelope |
| TTY detection | Interactive picker when in terminal | Auto non-interactive mode when agent runs commands |
| Dry-run & curl generation | Preview before executing | Safe exploration: understand requests without side effects |
| Auto-pagination | No manual cursor management | Complete result sets in one command |
| Typed field coercion | Less shell escaping hassle | Predictable input handling for booleans, numbers, JSON, file refs |
| Permission-as-config | Project-level safety policies | Bounded exploration: the agent sees only allowed endpoints |
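
Typed field coercion is the least self-explanatory row in this table, so here is a hedged Python sketch of one way it could work. The -F key=value syntax and the @file reference come from the article; the exact coercion rules below (JSON literals become typed values, everything else stays a string) are an assumption, not the CLI's documented behaviour.

```python
import json
from pathlib import Path


def _reject_constant(name):
    # json.loads accepts NaN and Infinity by default; treat them as plain text instead.
    raise ValueError(f"non-standard JSON constant: {name}")


def coerce_field(raw):
    """Turn the value part of a -F key=value flag into a typed value."""
    if raw.startswith("@"):
        # File reference: read the named file and parse its contents as JSON.
        return json.loads(Path(raw[1:]).read_text(encoding="utf-8"))
    try:
        value = json.loads(raw, parse_constant=_reject_constant)
    except ValueError:  # includes json.JSONDecodeError
        return raw  # not a JSON literal, so keep it as a plain string
    # A quoted JSON string is returned unchanged, quotes included.
    return raw if isinstance(value, str) else value


def parse_fields(pairs):
    """Build a request body from a list of 'key=value' strings."""
    body = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        body[key] = coerce_field(value)
    return body


if __name__ == "__main__":
    print(parse_fields(["title=Fix login bug", "limit=5", "archived=false"]))
```

One consequence of leaning on JSON parsing is that values such as 007 stay strings, since a leading zero is not a valid JSON number; that tends to be what you want for identifiers.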

TTY Detection: The Dual-Mode Pattern

TTY detection is the most subtle design decision. When run in a terminal (TTY), the CLI offers an interactive fuzzy-search picker: type a few characters, see matching endpoints, select one. When run non-interactively (piped, backgrounded, or invoked by an agent), it silently switches to structured output mode. No prompts, no confirmations, no color codes that break parsing.

The agent never needs to know about this. It just works.
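
The check itself is small. The Python sketch below shows the general pattern under two assumptions that the article does not confirm: that interactive mode requires both stdin and stdout to be terminals, and that an explicit --json flag always wins. The function names are hypothetical.

```python
import sys


def choose_mode(stdin_is_tty, stdout_is_tty, force_json=False):
    """Return 'interactive' for a real terminal, 'structured' otherwise."""
    if force_json:
        return "structured"
    # Prompts need a terminal on stdin; pickers and colors need one on stdout.
    return "interactive" if stdin_is_tty and stdout_is_tty else "structured"


def detect_mode(force_json=False):
    """Inspect the current process streams and pick a mode."""
    # isatty() is False for pipes, redirected files, and most agent-run subprocesses.
    return choose_mode(sys.stdin.isatty(), sys.stdout.isatty(), force_json)


if __name__ == "__main__":
    print(detect_mode())
```

Keeping the decision in a pure function (choose_mode) separate from the stream inspection (detect_mode) makes the behaviour easy to test without a real terminal.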

Structured Output Envelope

In non-interactive mode, every response follows the same shape:

{"status": 200, "body": {"works": [...], "next_cursor": "abc"}, "headers": {...}}

Status codes, response bodies, and headers — all in a predictable envelope. The agent can write one parser that works for every endpoint. Contrast this with raw curl, where you’d need to parse status from headers, handle different content types, and deal with error formats that vary by endpoint.
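
As an illustration of the "one parser" claim, an agent-side helper in Python might look like the sketch below. The envelope keys come from the example above; the ApiError class, the 2xx success rule, and the shape of the error body are assumptions.

```python
import json


class ApiError(Exception):
    """Raised when the envelope reports a non-2xx status."""

    def __init__(self, status, body):
        super().__init__(f"request failed with status {status}")
        self.status = status
        self.body = body


def parse_envelope(text):
    """Parse one CLI response and return its body, or raise on failure."""
    envelope = json.loads(text)
    if not isinstance(envelope, dict):
        raise ValueError("not a response envelope: expected a JSON object")
    missing = {"status", "body", "headers"} - envelope.keys()
    if missing:
        raise ValueError(f"not a response envelope, missing: {sorted(missing)}")
    # Check status before touching the body, so errors never reach the caller as data.
    if not 200 <= envelope["status"] < 300:
        raise ApiError(envelope["status"], envelope["body"])
    return envelope["body"]


if __name__ == "__main__":
    sample = '{"status": 200, "body": {"works": [], "next_cursor": "abc"}, "headers": {}}'
    print(parse_envelope(sample))
```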

Permission Boundaries

A regex-based allow/deny system controls which endpoints are visible and executable. Configuration lives in settings files — project-level or global — not in code. This means:

  • Safe exploration: Give an agent access to read-only endpoints while blocking mutations
  • Progressive trust: Start with a narrow allowlist, widen as confidence grows
  • No code changes: Permissions are config, not logic
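
A regex allow/deny check of this kind can be sketched in a few lines of Python. The settings shape, the "METHOD /path" matching string, and the precedence rule (deny wins, and anything not explicitly allowed is blocked) are assumptions for the example; the article does not specify them.

```python
import re

# Hypothetical settings, shaped like something a JSON settings file could hold.
SETTINGS = {
    "allow": [r"^GET /", r"^POST /works\.list$"],
    "deny": [r"\.(create|update|delete)$"],
}


def is_allowed(method, path, settings):
    """Deny rules win; otherwise the operation must match an allow rule."""
    operation = f"{method.upper()} {path}"
    if any(re.search(pattern, operation) for pattern in settings.get("deny", [])):
        return False
    return any(re.search(pattern, operation) for pattern in settings.get("allow", []))


def visible_operations(operations, settings):
    """Filter (method, path) pairs down to what the agent may see and call."""
    return [(m, p) for m, p in operations if is_allowed(m, p, settings)]


if __name__ == "__main__":
    ops = [("GET", "/works.list"), ("POST", "/works.create"), ("POST", "/works.list")]
    print(visible_operations(ops, SETTINGS))
```

Matching on method and path together matters for RPC-style APIs, where a read such as POST /works.list has to be allowed without opening up every POST.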

The Observability Surface for APIs

Before this tool, understanding the API landscape meant reading documentation pages one by one. After, the entire surface is queryable from the terminal.

Discover the landscape

Stage 1

api domains reveals API domains with endpoint counts, showing which areas are large, specialized, or worth exploring first.

Narrow and filter

Stage 2

api list --domain works collapses the search space from thousands of endpoints to a focused domain-level list.

Understand and execute

Stage 3

--dry-run, --generate=curl, and --fields make every request inspectable before and after execution.

  1. Discover: Run api domains to map the surface before guessing endpoints
  2. Filter: Use api list --domain works to shrink the search space
  3. Preview: Dry-run the command or generate curl before making the request
  4. Execute: Invoke with structured output and optional field selection
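
The preview step can be pictured as building a request description and rendering it without sending anything. The Python sketch below is an assumption-laden illustration, not the CLI's code: the function names are hypothetical, authentication headers are left out, and the body is assumed to be sent as JSON.

```python
import json
import shlex


def build_request(base_url, method, path, body=None):
    """Describe a request without sending it, as a dry run would."""
    return {
        "method": method.upper(),
        "url": base_url.rstrip("/") + path,
        "body": body or {},
    }


def to_curl(request):
    """Render the request description as an equivalent curl command."""
    parts = ["curl", "-X", request["method"], request["url"]]
    if request["body"]:
        parts += ["-H", "Content-Type: application/json",
                  "-d", json.dumps(request["body"])]
    # shlex.quote keeps the command safe to paste into a POSIX shell.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    request = build_request(
        "https://api.example.com", "post", "/works.create",
        {"title": "Fix login bug", "type": "issue"},
    )
    print(request)
    print(to_curl(request))
```

Since both functions are free of side effects, an agent can call them as often as it likes while planning, then execute once.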

Before vs. After: API Exploration Workflow

| Step | Before (Manual) | After (CLI) |
| --- | --- | --- |
| Find available endpoints | Browse docs, Ctrl+F, scan pages | api list --domain accounts --format json |
| Understand request shape | Read API reference, find examples | api /accounts.list --dry-run |
| Test an endpoint | Copy to Postman, fill fields, send | api /accounts.list -F limit=5 |
| Get paginated results | Write cursor loop in code | api /accounts.list --paginate |
| Extract specific fields | Parse full response in code | api /accounts.list --fields id,display_name |
| Generate integration code | Manually write fetch/axios calls | api /accounts.list --generate=curl |
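
For reference, the "cursor loop" that --paginate replaces looks roughly like the Python sketch below. The next_cursor field name is taken from the envelope example earlier in the article; the fetch_page callback, the items_key parameter, and the max_pages safety limit are assumptions added for illustration.

```python
def paginate(fetch_page, items_key, max_pages=100):
    """Follow next_cursor until it is absent, returning all items in order."""
    items = []
    cursor = None  # None requests the first page
    for _ in range(max_pages):
        body = fetch_page(cursor)
        items.extend(body.get(items_key, []))
        cursor = body.get("next_cursor")
        if not cursor:
            return items
    # Guard against a backend that never stops handing out cursors.
    raise RuntimeError(f"stopped after {max_pages} pages; cursor still present")


# Fake backend for the demo: three pages keyed by the cursor that requests them.
PAGES = {
    None: {"works": [1, 2], "next_cursor": "a"},
    "a": {"works": [3, 4], "next_cursor": "b"},
    "b": {"works": [5]},
}

if __name__ == "__main__":
    print(paginate(PAGES.get, "works"))  # → [1, 2, 3, 4, 5]
```

This is exactly the state management most agents abandon after the first page, which is why moving it behind a single flag pays off.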

Patterns for Agent-Friendly CLI Design

The techniques that emerged from this project are generalizable. Any CLI that wants to serve both humans and coding agents can apply these patterns:

Agent-Friendly CLI Design Patterns

| Pattern | Implementation | Why It Matters |
| --- | --- | --- |
| TTY-aware dual mode | Detect stdin.isTTY; interactive prompts for humans, structured output for agents | One binary serves both audiences without flags |
| Structured envelope | Wrap responses in {status, body, headers} | Agents write one parser, not per-endpoint parsing |
| Spec-driven commands | Generate CLI surface from OpenAPI at runtime | CLI can’t drift from API; always complete and accurate |
| Incremental discovery | domains → endpoints → schema → invoke | Agents navigate API surface without documentation |
| Permission-as-config | Regex allow/deny in JSON settings files | Safety boundaries without code changes; progressive trust |
| Typed field coercion | Auto-parse booleans, numbers, JSON, file refs (@file.json) | Reduces agent friction with shell escaping and type mismatches |
| Preview-before-execute | --dry-run shows request; --generate=curl produces equivalent | Agents can plan and verify before committing to actions |
| Auto-pagination | --paginate follows cursors, combines pages | Agents get complete data sets without state management |

What Changes When Agents Can Explore

The most significant impact isn’t speed — it’s autonomy. When a coding agent can discover, understand, and invoke APIs without human guidance, the nature of the collaboration changes.

From “tell me the endpoint” to self-discovery

Without the CLI, a typical agent interaction starts with a question to the human: “I need to list tickets. What’s the endpoint?” With it, the agent runs api domains, finds works, runs api list --domain works, finds /works.list, previews the call with --dry-run to see the request shape, and executes. No human in the loop.

Integration workflows collapse

Building a feature that touches an unfamiliar API domain used to mean hours of documentation reading and Postman experimentation. Now the discovery-to-execution cycle is four commands. The agent handles the exploration; the developer focuses on the business logic.

The CLI is the documentation

New developers (and agents) don’t need to find and read API docs. The CLI itself is queryable documentation. api list is the endpoint catalog. --dry-run is the request reference. --fields is the response guide. The API is self-describing through its tooling.

Safe exploration in production

The permission model means agents can explore real environments with tightly bounded risk. Set up a read-only allowlist, point the agent at a staging environment, and let it learn the API surface. When ready, widen permissions to include write operations. The progression from read-only exploration to full access is controlled, auditable, and reversible.


Using the CLI as an agent skill

The CLI compounds in value when wired into how Claude Code operates. A companion skill defines exactly when and how to use it, so Claude follows a documented workflow rather than inferring API mechanics from context.

CLAUDE.md as the entry point

CLAUDE.md is read at the start of every Claude Code session. A reference to the companion skill is enough to activate it:

## API access
When the task requires reading from or writing to any API endpoint, use the @cli-for-agents skill.

Claude treats this as a standing instruction. Any task involving API data triggers skill lookup before attempting direct execution.

What the skill defines

The skill is designed around the CLI’s own design principles: non-interactive first, incremental discovery, and fail usefully with correct examples. It teaches Claude three things:

When to use the CLI — any task that requires listing, filtering, reading, or writing API resources: “fetch open issues,” “create a work item,” “list accounts by domain.”

How to discover the right endpoint — the four-step pattern the CLI is built around, matching the terminal diagram above:

my-cli api domains                         # map the API surface
my-cli api list --domain <name>            # scope to a relevant domain
my-cli api <endpoint> --dry-run            # inspect before executing
my-cli api <endpoint> -F key=value --json  # execute with structured output

How to handle the response — the skill specifies the --json envelope shape (status, body, headers), instructs Claude to check status before processing body, and sets safe defaults: --dry-run for any write operation before executing, --fields to limit response size on large result sets, --paginate when completeness matters.

How Claude completes the task

With the skill active, “create a ticket for the login bug” becomes an uninterrupted four-step sequence. No human guidance is needed for API mechanics:

agent session bash
my-cli api domains
# → finds 'works' domain

my-cli api list --domain works
# → finds /works.create — POST

my-cli api /works.create --dry-run
# → confirms required fields: title, type

my-cli api /works.create -F title="Fix login bug" -F type=issue --json
# → status: 201 — id: ISS-123

Claude discovers the surface, verifies the shape, and executes. The skill provides the method; the CLAUDE.md reference makes that method available at session start.


Takeaways

Each iteration of building this tool reinforced a set of principles about designing for the agent era:

Spec-driven design eliminates drift

Generating every command from the OpenAPI spec made the CLI complete, accurate, and trustworthy for autonomous agents.

TTY detection is the simplest dual-mode pattern

A single TTY check lets humans get interactive prompts while agents get structured JSON and silent operation.

Agents need incremental discovery

Domains, filtered endpoint lists, and request previews create a navigable path where a flat endpoint catalog would fail.

Permission-as-config enables progressive trust

Settings-level permissions allow cautious read-only exploration first, then wider access as confidence grows.

Preview modes prevent expensive mistakes

Dry-run and curl generation let agents understand mutation requests before executing them.

The best agent tools are also great human tools

Structured output, auto-pagination, and incremental discovery improved the developer experience as much as agent autonomy.
