Giving agents hands: a CLI that lets coding agents explore any API
When your platform has 2,000+ API endpoints across microservices and gateway layers, the surface is constantly changing. Documentation goes stale, and new use cases reshape input and output schemas. A spec-driven CLI turns API exploration into agent-ready discovery.
The Problem: APIs at Scale
Large enterprise platforms accumulate APIs the way cities accumulate roads. Start with a few clean endpoints, add microservices and gateway layers over time, and before long you have 2,000+ operations spanning 100+ domains — accounts, tickets, conversations, deployments, permissions, analytics, AI services. Each with its own authentication requirements, pagination patterns, and data shapes.
That surface doesn’t stay still. Endpoints are added, renamed, split behind gateways, or routed to new backing services. Documentation goes stale the moment teams ship new use cases, and those use cases often change request and response shapes in subtle ways — one payload adds nested filters, another returns a new envelope, a third changes how pagination tokens are emitted.
For a human developer, this is manageable. You open the documentation, search for what you need, try a few requests in Postman, read the response, iterate. The process is slow but navigable.
For a coding agent, this API surface is a wall.
Traditional API exploration:
- Open API docs in a browser
- Scroll, search, click through pages
- Copy endpoint URLs into Postman
- Manually construct request bodies
- Read response, adjust, retry
- Build mental model over hours or days
Slow, but works. The human brain fills in gaps, infers patterns, and adapts.
What a coding agent faces:
- Can’t open a browser or click through docs
- Can’t scroll or visually scan
- Needs exact endpoint paths, methods, and schemas
- Must construct requests programmatically
- Needs structured, parseable responses
- Has no way to discover what’s available
The agent is powerful but blind. It can execute, but it can’t explore.
Why Agents Struggle with APIs
This isn’t just about documentation format. There are structural reasons why traditional API tooling fails coding agents:
Why Traditional API Tools Fail Agents
| Challenge | Why It Breaks Agents |
|---|---|
| Interactive interfaces | Postman, Swagger UI, and API consoles require mouse clicks, form fills, and visual navigation. Agents work through text. |
| No incremental discovery | Most API docs are a flat list. There’s no hierarchical path from “what domains exist” to “what can I do in this domain” to “what does this endpoint accept.” |
| Unstructured outputs | Pretty-printed HTML responses, mixed content types, and inconsistent error formats make reliable parsing fragile. |
| Permission opacity | Agents can’t easily tell which endpoints they’re allowed to call. Trial-and-error against auth barriers wastes tokens and time. |
| Pagination complexity | Cursor-based pagination requires manual state management — follow the cursor, accumulate results, handle edge cases. Most agents give up after the first page. |
| RPC-style patterns | Many enterprise APIs use POST for everything — reads, writes, searches. The REST mental model (GET = read, POST = write) breaks down. |
The Core Insight: CLIs as Agent Interfaces
The answer was hiding in plain sight. CLIs are the original text-in, text-out interface. They’re composable, scriptable, and non-interactive by default. Every coding agent already knows how to run shell commands.
But most CLIs are designed for human ergonomics — colorful output, interactive prompts, confirmation dialogs. The insight wasn’t “build a CLI.” It was: build a CLI where the OpenAPI specification IS the interface.
This means:
- Zero hard-coded routes. Every endpoint comes from the spec. Add an endpoint to the spec, and it’s immediately available in the CLI. No code changes.
- The CLI is always complete. If the API has 2,000+ endpoints, the CLI has 2,000+ commands. No manual mapping, no forgotten endpoints, no drift between docs and reality.
- Discovery is built in. The same spec that defines endpoints also provides their schemas, parameters, and descriptions — all queryable from the terminal.
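To make the "zero hard-coded routes" idea concrete, here is a minimal sketch of deriving a command table from an OpenAPI document at runtime. It is illustrative only: the source does not say what language the CLI is written in, and `build_commands`, the `SPEC` sample, and the shape of each command entry are assumptions made for this example.

```python
from typing import Any

# Operation keys allowed on an OpenAPI Path Item; other keys
# ("parameters", "summary", "servers", ...) are not operations.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def build_commands(spec: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map every (method, path) pair in the spec to a command entry."""
    commands: dict[str, dict[str, Any]] = {}
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level metadata
            commands[f"{method.upper()} {path}"] = {
                "path": path,
                "method": method.upper(),
                "summary": op.get("summary", ""),
                # Parameters given as $ref have no inline name and are skipped here.
                "parameters": [p["name"] for p in op.get("parameters", []) if "name" in p],
            }
    return commands


# Hypothetical two-operation spec, loosely modelled on the endpoints named in this post.
SPEC = {
    "openapi": "3.0.0",
    "paths": {
        "/accounts.list": {
            "get": {
                "summary": "List accounts",
                "parameters": [{"name": "limit", "in": "query"}],
            }
        },
        "/works.create": {"post": {"summary": "Create a work item"}},
    },
}

commands = build_commands(SPEC)
for name, entry in commands.items():
    print(name, "-", entry["summary"])
```

Because the table is rebuilt from the spec on each run, adding an operation to the spec adds a command without touching this code.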
Design Decisions That Enable Agency
Every design choice was evaluated through a dual lens: does this work for a human developer at a terminal, AND does this work for a coding agent running commands non-interactively? Seven decisions stood out:
Design Decisions and Their Agent Impact
| Decision | Human Benefit | Agent Benefit |
|---|---|---|
| Spec-driven commands | Always up-to-date with API changes | Self-discovery — agent can list all available operations |
| Structured output (--json) | Clean data for scripting | Reliable parsing — {status, body, headers} envelope |
| TTY detection | Interactive picker when in terminal | Auto non-interactive mode when agent runs commands |
| Dry-run & curl generation | Preview before executing | Safe exploration — understand requests without side effects |
| Auto-pagination | No manual cursor management | Complete result sets in one command |
| Typed field coercion | Less shell escaping hassle | Predictable input handling — booleans, numbers, JSON, file refs |
| Permission-as-config | Project-level safety policies | Bounded exploration — agent sees only allowed endpoints |
TTY Detection: The Dual-Mode Pattern
The most subtle design decision. When run in a terminal (TTY), the CLI offers an interactive fuzzy-search picker — type a few characters, see matching endpoints, select one. When run non-interactively (piped, backgrounded, or by an agent), it silently switches to structured output mode. No prompts, no confirmations, no color codes that break parsing.
The agent never needs to know about this. It just works.
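A rough sketch of the dual-mode switch, in Python for illustration. The function name `choose_mode` and the two mode labels are hypothetical. Checking both stdin and stdout is also an assumption of this sketch: a picker needs a terminal to read from, and color codes only make sense when the output goes to one.

```python
import io
import sys
from typing import Optional, TextIO


def choose_mode(stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None) -> str:
    """Return "interactive" only when both streams are attached to a terminal."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    if stdin.isatty() and stdout.isatty():
        return "interactive"  # human at a terminal: picker, prompts, color
    return "structured"       # piped or agent-driven: JSON only, no prompts


# An in-memory stream behaves like a pipe: isatty() is False.
piped = io.StringIO()
print(choose_mode(stdin=piped, stdout=piped))  # prints "structured"
```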
Structured Output Envelope
In non-interactive mode, every response follows the same shape:
{"status": 200, "body": {"works": [...], "next_cursor": "abc"}, "headers": {...}}
Status codes, response bodies, and headers — all in a predictable envelope. The agent can write one parser that works for every endpoint. Contrast this with raw curl, where you’d need to parse status from headers, handle different content types, and deal with error formats that vary by endpoint.
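One way to produce that envelope is sketched below, in Python for illustration. `to_envelope` is a hypothetical helper. How the real CLI treats empty or non-JSON bodies is not stated in this post, so passing text through unchanged is an assumption.

```python
import json
from typing import Any


def to_envelope(status: int, headers: dict[str, str], raw_body: str) -> dict[str, Any]:
    """Wrap any HTTP result in the same {status, body, headers} shape."""
    body: Any
    if not raw_body:
        body = None  # e.g. a 204 with no content
    else:
        try:
            body = json.loads(raw_body)
        except json.JSONDecodeError:
            body = raw_body  # assumption: non-JSON payloads are kept as plain text
    return {"status": status, "body": body, "headers": headers}


ok = to_envelope(200, {"content-type": "application/json"}, '{"works": [], "next_cursor": "abc"}')
failed = to_envelope(502, {"content-type": "text/plain"}, "upstream unavailable")

# Success and failure share one shape, so one parser covers both.
for envelope in (ok, failed):
    print(json.dumps(envelope))
```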
Permission Boundaries
A regex-based allow/deny system controls which endpoints are visible and executable. Configuration lives in settings files — project-level or global — not in code. This means:
- Safe exploration: Give an agent access to read-only endpoints while blocking mutations
- Progressive trust: Start with a narrow allowlist, widen as confidence grows
- No code changes: Permissions are config, not logic
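A minimal sketch of a regex allow/deny check, in Python for illustration. The settings keys (`allow`, `deny`), the patterns, and the rule that deny wins over allow are all assumptions of this example. The post only says the system is regex-based and lives in settings files.

```python
import re

# Hypothetical settings content: read-only verbs allowed, one domain blocked outright.
SETTINGS = {
    "allow": [r"\.(list|get)$"],
    "deny": [r"^/permissions\."],
}


def is_allowed(endpoint: str, allow: list[str], deny: list[str]) -> bool:
    """Deny patterns take precedence; otherwise an allow pattern must match."""
    if any(re.search(pattern, endpoint) for pattern in deny):
        return False
    return any(re.search(pattern, endpoint) for pattern in allow)


def visible_endpoints(endpoints: list[str], settings: dict[str, list[str]]) -> list[str]:
    """Filter the catalog down to what the agent may see and call."""
    return [e for e in endpoints if is_allowed(e, settings["allow"], settings["deny"])]


catalog = ["/accounts.list", "/accounts.get", "/works.create", "/permissions.list"]
print(visible_endpoints(catalog, SETTINGS))
```

Widening trust then means editing the patterns, for example adding `\.create$` to the allow list, with no change to the check itself.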
The Observability Surface for APIs
Before this tool, understanding the API landscape meant reading documentation pages one by one. After, the entire surface is queryable from the terminal.
Discover the landscape
api domains reveals API domains with endpoint counts, showing which areas are large, specialized, or worth exploring first.
Narrow and filter
api list --domain works collapses the search space from thousands of endpoints to a focused domain-level list.
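The two discovery steps can be sketched as plain functions over the list of paths. This is Python for illustration and assumes the `/<domain>.<action>` naming seen in the examples here. `domain_of`, `domain_counts`, and `list_domain` are hypothetical names, and the real CLI may derive domains differently, for instance from spec tags.

```python
from collections import Counter


def domain_of(path: str) -> str:
    """'/accounts.list' -> 'accounts' under the assumed naming convention."""
    return path.lstrip("/").split(".", 1)[0]


def domain_counts(paths: list[str]) -> dict[str, int]:
    """Step one: which domains exist, and how large is each?"""
    return dict(Counter(domain_of(p) for p in paths))


def list_domain(paths: list[str], domain: str) -> list[str]:
    """Step two: narrow the catalog to a single domain."""
    return sorted(p for p in paths if domain_of(p) == domain)


PATHS = ["/accounts.list", "/accounts.get", "/works.list", "/works.create", "/works.get"]
print(domain_counts(PATHS))
print(list_domain(PATHS, "works"))
```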
Understand and execute
--dry-run, --generate=curl, and --fields make every request inspectable before and after execution.
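As a sketch of what curl generation involves, the helper below renders a request description as a shell-safe curl command without sending anything. It is Python for illustration. `to_curl`, the example URL, and the header are assumptions, and the real CLI's output format is not shown in this post.

```python
import json
import shlex
from typing import Any, Optional


def to_curl(method: str, url: str, headers: dict[str, str],
            body: Optional[dict[str, Any]] = None) -> str:
    """Render a request as a curl command line; nothing is executed."""
    parts = ["curl", "-X", method.upper(), shlex.quote(url)]
    for name, value in headers.items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    if body is not None:
        # shlex.quote keeps the JSON intact when the command is pasted into a shell.
        parts += ["-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


command = to_curl(
    "post",
    "https://api.example.com/works.create",  # placeholder host
    {"Content-Type": "application/json"},
    {"title": "Fix login bug", "type": "issue"},
)
print(command)
```

Quoting every dynamic part matters here because agents tend to paste generated commands straight into a shell.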
Before vs. After: API Exploration Workflow
| Step | Before (Manual) | After (CLI) |
|---|---|---|
| Find available endpoints | Browse docs, Ctrl+F, scan pages | api list --domain accounts --format json |
| Understand request shape | Read API reference, find examples | api /accounts.list --dry-run |
| Test an endpoint | Copy to Postman, fill fields, send | api /accounts.list -F limit=5 |
| Get paginated results | Write cursor loop in code | api /accounts.list --paginate |
| Extract specific fields | Parse full response in code | api /accounts.list --fields id,display_name |
| Generate integration code | Manually write fetch/axios calls | api /accounts.list --generate=curl |
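The cursor loop that `--paginate` replaces looks roughly like the sketch below, written in Python for illustration. It assumes the envelope and `next_cursor` field shown earlier. `paginate`, `fetch_page`, the `max_pages` guard, and the choice to return a failing page unchanged are assumptions of this example, not documented behavior.

```python
from typing import Any, Callable, Optional

Envelope = dict[str, Any]


def paginate(fetch_page: Callable[[Optional[str]], Envelope], items_key: str,
             cursor_key: str = "next_cursor", max_pages: int = 100) -> Envelope:
    """Follow cursors and merge every page into a single envelope."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    items: list[Any] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        if page["status"] >= 400:
            return page  # surface the failing page as-is
        items.extend(page["body"].get(items_key, []))
        cursor = page["body"].get(cursor_key)
        if not cursor:
            break
    body: dict[str, Any] = {items_key: items}
    if cursor:
        body[cursor_key] = cursor  # page limit reached: leave the cursor so callers can resume
    return {"status": page["status"], "body": body, "headers": page["headers"]}


# Two fake pages standing in for real HTTP calls.
PAGES: dict[Optional[str], Envelope] = {
    None: {"status": 200, "body": {"works": [{"id": 1}, {"id": 2}], "next_cursor": "abc"}, "headers": {}},
    "abc": {"status": 200, "body": {"works": [{"id": 3}]}, "headers": {}},
}
combined = paginate(lambda cursor: PAGES[cursor], items_key="works")
print(combined["body"])
```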
Patterns for Agent-Friendly CLI Design
The techniques that emerged from this project are generalizable. Any CLI that wants to serve both humans and coding agents can apply these patterns:
Agent-Friendly CLI Design Patterns
| Pattern | Implementation | Why It Matters |
|---|---|---|
| TTY-aware dual mode | Detect stdin.isTTY; interactive prompts for humans, structured output for agents | One binary serves both audiences without flags |
| Structured envelope | Wrap responses in {status, body, headers} | Agents write one parser, not per-endpoint parsing |
| Spec-driven commands | Generate CLI surface from OpenAPI at runtime | CLI can’t drift from API; always complete and accurate |
| Incremental discovery | domains → endpoints → schema → invoke | Agents navigate API surface without documentation |
| Permission-as-config | Regex allow/deny in JSON settings files | Safety boundaries without code changes; progressive trust |
| Typed field coercion | Auto-parse booleans, numbers, JSON, file refs (@file.json) | Reduces agent friction with shell escaping and type mismatches |
| Preview-before-execute | --dry-run shows request; --generate=curl produces equivalent | Agents can plan and verify before committing to actions |
| Auto-pagination | --paginate follows cursors, combines pages | Agents get complete data sets without state management |
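Typed field coercion from the table above can be sketched as follows, in Python for illustration. `coerce_field` and `parse_fields` are hypothetical names. The rule of trying JSON first and falling back to a plain string is an assumption about how such coercion could work, not a description of the actual CLI.

```python
import json
from pathlib import Path
from typing import Any


def _reject_constant(name: str) -> Any:
    # Python's json module accepts NaN/Infinity by default; treat them as plain strings.
    raise ValueError(f"non-standard JSON constant: {name}")


def coerce_field(raw: str) -> Any:
    """Turn the value part of -F key=value into a typed value."""
    if raw.startswith("@"):
        # File reference such as @file.json: use the parsed file contents.
        return json.loads(Path(raw[1:]).read_text())
    try:
        # Covers booleans, numbers, null, objects, and arrays.
        return json.loads(raw, parse_constant=_reject_constant)
    except ValueError:
        return raw  # anything that is not valid JSON stays a string


def parse_fields(pairs: list[str]) -> dict[str, Any]:
    """Parse a list of key=value arguments into a request body."""
    body: dict[str, Any] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        body[key] = coerce_field(value)
    return body


fields = parse_fields(["limit=5", "active=true", "title=Fix login bug", 'filter={"state": "open"}'])
print(fields)
```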
What Changes When Agents Can Explore
The most significant impact isn’t speed — it’s autonomy. When a coding agent can discover, understand, and invoke APIs without human guidance, the nature of the collaboration changes.
From “tell me the endpoint” to self-discovery
Without the CLI, a typical agent interaction looks like: “I need to list tickets. What’s the endpoint?” With it, the agent runs api domains, finds works, runs api list --domain works, finds /works.list, runs --dry-run to understand the schema, and executes. No human in the loop.
Integration workflows collapse
Building a feature that touches an unfamiliar API domain used to mean hours of documentation reading and Postman experimentation. Now the discovery-to-execution cycle is four commands. The agent handles the exploration; the developer focuses on the business logic.
The CLI is the documentation
New developers (and agents) don’t need to find and read API docs. The CLI itself is queryable documentation. api list is the endpoint catalog. --dry-run is the request reference. --fields is the response guide. The API is self-describing through its tooling.
Safe exploration in production
The permission model lets agents explore real environments with bounded risk. Set up a read-only allowlist, point the agent at a staging environment first, and let it learn the API surface. When ready, widen permissions for write operations. The progression from read-only exploration to full access is controlled, auditable, and reversible.
Using the CLI as an agent skill
The CLI compounds in value when wired into how Claude Code operates. A companion skill defines exactly when and how to use it, so Claude follows a documented workflow rather than inferring API mechanics from context.
CLAUDE.md as the entry point
CLAUDE.md is read at the start of every Claude Code session. A reference to the companion skill is enough to activate it:
## API access
When the task requires reading from or writing to any API endpoint, use the @cli-for-agents skill.
Claude treats this as a standing instruction. Any task involving API data triggers skill lookup before attempting direct execution.
What the skill defines
The skill is designed around the CLI’s own design principles: non-interactive first, incremental discovery, and fail usefully with correct examples. It teaches Claude three things:
When to use the CLI — any task that requires listing, filtering, reading, or writing API resources: “fetch open issues,” “create a work item,” “list accounts by domain.”
How to discover the right endpoint — the four-step pattern the CLI is built around, matching the terminal diagram above:
my-cli api domains # map the API surface
my-cli api list --domain <name> # scope to a relevant domain
my-cli api <endpoint> --dry-run # inspect before executing
my-cli api <endpoint> -F key=value --json # execute with structured output
How to handle the response — the skill specifies the --json envelope shape (status, body, headers), instructs Claude to check status before processing body, and sets safe defaults: --dry-run for any write operation before executing, --fields to limit response size on large result sets, --paginate when completeness matters.
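The "check status before processing body" rule might look like this on the consuming side. It is a Python sketch for illustration: `unwrap` and `ApiError` are hypothetical, and treating any non-2xx status as a failure is an assumption of this example.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope reports a non-2xx status."""

    def __init__(self, status: int, body: Any) -> None:
        super().__init__(f"API call failed with status {status}")
        self.status = status
        self.body = body


def unwrap(cli_stdout: str) -> Any:
    """Parse the --json envelope and return body only after the status check."""
    envelope = json.loads(cli_stdout)
    status = envelope["status"]
    if not 200 <= status < 300:
        raise ApiError(status, envelope.get("body"))
    return envelope["body"]


created = unwrap('{"status": 201, "body": {"id": "ISS-123"}, "headers": {}}')
print(created["id"])  # prints "ISS-123"
```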
How Claude completes the task
With the skill active, “create a ticket for the login bug” becomes a four-step uninterrupted sequence. No human guidance needed for API mechanics:
my-cli api domains
# → finds 'works' domain
my-cli api list --domain works
# → finds /works.create — POST
my-cli api /works.create --dry-run
# → confirms required fields: title, type
my-cli api /works.create -F title="Fix login bug" -F type=issue --json
# → status: 201 — id: ISS-123
Claude discovers the surface, verifies the shape, and executes. The skill provides the method; the CLAUDE.md reference makes that method available at session start.
Takeaways
Each iteration of building this tool reinforced a set of principles about designing for the agent era:
Spec-driven design eliminates drift
Generating every command from the OpenAPI spec made the CLI complete, accurate, and trustworthy for autonomous agents.
TTY detection is the simplest dual-mode pattern
A single TTY check lets humans get interactive prompts while agents get structured JSON and silent operation.
Agents need incremental discovery
Domains, filtered endpoint lists, and request previews create a navigable path where a flat endpoint catalog would fail.
Permission-as-config enables progressive trust
Settings-level permissions allow cautious read-only exploration first, then wider access as confidence grows.
Preview modes prevent expensive mistakes
Dry-run and curl generation let agents understand mutation requests before executing them.
The best agent tools are also great human tools
Structured output, auto-pagination, and incremental discovery improved the developer experience as much as agent autonomy.