Progressive Disclosure for AI Tool Discovery
Jan 20, 2026 by Ostack Team
When an AI agent connects to a dozen MCP servers, it can inherit hundreds of tools. Loading every tool definition into the model's context on every turn can consume 100,000+ tokens before the agent even starts reasoning. Anthropic's engineering team described this problem in its "Code execution with MCP" article (November 2025) and proposed code execution as the solution: instead of calling tools directly, the agent writes code against generated APIs and keeps intermediate results in a sandbox.
But code execution is not the only way to solve this. Progressive disclosure — loading tool definitions in layers rather than all at once — can deliver comparable token reduction without introducing a runtime. This post explores that alternative approach, the tradeoffs involved, and where it breaks down.
Why Flat Tool Lists Break at Scale
Most agent frameworks ship every tool definition to the model in a single payload. For a stack with 100 tools, that is roughly 10,000 tokens of schema before any reasoning begins. At 1,000 tools it becomes untenable — the context window fills with JSON schemas instead of the user's actual problem.
The cost is not just tokens. Flat lists force models to scan every tool description on every turn, increasing latency and degrading selection accuracy. Anthropic's own tool use documentation notes that fewer, more relevant tool definitions produce better outcomes than exhaustive catalogs. The Anthropic article quantified the problem: 1,000 tools at roughly 100 tokens each means 100,000 tokens consumed before the agent does any work.
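The arithmetic above can be written down as a small helper. This is only a sketch: the 100-tokens-per-tool figure comes from the text, while the function names and the 200,000-token context window in the usage line are illustrative assumptions.

```python
def flat_list_tokens(tool_count: int, tokens_per_tool: int = 100) -> int:
    """Estimate the tokens spent on tool definitions when every tool is sent."""
    return tool_count * tokens_per_tool


def context_share(tool_count: int, context_window: int, tokens_per_tool: int = 100) -> float:
    """Fraction of the context window taken up by tool definitions alone."""
    return flat_list_tokens(tool_count, tokens_per_tool) / context_window


small_stack = flat_list_tokens(100)      # 100 tools at ~100 tokens each
large_stack = flat_list_tokens(1_000)    # 1,000 tools at ~100 tokens each
share = context_share(1_000, context_window=200_000)  # window size is an assumption
```

Under these assumptions, a 1,000-tool stack would spend half of a 200,000-token window on schemas before the user's request is even considered.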
Hierarchical Tool Discovery: Three Levels
The core idea is simple: instead of sending everything at once, expose tools through three levels of detail. The model requests more information only when it needs it.
- Level 1: Names only. The model receives a flat list of tool names: ["figma_read_files", "github_list_repos", ...]. Cost: roughly 20 tokens for 1,000 tools. Enough for the model to decide which tools might be relevant.
- Level 2: Summaries. For the shortlisted tools, the model requests one-line descriptions and MCP origin. Cost: roughly 100 tokens for 1,000 tools.
- Level 3: Full schema. Only the tools the model actually intends to call get their complete JSON schema loaded: parameters, types, constraints. Loaded on demand, one at a time.
The reduction is dramatic. A 1,000-tool stack drops from 100,000 tokens to under 20 tokens for the initial discovery pass, a reduction of more than 99%. The model pays the full token cost only for the handful of tools it actually invokes.
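The three levels can be sketched as a small in-memory catalog. This is a minimal illustration, not a reference implementation: the `Tool` and `ToolCatalog` names, the field names, and the example tool descriptions are all hypothetical, and a real service would expose these methods over its own API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tool:
    name: str
    summary: str   # one-line description, served at Level 2
    origin: str    # MCP server the tool comes from, served at Level 2
    schema: dict[str, Any] = field(default_factory=dict)  # full JSON schema, Level 3


class ToolCatalog:
    """Serves tool metadata in three levels of increasing detail."""

    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def names(self) -> list[str]:
        """Level 1: tool names only."""
        return sorted(self._tools)

    def summaries(self, shortlist: list[str]) -> list[dict[str, str]]:
        """Level 2: one-line description and MCP origin for shortlisted tools."""
        return [
            {"name": n, "summary": self._tools[n].summary, "origin": self._tools[n].origin}
            for n in shortlist
            if n in self._tools  # unknown names are skipped rather than raising
        ]

    def full_schema(self, name: str) -> dict[str, Any]:
        """Level 3: complete definition for a single tool, loaded on demand."""
        tool = self._tools[name]  # raises KeyError for an unknown tool
        return {"name": tool.name, "description": tool.summary, "input_schema": tool.schema}


# Hypothetical two-tool stack used to walk through the three levels.
catalog = ToolCatalog([
    Tool("github_list_repos", "List repositories for a user", "github",
         {"type": "object", "properties": {"user": {"type": "string"}}, "required": ["user"]}),
    Tool("figma_read_files", "Read files from a Figma project", "figma"),
])

level1 = catalog.names()                              # every name, nothing else
level2 = catalog.summaries(["github_list_repos"])     # only the shortlisted tools
level3 = catalog.full_schema("github_list_repos")     # only the tool about to be called
```

The design point is that each call returns strictly more detail about strictly fewer tools, so the model never sees a schema it did not ask for.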
Dual-Mode Client Support
Not all AI clients support the same discovery protocol. Claude supports native tool metadata with defer_loading hints and cache_control directives. GPT-4, Gemini, and custom MCP clients do not. A practical implementation needs to detect the client type and serve the appropriate format:
- Anthropic native mode: leverages Claude's built-in deferred loading, compatible with the Tool Search Tool specification
- Custom hierarchical mode: the three-level REST API described above, compatible with any HTTP client
This means the token savings can be made available to every AI client, not just Claude.
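One way to picture the dual-mode switch is a dispatcher that picks a response shape from a client hint. Everything here is an assumption made for illustration: the `client_hint` string (for example a User-Agent or handshake field), the substring matching, and the payload shapes. The `defer_loading` field name is taken from the text above; its exact placement in a real request should be checked against Anthropic's documentation.

```python
from typing import Any


def detect_mode(client_hint: str) -> str:
    """Pick a discovery mode from a client-supplied hint (hypothetical heuristic)."""
    hint = client_hint.lower()
    if "claude" in hint or "anthropic" in hint:
        return "anthropic_native"
    return "hierarchical"


def build_discovery_payload(client_hint: str, tools: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the first discovery response in the format the client can use."""
    mode = detect_mode(client_hint)
    if mode == "anthropic_native":
        # Assumed shape: send definitions flagged for deferred loading.
        return {"mode": mode, "tools": [{**tool, "defer_loading": True} for tool in tools]}
    # Any other client starts at Level 1 of the hierarchical API: names only.
    return {"mode": mode, "names": [tool["name"] for tool in tools]}


example_tools = [{"name": "github_list_repos", "description": "List repositories for a user"}]
native_payload = build_discovery_payload("Claude-Desktop/1.0", example_tools)
generic_payload = build_discovery_payload("my-gemini-agent", example_tools)
```

A production detector would likely rely on an explicit capability negotiation rather than string matching, which is brittle.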
Tradeoffs Worth Knowing
Progressive disclosure is not free. Three round-trips (names → summaries → full schema) add latency compared to a single bulk load. A reasonable latency budget is 50 ms for Level 1, 100 ms for Level 2, and 150 ms for Level 3 — meaning a full drill-down takes roughly 300 ms versus a single 200 ms bulk load. For most workloads this is a good trade: you pay 300 ms once for the first tool selection, then hit cache for subsequent turns.
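The latency trade can be made concrete with a few lines of arithmetic. The per-level budgets and the 200 ms bulk load come from the paragraph above; the 5 ms cache-hit cost and the ten-turn session are assumptions chosen only to illustrate the amortization.

```python
LEVEL_LATENCY_MS = {"names": 50, "summaries": 100, "full_schema": 150}
BULK_LOAD_MS = 200


def drill_down_ms() -> int:
    """Total latency of one full names -> summaries -> full schema pass."""
    return sum(LEVEL_LATENCY_MS.values())


def average_discovery_ms(turns: int, cached_ms: float) -> float:
    """Average discovery latency per turn when only the first turn misses the cache."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return (drill_down_ms() + (turns - 1) * cached_ms) / turns


first_turn = drill_down_ms()                              # cold path
extra_vs_bulk = first_turn - BULK_LOAD_MS                 # one-time penalty
ten_turn_average = average_discovery_ms(10, cached_ms=5.0)  # cache-hit cost is assumed
```

With these numbers the cold path costs 100 ms more than a bulk load, and the per-turn average falls well below the bulk figure once most turns are served from cache.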
Cache invalidation is the harder problem. When an MCP server adds or removes a tool, all three cache levels need to be invalidated. A hierarchical key structure keeps this manageable:
stack:{id}:tools:names → Level 1 (names only)
stack:{id}:tools:summary → Level 2 (summaries)
stack:{id}:tool:{name}:full → Level 3 (full schema, per tool)
A baseline TTL of around 1 hour, supplemented by event-driven invalidation on configuration changes, handles most cases — but stale tool lists remain a risk during rapid MCP server iteration.
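The key scheme and the TTL-plus-events policy can be sketched with an in-memory cache. The key formats follow the structure listed above; the `ToolCache` class, its method names, and the injectable clock are hypothetical, and a real deployment would more likely sit on a shared store.

```python
import time
from typing import Any, Callable, Optional

TTL_SECONDS = 3600  # baseline TTL of about one hour, as suggested in the text


def names_key(stack_id: str) -> str:
    return f"stack:{stack_id}:tools:names"


def summary_key(stack_id: str) -> str:
    return f"stack:{stack_id}:tools:summary"


def full_key(stack_id: str, tool_name: str) -> str:
    return f"stack:{stack_id}:tool:{tool_name}:full"


class ToolCache:
    """In-memory cache with per-entry expiry and per-stack invalidation."""

    def __init__(self, ttl_seconds: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def invalidate_stack(self, stack_id: str) -> int:
        """Drop all three levels for one stack; returns the number of keys removed."""
        prefix = f"stack:{stack_id}:"  # trailing colon keeps "s1" from matching "s12"
        stale = [key for key in self._entries if key.startswith(prefix)]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Because every level shares the `stack:{id}:` prefix, a configuration-change event only needs to call `invalidate_stack` once, and the TTL acts as a backstop for events that are missed.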
Model behavior with sparse information. When a model receives only tool names at Level 1, can it reliably identify which tools to explore further? Well-named tools (github_create_issue) are self-explanatory, but accuracy drops for ambiguously named ones. One mitigation: audit-derived usage examples — generated from recent successful invocations, PII-scrubbed, and curated before publication — give the model concrete input/output patterns alongside the schema, improving selection accuracy.
The intermediate-result gap. Progressive disclosure solves the discovery token problem — finding the right tool to call. It does not solve the execution token problem — large tool outputs (50,000-token transcripts, database exports) flowing through the context window. The code execution approach proposed in the Anthropic article handles this by keeping intermediate results in a sandbox. These are complementary techniques: hierarchical discovery reduces the cost of finding tools, while sandboxed execution reduces the cost of processing large results. An agent framework could use both.
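To make the execution-side idea concrete, here is a simplified stand-in for the sandbox approach: a large tool output is held outside the model's context, and only a handle plus a short preview is returned. This is not the mechanism from the Anthropic article, just a sketch of the same principle; the `ResultStore` name, its fields, and the preview length are assumptions.

```python
import uuid


class ResultStore:
    """Holds large tool outputs out of band and hands back a small reference."""

    def __init__(self, preview_chars: int = 200) -> None:
        self._results: dict[str, str] = {}
        self._preview_chars = preview_chars

    def put(self, payload: str) -> dict[str, object]:
        """Store the full payload; return only what should enter the context."""
        handle = uuid.uuid4().hex
        self._results[handle] = payload
        return {
            "handle": handle,
            "size_chars": len(payload),
            "preview": payload[: self._preview_chars],
        }

    def get(self, handle: str) -> str:
        """Fetch the full payload for downstream processing outside the context."""
        return self._results[handle]


store = ResultStore(preview_chars=10)
transcript = "x" * 50_000            # stand-in for a very large tool output
reference = store.put(transcript)    # this small dict is all the model would see
```

Hierarchical discovery trims what the model reads before a call, while a store like this trims what it reads after one, which is why the two techniques compose.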
Where This Fits in the Landscape
The token problem has multiple solutions, and they are not mutually exclusive:
| Approach | Best at | Limitation |
|---|---|---|
| Progressive disclosure | Discovery overhead (tool definitions) | Does not reduce large intermediate results |
| Code execution / sandbox | Intermediate results, data filtering | Requires runtime infrastructure |
| Tool search / retrieval | Relevance ranking | Requires embedding infrastructure |
| Static pruning | Simple stacks with few tools | Cannot handle dynamic tool availability |
Progressive disclosure is the lightest-weight option — it requires only a caching layer and a tiered API, no execution environment or vector database. For stacks where the primary bottleneck is tool definition overhead rather than large data payloads, it captures most of the value. For data-heavy pipelines, combining it with a sandboxed runtime covers both dimensions.
For most teams connecting AI agents to MCP servers today, progressive disclosure is the practical starting point. Add the heavier approaches when the workload demands them.