The MCP Server That Worked Fine Until You Added a Second Pod

You scale your MCP server from one pod to two. A load balancer sits in front. You test it, and everything looks fine. Then, under real traffic, a client sends initialize to pod A, gets back an Mcp-Session-Id, and fires the next request - which hashes to pod B. Pod B has never seen that session. It returns a 404. The SSE channel breaks. Thirty seconds later, the client throws a timeout.
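The failure is easy to reproduce in miniature. The sketch below is a toy model, not real MCP server code: the Pod class, its methods, and the round-robin balancer are all hypothetical stand-ins, and it assumes each pod keeps issued session IDs only in its own memory.

```python
import itertools
import uuid


class Pod:
    """One server instance that keeps sessions in its own memory (the old model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: set[str] = set()

    def initialize(self) -> str:
        # Mint a session ID and remember it locally only.
        session_id = uuid.uuid4().hex
        self.sessions.add(session_id)
        return session_id

    def handle(self, session_id: str) -> int:
        # A pod that never issued this session ID cannot honor it.
        return 200 if session_id in self.sessions else 404


pods = [Pod("A"), Pod("B")]
round_robin = itertools.cycle(pods)  # stand-in for a non-sticky load balancer

session_id = next(round_robin).initialize()    # lands on pod A
status = next(round_robin).handle(session_id)  # lands on pod B
print(status)  # 404
```

Sending the follow-up request back to pod A would return 200, which is exactly what sticky sessions arrange.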

The fix most teams reach for is sticky sessions on the load balancer. It works. But it is an infrastructure workaround for a design decision baked into the protocol itself: MCP, until now, assumed that every request in a conversation would land on the same server instance.

The 2026-07-28 release candidate - locked May 21, final spec publishing July 28 - removes that assumption entirely.

What actually changed

The release candidate removes the initialize handshake, kills the Mcp-Session-Id header, and makes every MCP request self-contained.

With the handshake and the session header gone, any MCP request can land on any server instance, and the sticky routing and shared session stores that horizontal deployments needed before are no longer required at the protocol layer.

The operational consequence is blunt, and it comes down to three things for your infrastructure: you can drop sticky session config from your load balancer, you can remove any Redis-backed session store that existed only to share protocol sessions, and you can strip Mcp-Session-Id matching out of your gateway routing rules.

Every Streamable HTTP request must now include Mcp-Method (e.g., tools/call) and Mcp-Name (the name of the tool or resource). This lets load balancers, API gateways, and rate limiters route on the operation without buffering and parsing the JSON-RPC body.
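As a rough illustration of header-based routing, the sketch below derives a rate-limit bucket from the two headers alone and never reads the request body. The function names, the limits table, and the search_orders tool name are hypothetical; tolerating a missing Mcp-Name is also an assumption of this sketch, not something taken from the spec.

```python
def route_key(headers: dict[str, str]) -> tuple[str, str]:
    """Build a routing / rate-limit key from headers only, without the body."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    method = normalized.get("mcp-method")
    if method is None:
        raise ValueError("missing Mcp-Method header")
    # Assumption of this sketch: fall back to an empty name if Mcp-Name is absent.
    name = normalized.get("mcp-name", "")
    return method, name


# Hypothetical per-operation limits in requests per minute (illustrative values).
LIMITS = {("tools/call", "search_orders"): 60}
DEFAULT_LIMIT = 600


def limit_for(headers: dict[str, str]) -> int:
    """Pick the rate limit for a request based on its operation headers."""
    return LIMITS.get(route_key(headers), DEFAULT_LIMIT)


print(limit_for({"Mcp-Method": "tools/call", "Mcp-Name": "search_orders"}))  # 60
```

The same key could just as well select an upstream pool or an audit-log category; the point is that the decision needs nothing beyond the headers.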

What stateless protocol doesn't mean

There is a misread going around that "stateless protocol" means you can't hold state in your server. You can. Removing the protocol-level session does not mean your application has to be stateless. Servers that need to carry state across calls can do what HTTP APIs have always done: mint an explicit handle from a tool and have the model pass it back as an ordinary argument on later calls.

The spec team's read is that this pattern is actually an improvement, not a concession. A model threading an identifier from one tool call to the next is more than just a workable substitute for session state - it's often a more powerful one. The model can compose handles across tools, reason about them, and hand them off between steps in ways that externally managed session state, hidden in transport metadata, never really could.

That is worth sitting with. Hidden session state is convenient for server authors but opaque to the model. An explicit basket_id or browser_session_id argument is something the model can inspect, chain between tools, and include in its reasoning. The old design optimized for the wrong layer.
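A minimal sketch of the explicit-handle pattern follows, using the basket_id example from above. The tool functions create_basket and add_item, the in-memory BASKETS dict, and the SKU values are all hypothetical; in a real deployment the dict would be whatever shared store the application already uses.

```python
import uuid

# Stand-in for any shared application store (a database, a cache). It is a
# plain dict here so the sketch runs on its own. This is application state,
# not protocol state.
BASKETS: dict[str, list[str]] = {}


def create_basket() -> dict:
    """Tool: mint an explicit handle and return it in the tool result."""
    basket_id = f"basket_{uuid.uuid4().hex[:8]}"
    BASKETS[basket_id] = []
    return {"basket_id": basket_id}


def add_item(basket_id: str, sku: str) -> dict:
    """Tool: the handle arrives as an ordinary argument on a later call."""
    if basket_id not in BASKETS:
        return {"error": f"unknown basket_id: {basket_id}"}
    BASKETS[basket_id].append(sku)
    return {"basket_id": basket_id, "items": list(BASKETS[basket_id])}


# The model threads the handle from one call to the next.
handle = create_basket()["basket_id"]
add_item(handle, "sku-123")
result = add_item(handle, "sku-456")
print(result["items"])  # ['sku-123', 'sku-456']
```

Because the handle is visible in the tool result, nothing about the transport needs to remember which basket belongs to which conversation.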

The Tasks extension, reshaped

Tasks existed as an experimental core feature in the 2025-11-25 spec. Production use surfaced enough design concerns that its right home turned out to be an extension rather than the core specification. The Tasks extension reshapes the lifecycle around the stateless model: a server can answer tools/call with a task handle, and the client drives it with tasks/get, tasks/update, and tasks/cancel.

If you shipped anything against the 2025-11-25 experimental Tasks API, you have migration work ahead. The payoff is that the client polls and steers the task itself, so no persistent connection is required. This is how you build agents that kick off multi-minute operations without holding a socket open.

A teammate like Beagle - which runs inside Slack and Teams, handling long-running background lookups across workspaces - is exactly the kind of client this redesign benefits. A task can be kicked off in one request and polled from a completely different server instance.
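The polling flow can be sketched as a toy simulation. Everything here is assumed for illustration: the Instance class, the task_id, status, and served_by fields, the workspace_lookup tool name, and the shared TASKS dict are hypothetical and do not reflect the extension's actual message shapes. Only the tools/call and tasks/get steps are modeled; tasks/update and tasks/cancel are left out.

```python
import uuid

# Shared task store standing in for a database every instance can reach.
TASKS: dict[str, dict] = {}


class Instance:
    """One stateless server instance; all task state lives in TASKS."""

    def __init__(self, name: str) -> None:
        self.name = name

    def tools_call(self, tool: str) -> dict:
        # Answer the call with a task handle instead of blocking on the work.
        task_id = uuid.uuid4().hex
        TASKS[task_id] = {"tool": tool, "status": "working", "polls_left": 3, "result": None}
        return {"task_id": task_id, "status": "working"}

    def tasks_get(self, task_id: str) -> dict:
        task = TASKS[task_id]
        # Simulate background progress: the work finishes on the third poll.
        if task["status"] == "working":
            task["polls_left"] -= 1
            if task["polls_left"] == 0:
                task["status"] = "completed"
                task["result"] = "lookup finished"
        return {"task_id": task_id, "status": task["status"],
                "result": task["result"], "served_by": self.name}


instances = [Instance("A"), Instance("B")]
started = instances[0].tools_call("workspace_lookup")  # kicked off on A

final = None
for attempt in range(5):
    # Alternate instances (B, A, B, ...) to mimic a non-sticky load balancer.
    # A real client would wait between polls.
    pod = instances[(attempt + 1) % 2]
    final = pod.tasks_get(started["task_id"])
    if final["status"] == "completed":
        break

print(final["status"], final["served_by"])  # completed B
```

The task starts on instance A and completes while being polled on instance B, which only works because the task record lives in a store both can reach.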

The extensions framework

Beyond statelessness, the RC formalizes how the protocol grows. Extensions are identified by reverse-DNS IDs, negotiated through an extensions map on client and server capabilities, live in their own ext-* repositories with delegated maintainers, and version independently of the specification.
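One plausible reading of negotiating through an extensions map is a simple intersection of what each side advertises. The sketch below assumes that reading; the negotiate function, the shape of the capability dicts, the com.example.* identifiers, and the version strings are all hypothetical.

```python
def negotiate(client_caps: dict, server_caps: dict) -> dict:
    """Return the extensions both sides advertise, keyed by reverse-DNS ID."""
    client_ext = client_caps.get("extensions", {})
    server_ext = server_caps.get("extensions", {})
    # Assumption of this sketch: an extension is active only if both
    # the client and the server list it.
    shared_ids = sorted(client_ext.keys() & server_ext.keys())
    return {
        ext_id: {"client": client_ext[ext_id], "server": server_ext[ext_id]}
        for ext_id in shared_ids
    }


# Hypothetical capability maps; IDs and settings are illustrative only.
client_caps = {"extensions": {"com.example.tasks": {"version": "1.0"},
                              "com.example.apps": {"version": "0.3"}}}
server_caps = {"extensions": {"com.example.tasks": {"version": "1.1"},
                              "com.example.audit": {"version": "2.0"}}}

active = negotiate(client_caps, server_caps)
print(list(active))  # ['com.example.tasks']
```

Because each extension carries its own settings in the map, it can version independently of the core specification, as described above.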

Two extensions ship with the spec itself: the Tasks extension and MCP Apps. MCP Apps lets servers ship sandboxed HTML interfaces that hosts render in an iframe. Tools declare their UI templates ahead of time so hosts can prefetch and security-review them before anything runs. Every UI-initiated action goes through the same JSON-RPC audit path as a direct tool call.

The governance story matters as much as the technical story. Previously, extensions had no formal process behind them; now they are a formal part of MCP governance. Ecosystems become brittle when every new capability has to land in the core: either the core grows too fast and becomes hard to stabilize, or innovation slows down. MCP's answer is to make extensions official, versioned, and independently governed.

What's deprecated (and when it actually goes away)

Three features enter deprecation: Roots, Sampling, and Logging. The important word is deprecated, not removed. All three continue working in every spec version published within 12 months of July 28, 2026 - that runway takes you to mid-2027 at minimum.

Sampling is the one that will generate the most friction. Sampling lets MCP server tools piggyback on the client's LLM for completions - a lightweight way to add inference without calling a separate API. The spec maintainers want servers to make direct LLM provider API calls instead. The argument is cleaner separation of concerns. Developers who built lightweight inference flows using Sampling are now looking at refactoring work.

The honest read on what's new versus incremental

The stateless core is genuinely new and solves a real production problem that most teams have either hit or worked around with sticky sessions. That is not a patch; it changes the deployment model.

The extensions framework and deprecation policy are governance maturation: necessary, long overdue, not exciting. The stateless core is where the substance is. Most MCP server failures in production stem from assumptions about session state that the protocol never guaranteed, and the 2026-07-28 release candidate eliminates this entire class of problems by making the protocol stateless at the core.

MCP's TypeScript and Python SDKs hit 97 million monthly downloads in March 2026 - up from 2 million at launch. There are over 9,400 public servers in production today. More than 80% of Fortune 500 companies running AI agents in production connect those agents to tools via MCP. A protocol change at that scale is not academic.

If you run a remote MCP server, this is the spec you will be building on for a long time. It gives MCP the foundation the maintainers expect it to grow on: a protocol that runs statelessly on commodity HTTP infrastructure, an extensions framework where capabilities like Tasks and MCP Apps can ship on their own timeline, and a lifecycle policy that lets implementers build on 2026-07-28 knowing what they ship will keep working.

The ten-week validation window closes on July 28. The release candidate is at modelcontextprotocol.io.