Does Your MCP Server Still Need Sticky Sessions?

The MCP 2026-07-28 release candidate, locked May 21, removes session IDs and the initialize handshake entirely. If you are running MCP servers behind a load balancer, here is what that means before July 28.

Two pods. Round-robin load balancer. Pod A handles initialize and mints an Mcp-Session-Id. The SDK's long-lived GET for the SSE stream lands on pod B. Pod B has never seen that session and returns 404. Thirty seconds later the client throws a timeout.

If that failure mode looks familiar, it is because the current MCP specification - 2025-11-25 - was never designed with horizontal scaling as a first-class concern. Session state lived at the protocol layer, which meant your infrastructure had to compensate. The standard workaround was sticky sessions pinned to Mcp-Session-Id, which works but is an infrastructure patch for a protocol-level problem.
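
The failure is easy to reproduce in miniature. The following is a toy simulation, not SDK code: the Pod class, its handle method, and the in-memory session set are hypothetical stand-ins for a server that keeps sessions in process memory.

```python
import itertools
import uuid


class Pod:
    """Hypothetical server instance that keeps sessions in local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # visible only to this pod

    def handle(self, method, session_id=None):
        if method == "initialize":
            session_id = uuid.uuid4().hex
            self.sessions.add(session_id)
            return 200, session_id
        # Any later request must carry a session this pod minted itself.
        if session_id not in self.sessions:
            return 404, None
        return 200, session_id


pods = [Pod("A"), Pod("B")]
round_robin = itertools.cycle(pods)  # plain round-robin, no stickiness

status, session_id = next(round_robin).handle("initialize")  # reaches pod A
follow_up_status, _ = next(round_robin).handle("GET /sse", session_id)  # pod B
print(status, follow_up_status)  # 200 404
```

Pinning every request for a session to pod A makes the 404 disappear, which is exactly what sticky sessions do.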

The MCP maintainers just fixed it at the source.

What changed

The release candidate for MCP 2026-07-28 is now available. It is the largest revision of the protocol since launch and delivers a stateless core that scales on ordinary HTTP infrastructure, an extensions framework, authorization that aligns with OAuth and OpenID Connect deployments, and a formal deprecation policy.

The headline change is that MCP is now stateless at the protocol layer. Six Specification Enhancement Proposals worked together to get there, completing the plan the maintainers laid out in December 2025.

Concretely: Mcp-Session-Id and the protocol-level session that came with it are removed. With both gone, any MCP request can land on any server instance, and the sticky routing and shared session stores that horizontal deployments needed before are no longer required at the protocol layer.

The request flow becomes straightforward. A stateless MCP server can sit behind a plain round-robin load balancer with no sticky sessions and no shared session store. Gateways can route traffic on the new Mcp-Method header instead of inspecting payloads, and clients can cache tools/list responses for as long as the server's ttlMs field permits. That last part has real latency implications: in production, a tools/list call on a busy server can dominate the latency budget.
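
A client-side cache that honors ttlMs might look like the sketch below. The ToolListCache class, the fetch callable, and the injectable clock are all hypothetical, and the sketch assumes ttlMs sits at the top level of the tools/list result; check the release candidate for the actual placement.

```python
import time


class ToolListCache:
    """Caches one tools/list result until its ttlMs expires."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable returning the tools/list result dict
        self._clock = clock    # injectable so expiry can be tested
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            # Assumption: ttlMs is a top-level field of the result.
            # A missing field is treated as "do not cache".
            ttl_ms = self._value.get("ttlMs", 0)
            self._expires_at = now + ttl_ms / 1000.0
        return self._value


# Usage with a fake server and a fake clock.
calls = []


def fake_fetch():
    calls.append(1)
    return {"tools": [{"name": "fetch_data"}], "ttlMs": 5000}


now = [0.0]
cache = ToolListCache(fake_fetch, clock=lambda: now[0])
cache.get()
cache.get()        # served from the cache, no second fetch
now[0] = 6.0       # six seconds later the 5000 ms TTL has expired
cache.get()        # refetches
print(len(calls))  # 2
```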

Stateless protocol, not stateless application

The word "stateless" tends to cause alarm in teams whose tools carry meaningful context - an open file handle, a browser session, a deployment environment. The spec is careful here.

Removing the protocol-level session does not mean your application has to be stateless. Servers that need to carry state across calls can do what HTTP APIs have always done: mint an explicit handle (a basket_id, a browser_id) from a tool and have the model pass it back as an ordinary argument on later calls. In practice, this pattern - the model threading an identifier from one tool call to the next - turns out to be more than a workable substitute for session state. It is often a more powerful one.

The model can reason about those handles. It can compose them, hand them between steps, and surface them in its trace. Hidden session state in transport metadata could never do that. A teammate like Beagle, which threads context across tool calls in Slack and Teams, already depends on the model carrying explicit identifiers rather than relying on transport magic.

The better architectural principle: if your server remembers something, give it a name the model can see.
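
As a minimal sketch of that principle, here are two hypothetical tools, create_basket and add_item, written as plain Python functions rather than against any particular SDK. The BASKETS dict stands in for application storage and is an assumption of this example.

```python
import uuid

# Hypothetical application store. In a multi-instance deployment this would
# be a database or cache shared by every instance, not a local dict.
BASKETS = {}


def create_basket():
    """Tool: mint a handle and return it in the result the model sees."""
    basket_id = f"basket_{uuid.uuid4().hex[:8]}"
    BASKETS[basket_id] = []
    return {"basket_id": basket_id}


def add_item(basket_id, sku):
    """Tool: the handle arrives as an ordinary argument."""
    if basket_id not in BASKETS:
        return {"error": f"unknown basket_id: {basket_id}"}
    BASKETS[basket_id].append(sku)
    return {"basket_id": basket_id, "items": list(BASKETS[basket_id])}


handle = create_basket()["basket_id"]
result = add_item(handle, "sku-123")
print(result["items"])  # ['sku-123']
```

Nothing in either call depends on which connection or which instance served the previous one. The only link between them is the basket_id the model carries forward.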

The other pieces in this release

The stateless core is the structural change, but the release candidate ships more alongside it.

Tasks extension. A tools/call can now return a task handle instead of a result. The client drives the task lifecycle with tasks/get, tasks/update, and tasks/cancel. No persistent connection required. This is how you build agents that kick off multi-minute operations without holding a socket open.
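
From the client side, driving a task could be sketched as a polling loop. The method names tools/call, tasks/get, and tasks/cancel come from the description above; everything else here is an assumption for illustration, including the send callable, the taskId field, and the status values.

```python
def run_to_completion(send, tool_name, arguments, max_polls=10):
    """Start a tool call and poll its task handle until it finishes.

    `send(method, params)` is a hypothetical transport callable. Each call
    is an independent request that may reach any server instance.
    """
    response = send("tools/call", {"name": tool_name, "arguments": arguments})
    if "taskId" not in response:  # assumption: field name is illustrative
        return response           # plain result, no task involved
    task_id = response["taskId"]
    for _ in range(max_polls):
        task = send("tasks/get", {"taskId": task_id})
        if task["status"] == "completed":  # assumption: illustrative status
            return task["result"]
        # A real client would wait between polls.
    send("tasks/cancel", {"taskId": task_id})
    raise TimeoutError(f"task {task_id} did not complete in {max_polls} polls")


def make_fake_server(polls_until_done):
    """Fake server for the demo: completes after a fixed number of polls."""
    state = {"remaining": polls_until_done}

    def send(method, params):
        if method == "tools/call":
            return {"taskId": "task-1"}
        if method == "tasks/get":
            state["remaining"] -= 1
            if state["remaining"] <= 0:
                return {"status": "completed", "result": {"ok": True}}
            return {"status": "working"}
        return {}

    return send


print(run_to_completion(make_fake_server(3), "deploy", {}))  # {'ok': True}
```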

MCP Apps. Servers can ship sandboxed HTML interfaces that hosts render in an iframe. Tools declare their UI templates ahead of time so hosts can security-review them before anything runs. Every UI-initiated action goes through the same JSON-RPC audit path as a direct tool call. The result is server-authored UIs embedded in AI clients, with no separate frontend deployment required.

Authorization hardening. Token scopes allow fine-grained permission control at the tool and resource level. A scope like tool:fetch_data grants permission to call the fetch_data tool. A scope like resource:db://users/*:read grants read access to user resources. Servers validate these scopes on every request and reject operations that exceed granted permissions.
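
A server-side check for the two scope shapes above could be sketched as follows. The function names are hypothetical, and treating the * in a resource scope as a shell-style glob is an assumption of this sketch, not something the scope examples spell out.

```python
from fnmatch import fnmatchcase


def allows_tool(granted_scopes, tool_name):
    """True if the token carries the exact scope for this tool."""
    return f"tool:{tool_name}" in granted_scopes


def allows_resource(granted_scopes, uri, action):
    """True if any resource scope matches both the URI and the action."""
    for scope in granted_scopes:
        if not scope.startswith("resource:"):
            continue
        # Split "resource:<pattern>:<action>" on the last colon, because
        # the URI pattern itself contains colons.
        pattern, _, scope_action = scope[len("resource:"):].rpartition(":")
        # Assumption: "*" is matched as a shell-style glob.
        if scope_action == action and fnmatchcase(uri, pattern):
            return True
    return False


granted = {"tool:fetch_data", "resource:db://users/*:read"}
print(allows_tool(granted, "fetch_data"))                  # True
print(allows_tool(granted, "delete_data"))                 # False
print(allows_resource(granted, "db://users/42", "read"))   # True
print(allows_resource(granted, "db://users/42", "write"))  # False
print(allows_resource(granted, "db://orders/7", "read"))   # False
```

Because the check needs only the request and the token, it can run on any instance on every request, which fits the stateless core.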

Deprecation policy. Three features - Roots, Sampling, and Logging - enter deprecation but are not removed. Per the new lifecycle policy, Active → Deprecated → Removed transitions take a minimum of 12 months. Methods, types, and capability flags keep functioning in this release and in every spec version published inside the 12-month window.

What the timeline actually means for your team

The release candidate was locked on May 21, 2026, opening a 10-week window for SDK validation, conformance tests, and feedback before the spec ratifies. Spec text can still change during this window if blocking issues surface.

The Python SDK timeline is explicit: v2.0.0a1 was published on June 11, 2026, beta is targeted for June 30, 2026, and stable v2 for July 27, 2026.

The PyPI release metadata says pre-releases are published as 2.0.0aN, each pre-release may contain breaking changes, and v1.x remains the stable line recommended for production. That matters for dependency policy: a team that leaves mcp unpinned may not break immediately, because installers skip pre-releases by default, but they are running without an upper bound and will pick up v2 once it is published as a stable release.

The practical action today is narrow: if your agentic system depends on MCP servers, treat this as more than a spec-reading exercise. Test the parts that may fail first. Can your requests move across server instances without losing important context? If they cannot, find the hidden session dependency now.
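
One way to run that check is to replay the same calls twice, once pinned to a single instance and once rotated across instances, and compare the results. The harness below is a sketch under stated assumptions: survives_round_robin and the factory functions are hypothetical, and each handler callable stands in for a real server instance.

```python
import itertools


def survives_round_robin(make_instances, calls):
    """Return True if `calls` give the same results pinned and round-robin.

    `make_instances()` is a hypothetical factory returning fresh handler
    callables, each standing in for one server instance.
    """
    pinned = make_instances()[0]
    expected = [pinned(method, params) for method, params in calls]

    rotation = itertools.cycle(make_instances())
    actual = [next(rotation)(method, params) for method, params in calls]
    return actual == expected


def hidden_state_instances():
    """Two instances that each remember items in their own memory."""
    def make():
        seen = []  # per-instance memory: the hidden dependency

        def handle(method, params):
            seen.append(params["item"])
            return list(seen)

        return handle

    return [make(), make()]


def shared_state_instances():
    """Two instances backed by one store, standing in for a shared database."""
    store = []

    def handle(method, params):
        store.append(params["item"])
        return list(store)

    return [handle, handle]


calls = [("tools/call", {"item": "a"}), ("tools/call", {"item": "b"})]
print(survives_round_robin(hidden_state_instances, calls))  # False
print(survives_round_robin(shared_state_instances, calls))  # True
```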

Sticky sessions were always an infrastructure workaround for something the protocol should have handled. The stateless change is good engineering: the load balancer pain was real, and fixing it at the protocol layer is the right call. The 10-week window exists so teams can migrate deliberately rather than discover the breakage in production.

Read the release candidate at blog.modelcontextprotocol.io.