Does Your MCP Server Still Need Sticky Sessions?

There is a class of infrastructure workaround that accrues quietly. Nobody schedules it; it gets added during an incident and stays forever. Sticky sessions on MCP load balancers are that workaround for a lot of teams running remote MCP servers. The new spec removes the reason they exist.

The MCP team locked the 2026-07-28 release candidate on May 21. It is the largest revision since launch, and it breaks things. The final spec publishes on July 28, giving SDK maintainers and server authors a ten-week window to validate changes against real workloads. Tier 1 SDKs - the official Anthropic-maintained Python and TypeScript SDKs - are expected to ship support within that window.

What the old design actually forced on you

The big shift is that MCP is going stateless. The current 2025-11-25 spec treats every client-server connection as a session with a handshake, an identifier, and a lifecycle. The new spec rips most of that out at the protocol layer.

The practical pain this caused was straightforward. Picture two MCP pods behind a round-robin load balancer: pod A handled the initialize and minted an Mcp-Session-Id; the SDK's long-lived GET /mcp for SSE was then routed to pod B, which had never seen that session and returned 404. Thirty seconds later the client threw a timeout. The usual fix was LB stickiness on Mcp-Session-Id. It works, but it is an infrastructure workaround for protocol-level state.
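The failure is easy to reproduce in miniature. The sketch below is a toy model, not real MCP server code: the Pod class, its methods, and the status codes it returns are all hypothetical stand-ins for a server instance that keeps sessions in local memory only.

```python
from itertools import cycle
import uuid


class Pod:
    """Hypothetical server instance that stores sessions in local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def initialize(self):
        # Mint a session ID that only this pod knows about.
        session_id = uuid.uuid4().hex
        self.sessions.add(session_id)
        return session_id

    def handle(self, session_id):
        # A pod that never saw the initialize has no record of the session.
        return 200 if session_id in self.sessions else 404


pods = [Pod("A"), Pod("B")]
round_robin = cycle(pods)

first = next(round_robin)    # pod A takes the initialize
session_id = first.initialize()

second = next(round_robin)   # pod B takes the follow-up request
status = second.handle(session_id)

# With stickiness, the follow-up would have gone back to pod A instead.
sticky_status = first.handle(session_id)

print(first.name, second.name, status, sticky_status)
```

Stickiness makes the second request succeed only by forcing it back to the pod that holds the state, which is exactly the coupling the new spec sets out to remove.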

Some teams added a Redis-backed session store. Others built a gateway that inspected the request body to extract the session ID before routing. All of these solutions exist because the protocol pushed session affinity onto infrastructure.

What the RC actually changes

The 2026-07-28 release candidate removes the initialize/initialized handshake and the Mcp-Session-Id header that pinned a client to one server instance. Protocol version, client info, and capabilities now travel inline in a _meta field on each request. Any server instance can serve any request, so a remote MCP server can run behind a plain round-robin load balancer without sticky sessions or a shared session store.
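As a rough illustration of what "inline on each request" could look like, here is a minimal sketch. The key names inside _meta (protocolVersion, clientInfo, capabilities), the placement of _meta under params, and the servedBy and negotiatedVersion result fields are assumptions made for this example, not names taken from the spec; check the published RC for the real ones.

```python
import json


def build_request(request_id, method, params):
    """Build a JSON-RPC request that carries its own negotiation data."""
    meta = {
        # Hypothetical key names, chosen for illustration only.
        "protocolVersion": "2026-07-28",
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
        "capabilities": {},
    }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": {**params, "_meta": meta},
    }


def handle(request, instance_name):
    """Serve a request using only what the request itself carries."""
    meta = request["params"]["_meta"]
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "servedBy": instance_name,
            "negotiatedVersion": meta["protocolVersion"],
        },
    }


request = build_request(1, "tools/call", {"name": "search", "arguments": {"q": "mcp"}})
reply_a = handle(request, "pod-a")
reply_b = handle(request, "pod-b")
print(json.dumps(reply_a["result"]), json.dumps(reply_b["result"]))
```

Neither handler call consults anything outside the request, so it does not matter which instance the load balancer picks.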

The main change is not a new tool field. It is the move to stateless HTTP plus extensions.

That summary from the changelog is worth rereading. Stateless at the protocol layer does not mean your application logic loses context. It means the protocol stops hiding state for you.

A stateless protocol does not require stateless applications. Servers that need to track context across calls can still do it - they just do it the way every HTTP API has always done it: mint an explicit handle, return it from a tool call, and let the model pass it back as an argument on subsequent calls. The spec documents this pattern explicitly. Your application logic does not change; your transport code does.
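A minimal sketch of the explicit-handle pattern, assuming a shopping-cart style tool pair. The function names open_cart and add_item, the cart_handle argument, and the STORE dictionary are hypothetical; STORE stands in for whatever external storage all server instances can reach.

```python
import secrets

# Stand-in for storage shared by every server instance (a database, for example).
STORE = {}


def open_cart():
    """Tool call that creates state and returns an explicit handle to it."""
    handle = secrets.token_urlsafe(16)
    STORE[handle] = []
    return {"cart_handle": handle}


def add_item(cart_handle, item):
    """Tool call that receives the handle back as an ordinary argument."""
    if cart_handle not in STORE:
        raise KeyError("unknown cart_handle")
    STORE[cart_handle].append(item)
    return {"cart_handle": cart_handle, "items": list(STORE[cart_handle])}


opened = open_cart()
result = add_item(opened["cart_handle"], "keyboard")
print(result["items"])
```

The handle is visible in the tool result, so the model can carry it into the next call, and any instance with access to the store can resolve it.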

The extensions framework and long-running tasks

Beyond the session removal, the RC delivers on the 2026 roadmap: a stateless core that scales on ordinary HTTP infrastructure, extensions including server-rendered UIs through MCP Apps and long-running work through the Tasks extension, authorization that aligns more closely with OAuth and OpenID Connect deployments, and a formal deprecation policy so the protocol can evolve without breaking what you have built.

The Tasks extension is worth attention specifically. It reshapes the lifecycle around the stateless model: a server can answer a tools/call with a task handle, and the client drives it with tasks/get, tasks/update, and tasks/cancel. Task creation is server-directed: the client advertises the extension and the server decides when a call should run as a task. tasks/list is removed because it cannot be scoped safely without sessions.

That last sentence matters. The old implicit assumption - that the server knows which tasks belong to which client - breaks down in a stateless world. Explicit handles are not just cleaner code; they are now the only model that works.
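The lifecycle can be sketched as a small in-process simulation. Only the method names tools/call, tasks/get, and tasks/cancel come from the description above; the result shapes, the status strings, the build_report tool, and the poll-count trick used to fake progress are assumptions made for this example.

```python
import secrets

# Stand-in for durable task storage reachable from every instance.
TASKS = {}


def tools_call(name, arguments, client_supports_tasks):
    """Server-directed: the server decides whether this call runs as a task."""
    long_running = name == "build_report"  # hypothetical slow tool
    if long_running and client_supports_tasks:
        task_id = secrets.token_hex(8)
        # polls_left fakes background progress for the sake of the example.
        TASKS[task_id] = {"status": "working", "polls_left": 2, "result": None}
        return {"task": {"id": task_id, "status": "working"}}
    return {"content": f"{name} done"}


def tasks_get(task_id):
    """Look the task up by its explicit handle; no session is consulted."""
    task = TASKS[task_id]
    if task["status"] == "working":
        task["polls_left"] -= 1
        if task["polls_left"] <= 0:
            task["status"] = "completed"
            task["result"] = "report ready"
    return {"id": task_id, "status": task["status"], "result": task["result"]}


def tasks_cancel(task_id):
    """Cancel a task that has not finished yet."""
    task = TASKS[task_id]
    if task["status"] == "working":
        task["status"] = "cancelled"
    return {"id": task_id, "status": task["status"]}


reply = tools_call("build_report", {}, client_supports_tasks=True)
task_id = reply["task"]["id"]
print(tasks_get(task_id)["status"])
print(tasks_get(task_id)["status"])
```

Every follow-up call names the task by its handle, which is why the client never needs a tasks/list to find its own work.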

There is also a tracing story. W3C Trace Context propagation in _meta is now documented, locking down the traceparent, tracestate, and baggage key names so distributed traces correlate across SDKs and gateways. Several SDKs and tools were already doing this; with the key names fixed in the spec, a trace that starts in a host application can follow a tool call through the client SDK, the MCP server, and whatever the server calls downstream, and show up as a single span tree in an OpenTelemetry-compatible backend.

A teammate like Beagle that lives inside Slack and calls downstream MCP tools benefits from exactly this kind of traceable chain - when a tool call fails in a multi-hop agent workflow, a correlation ID that survives every hop is the difference between a five-minute debug and a two-hour one.
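To make the propagation concrete, here is a minimal sketch of carrying a W3C traceparent value in _meta. The traceparent layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags) follows the W3C Trace Context format; the helper names and the placement of _meta under the request params are assumptions for this example. In practice an OpenTelemetry SDK would normally generate and propagate these values for you.

```python
import re
import secrets

# Version 00 traceparent: 00-<trace-id>-<parent-id>-<flags>, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace in the host application."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent):
    """Keep the trace ID and flags, mint a new span ID for the next hop."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        raise ValueError("malformed traceparent")
    trace_id, _parent_span, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def with_trace(params, traceparent):
    """Return a copy of the request params with traceparent set in _meta."""
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = traceparent
    return {**params, "_meta": meta}


root = new_traceparent()
hop = child_traceparent(root)
params = with_trace({"name": "search", "arguments": {}}, hop)
print(params["_meta"]["traceparent"])
```

Because every hop reuses the same trace ID, a backend can stitch the spans from the host, the client SDK, and the server into one tree.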

What is deprecated, and what that actually means

Three features enter deprecation: Roots (replaced by tool parameters or config), Sampling (call the LLM provider API directly), and Logging (use stderr or OpenTelemetry). A formal deprecation policy guarantees a minimum twelve-month window between deprecation and removal, so none of these break on July 28.

What to actually do between now and July 28

The migration is not a rewrite. It is mostly a matter of moving where state lives.

Identify which of your servers hold session state and where you would externalize it. Audit your auth code against the six auth SEPs. Then, once RC SDKs land, rebuild against them and verify that routing headers are sent correctly by whatever clients you target. In July, deploy a stateless variant of any remote MCP server you operate behind a plain HTTP load balancer and measure: does it autoscale? Does a rolling deploy break in-flight tool calls?

The headline change is statelessness, but the more consequential shift is that MCP is making agent workflows more explicit at every layer: handles instead of hidden session state, negotiated extensions instead of guessing capability support, tight authorization semantics instead of OAuth assumptions that break in multi-server topologies. That is a pattern that pays forward. Every place you previously relied on the protocol to manage context for you is now a place where the model can actually see, reason about, and chain that context between steps.

That is a narrower, more honest protocol, and it is what production agent infrastructure actually needs.