Treat Every Line Your Agent Writes as Untrusted Code

Picture a coding agent you've asked to analyze a CSV and write a summary. It generates a Python script, runs it, and hands you the result. That feels routine. But that script was not written by an engineer, it was not reviewed, and it executed on real infrastructure. If the model was fed a malicious document, or simply hallucinated a destructive command, your host system had no say in the matter.

That gap - between "the model produced code" and "the code ran safely" - is exactly what agent sandboxing exists to close.

Why the old isolation model breaks

Security for language models was straightforward when the models were entirely passive. A user sent a prompt, the model predicted tokens, and it responded with text. There were no actions, and the worst outcome was bad advice. Now models have tools, code interpreters, and real agency.

That shift changes the threat model entirely. A coding agent sandbox is designed around the new one: the code being executed was not written by a human, cannot be fully reviewed before it runs, and may attempt actions that are destructive, resource-intensive, or insecure.

The instinct most teams reach for first is a Docker container. It is fast, familiar, and feels isolated. The problem is structural. Docker containers share the host OS kernel. A container escape - formally classified as MITRE ATT&CK technique T1611 - does not give an attacker access to a container. It gives them access to your host.

Containers separate processes using Linux namespaces and cgroups. Fast and lightweight, but because every container shares the host kernel, a kernel vulnerability exploited from inside one container can compromise the host and every other container on it. Sufficient for trusted, internally written code. Insufficient for anything an LLM generates.

This is not theoretical. Langflow CVE-2025-3248 (CVSS 9.8, added to the CISA Known Exploited Vulnerabilities catalogue) and the Cursor MCP RCE happened to real teams running the same tools many engineers are probably using right now.

The core distinction: containers treat the isolation boundary as a resource partition. Sandboxing for agent workloads must treat it as adversarial.

What real isolation actually looks like

Production-safe agent execution requires an isolation boundary stronger than a shared kernel - hardware-virtualized microVMs or user-space kernels - plus default-deny filesystem and network policies and layered escape prevention.
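To make "default-deny" concrete, here is a minimal Python sketch. It is an illustration only, not any particular sandbox's API: the SandboxPolicy class, its field names, and the example paths are all hypothetical. The point it shows is that an empty policy permits nothing, and every capability has to be granted explicitly.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical default-deny policy: empty allowlists permit nothing."""
    readable_paths: frozenset = frozenset()
    writable_paths: frozenset = frozenset()
    allowed_hosts: frozenset = frozenset()

    @staticmethod
    def _under(path: str, roots: frozenset) -> bool:
        # True only if `path` is one of the roots or sits beneath one.
        # Paths containing ".." are rejected outright rather than resolved.
        p = PurePosixPath(path)
        if ".." in p.parts:
            return False
        return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents
                   for r in roots)

    def can_read(self, path: str) -> bool:
        # Writable locations are treated as readable too.
        return self._under(path, self.readable_paths | self.writable_paths)

    def can_write(self, path: str) -> bool:
        return self._under(path, self.writable_paths)

    def can_connect(self, host: str) -> bool:
        return host in self.allowed_hosts


# Example: read-only data directory, one scratch directory, no network.
policy = SandboxPolicy(
    readable_paths=frozenset({"/data"}),
    writable_paths=frozenset({"/tmp/work"}),
)
```

With this policy, can_read("/data/input.csv") succeeds, while can_write("/data/input.csv") and can_connect with any host are refused. A real sandbox enforces the same decisions at the VM or kernel boundary rather than in application code.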

Three technologies dominate the current landscape, each with different trade-offs:

Firecracker microVMs. Firecracker is a virtual machine monitor (VMM) built by AWS to run microVMs for Lambda and Fargate, designed around one core principle: minimal attack surface.

Its stripped-down VMM has only about 50,000 lines of code - roughly 96% fewer than QEMU. The result: it boots microVMs in as little as 125 milliseconds, consumes less than 5 MiB of memory overhead per VM, and supports creating up to 150 microVMs per second on a single host. Each VM gets its own kernel. A compromise of that kernel stays inside the VM unless the attacker can also break out of the VMM.
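Those figures make back-of-the-envelope capacity planning easy. The sketch below only multiplies the numbers quoted above; the function names are made up for illustration, and the results are bounds, not measurements. They cover VMM overhead only, not the memory each guest needs for its own workload.

```python
# Figures quoted above for Firecracker; treat them as best-case numbers.
OVERHEAD_MIB = 5          # upper bound on VMM memory overhead per microVM
CREATE_RATE_PER_S = 150   # quoted maximum microVM creation rate per host


def overhead_mib(sessions: int) -> int:
    """Upper bound on VMM overhead (excluding guest memory) for N microVMs."""
    return sessions * OVERHEAD_MIB


def seconds_to_create(sessions: int) -> float:
    """Lower bound on the time to create N microVMs on one host."""
    return sessions / CREATE_RATE_PER_S
```

For 1,000 concurrent sessions this gives under 5,000 MiB of VMM overhead and at least about 6.7 seconds to create them all on a single host at the quoted rate.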

gVisor. Rather than a full VM, gVisor interposes a user-space kernel between the application and the host. Google built gVisor for Search and Gmail. It intercepts the application's syscalls before they reach the real kernel, so even if the agent-generated code tries something destructive, it is hitting a user-space kernel that itself makes only a small, restricted set of calls to the host.

Kata Containers. A hybrid that wraps OCI containers inside lightweight VMs using hardware virtualization. Northflank processes over 2 million isolated workloads monthly using Kata Containers and gVisor.

AWS built Firecracker for Lambda, Google built gVisor for Search and Gmail, and Azure uses Hyper-V for ephemeral agent sandboxes. Every one of them reached for their strongest isolation primitive and pointed it at AI. None of them reached for containers.

The cold-start problem

Isolation costs latency, and latency matters when an agent is running a quick calculation inside a chat interface. This is a real engineering problem, and it has a real solution: VM snapshots.

Firecracker supports snapshotting a running VM's complete state to disk - full memory contents, CPU register state, and device state. When you restore from a snapshot, Firecracker doesn't boot a kernel, doesn't run init, doesn't start your agent. It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off. The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward.
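As a rough sketch of what that looks like in practice, the Python below builds the three HTTP requests involved, without sending them. Firecracker exposes its API over a Unix socket; the paths and field names here follow Firecracker's snapshot documentation as I understand it, so verify them against the version you run. The helper names and the render function are hypothetical.

```python
import json


def pause_request() -> tuple:
    # The VM must be paused before a snapshot is taken.
    return ("PATCH", "/vm", {"state": "Paused"})


def create_snapshot_request(state_path: str, mem_path: str) -> tuple:
    # Writes VM (CPU and device) state to `state_path`
    # and guest memory to `mem_path`.
    return ("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": state_path,
        "mem_file_path": mem_path,
    })


def load_snapshot_request(state_path: str, mem_path: str) -> tuple:
    # Sent to a fresh Firecracker process. `resume_vm` asks it to resume
    # the guest immediately, with memory backed by the snapshot file.
    return ("PUT", "/snapshot/load", {
        "snapshot_path": state_path,
        "mem_backend": {"backend_path": mem_path, "backend_type": "File"},
        "resume_vm": True,
    })


def render(request: tuple) -> str:
    """Format a request as one readable line, e.g. for a dry run or a log."""
    method, path, body = request
    return f"{method} {path} {json.dumps(body, sort_keys=True)}"
```

A warm pool would run the first two requests once, after the guest has finished booting and loading the agent runtime, and then issue the load request for every new session.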

That technique gets cold starts from roughly a second down to tens of milliseconds - fast enough that users do not notice it in a chat interface.

The hardest problem in sandboxing isn't virtualization; it's data movement. To achieve sub-second start times for AI agents, you can't afford to run docker pull inside every microVM. When hundreds of agent sessions spin up at once, naively pulling container images to each VM will saturate your network. You need a block-level caching strategy instead, so that each host fetches a given block once and shares it across VMs.
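One way to picture block-level caching: split each image into fixed-size blocks, key each block by its content hash, and fetch only the blocks the host does not already hold. The sketch below is a toy in-memory version. BlockCache, the 4 KiB block size, and the fetch callback are assumptions for illustration, not a description of any production system.

```python
import hashlib
from typing import Callable, Dict, List

BLOCK_SIZE = 4096  # illustrative; real systems tune this


def manifest(image: bytes) -> List[str]:
    """Content hashes of each fixed-size block, in order."""
    return [hashlib.sha256(image[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(image), BLOCK_SIZE)]


class BlockCache:
    """Host-local cache shared by every microVM on the host."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                  # pulls one block over the network
        self._blocks: Dict[str, bytes] = {}
        self.network_fetches = 0

    def get(self, digest: str) -> bytes:
        # Only a cache miss touches the network.
        if digest not in self._blocks:
            self._blocks[digest] = self._fetch(digest)
            self.network_fetches += 1
        return self._blocks[digest]

    def assemble(self, digests: List[str]) -> bytes:
        """Rebuild an image from its manifest, fetching only missing blocks."""
        return b"".join(self.get(d) for d in digests)
```

Because blocks are keyed by content, identical blocks within an image or across images are fetched once per host, and the second session that needs the same image triggers no network traffic at all.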

What the sandbox does not protect

A strong execution boundary is necessary but not sufficient. Application-level guardrails, prompt filtering, and semantic monitoring are valuable complements, but they cannot stand in for it either: they operate after the agent has already been handed execution.

There is also the credential problem. If your agent has access to a database password via an environment variable, a compromised sandbox still has that password. A proxy service that handles authentication - verifying operations and applying real credentials on the host side - means that even if the sandbox is compromised, the attacker cannot access the credentials.
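The proxy pattern can be sketched in a few lines of Python. Everything here is hypothetical: the CredentialProxy class, the operation names, and the handler signature are invented for illustration. The idea it shows is that the sandbox sends an operation name and arguments, and the secret is applied only inside the host-side process.

```python
from typing import Callable, Dict


class CredentialProxy:
    """Runs on the host; the sandbox only ever sends operation names and args."""

    def __init__(self, secret: str,
                 operations: Dict[str, Callable[[str, dict], object]]):
        self._secret = secret            # never leaves the host process
        self._operations = operations    # allowlist: name -> handler(secret, args)

    def handle(self, operation: str, args: dict) -> object:
        handler = self._operations.get(operation)
        if handler is None:
            # Default-deny: anything not on the allowlist is refused.
            raise PermissionError(f"operation not allowed: {operation}")
        return handler(self._secret, args)


def count_rows(secret: str, args: dict) -> dict:
    # Stand-in for a real query. A real handler would authenticate with
    # `secret` and return only the query result, never the secret itself.
    return {"table": args["table"], "authenticated": bool(secret)}


proxy = CredentialProxy("s3cret", {"count_rows": count_rows})
result = proxy.handle("count_rows", {"table": "orders"})
```

A compromised sandbox can still call any operation on the allowlist, so the list should be as narrow as the task allows. What it cannot do is read the password or issue arbitrary statements with it.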

The sandbox is not merely a security fence. It is a cognitive boundary that simplifies the agent's operating environment by removing irrelevant state, restricting dangerous actions, and making the workspace inspectable. That last property matters more than it seems. When an agent fails in a fully isolated environment with its own filesystem snapshot, the failure is reproducible and diagnosable. You can replay it without worrying that the replay touches production data.

If you're running code your own engineers wrote that passed CI/CD, standard containers are probably fine. If your agents generate code from user prompts, the risk goes up significantly - you need strong isolation, either gVisor or microVMs, to prevent accidental resource exhaustion or worse.

The concrete picture

Return to the CSV analysis agent. With proper sandboxing, the flow looks like this: the orchestrator receives the user's request, provisions a fresh microVM from a pre-warmed snapshot, mounts a read-only copy of the data file, grants the VM no outbound network access, runs the generated Python inside it, captures the stdout, then destroys the VM. The host never executes the Python. The VM never sees the host filesystem. If the model hallucinated a subprocess.run("rm -rf /", shell=True), that command runs inside a kernel that will be discarded in milliseconds.
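The flow above can be sketched as a small orchestrator in Python. This is a shape, not an implementation: FakeMicroVM stands in for a real microVM handle and only records calls, and every method name on it is hypothetical. The part worth copying is the structure, in which the VM is destroyed in a finally block no matter what the generated script does.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeMicroVM:
    """Stand-in for a real microVM handle; records calls instead of booting."""
    calls: List[str] = field(default_factory=list)
    destroyed: bool = False

    def mount_readonly(self, host_path: str, guest_path: str) -> None:
        self.calls.append(f"mount_ro {host_path} -> {guest_path}")

    def disable_network(self) -> None:
        self.calls.append("network off")

    def run_python(self, source: str) -> str:
        self.calls.append("run")
        return "summary: 3 rows"   # canned stdout for the demo

    def destroy(self) -> None:
        self.destroyed = True


@contextmanager
def fresh_vm(restore: Callable[[], FakeMicroVM]) -> Iterator[FakeMicroVM]:
    vm = restore()          # e.g. restore from a pre-warmed snapshot
    try:
        yield vm
    finally:
        vm.destroy()        # always discard the VM, even on failure


def run_untrusted(restore: Callable[[], FakeMicroVM],
                  script: str, data_path: str) -> str:
    """Run generated code in a throwaway VM and return only its stdout."""
    with fresh_vm(restore) as vm:
        vm.mount_readonly(data_path, "/data/input.csv")
        vm.disable_network()
        return vm.run_python(script)
```

The orchestrator hands the script to the VM as data and gets a string back. Nothing the script creates, deletes, or corrupts outlives the with block.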

An AI teammate like Beagle operates in a similar posture - calling tools on your behalf through narrow, scoped interfaces rather than as a process with ambient authority over your workspace.

The scale of the problem is already large: E2B alone went from 40,000 sandbox sessions per month in March 2024 to roughly 15 million per month by March 2025. Agent-generated code execution is not a niche scenario. It is becoming the default. The isolation layer underneath it deserves the same care as the model on top.