The moment a coding agent writes a Python script and executes it, something quiet happens in the infrastructure underneath. The code doesn't run on your laptop, and it doesn't run in a Docker container sharing a kernel with hundreds of other processes. Instead, a throwaway virtual machine boots, runs the script, and is destroyed.
That boundary is the whole point. Understanding how it works changes how you think about what agents can and can't safely do.
The problem with containers
Most developers are comfortable with Docker. You build an image, run a container, and your application is isolated from the host - sort of.
Standard Docker containers share the host's Linux kernel, so a sophisticated zero-day kernel exploit inside a container can escape to the host. For human-authored code that has been reviewed, that risk is often acceptable. A coding agent sandbox faces a different threat model: the code being executed was not written by a human, cannot be fully reviewed before it runs, and may attempt actions that are destructive, resource-intensive, or insecure.
That's why the serious agent infrastructure providers don't use containers as the primary isolation boundary.
What a microVM actually is
A microVM is a minimalist virtual machine designed to provide the security and strong isolation of VMs, while having an almost negligible startup time and providing the same resource efficiency as containers.
The key property is kernel ownership. Unlike containers, each microVM runs its own kernel and shares no kernel structures with other microVMs or with the host. A kernel-level exploit inside the sandbox therefore compromises only that guest kernel; reaching the host or a neighboring sandbox would additionally require escaping the hypervisor.
The technology that made this practical at scale is Firecracker. Firecracker was developed at Amazon Web Services to improve the customer experience of services like AWS Lambda and AWS Fargate. It is a virtual machine monitor (VMM) that uses the Linux Kernel-based Virtual Machine (KVM) to create and manage microVMs.
What makes Firecracker fast is what it leaves out. It doesn't implement traditional devices such as a BIOS or a PCI bus; instead it communicates with the guest kernel through optimized virtio interfaces. A conventional emulated device imitates real hardware so the guest believes it is running on a physical machine. With virtio, the guest knows it is virtualized and uses a simpler interface designed for exactly that. Stripping the device model also keeps the codebase small: QEMU has more than 1.4 million lines of code, while Firecracker's stripped-down VMM has only about 50,000, which means a much smaller attack surface.
Firecracker boots microVMs in as little as 125 milliseconds, consumes less than 5 MiB of memory overhead per VM, and supports creating up to 150 microVMs per second on a single host.
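Firecracker is configured through a small REST API served on a Unix domain socket. The sketch below shows the usual boot sequence for one microVM: point it at a kernel, attach a root drive, size the machine, then start it. It assumes a `firecracker` process was already launched with `--api-sock` pointing at `socket_path`; the file paths and the helper names `boot_requests` and `send` are illustrative, and field names should be checked against the API specification for your Firecracker version.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over a Unix domain socket, the transport Firecracker's API uses."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")  # host only fills the Host header
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_requests(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128):
    """Return the (method, path, body) calls that configure and start one microVM."""
    return [
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # Root disk attached read-only; writable scratch space would be a separate drive.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": True,
        }),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # The guest does not start running until this action is sent.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def send(socket_path: str, requests) -> None:
    """Issue each call in order, failing fast on any non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            detail = resp.read().decode()
            if not 200 <= resp.status < 300:
                raise RuntimeError(f"{method} {path} -> {resp.status}: {detail}")
        finally:
            conn.close()
```

Notice that the sequence has no firmware or BIOS step: the kernel image is loaded directly, in line with the minimal device model described above.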
The full isolation stack
Firecracker itself is one layer. There's more beneath it.
Firecracker uses a Jailer component which applies seccomp-bpf filters, cgroups, and chroot isolation to the VM process itself to further reduce the potential impact of a compromise. The guest kernel and root filesystem are immutable - the hypervisor makes sure that they are read-only, drastically reducing the risk of runtime tampering.
So the full stack looks roughly like this:
- KVM at the hardware level handles CPU and memory virtualization
- Firecracker VMM creates and manages the microVM, with a minimal device model
- Jailer wraps the VMM process in cgroups, seccomp filters, and a chroot
- Guest kernel runs inside the VM, isolated from every other guest kernel on the host
- Your agent's code runs as a process inside that guest kernel
Each layer is built on the assumption that whatever runs inside it may already be compromised, and adds its own restriction.
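In practice the jailer is a separate binary that prepares the jail and then execs `firecracker` inside it. The sketch below builds such a command line, assuming cgroup v2 and the flag names from Firecracker's jailer documentation (`--id`, `--exec-file`, `--uid`, `--gid`, `--chroot-base-dir`, `--cgroup-version`, `--cgroup`); verify them against your release. The helper names `jailer_argv` and `chroot_root` and all limit values are hypothetical.

```python
from pathlib import PurePosixPath


def jailer_argv(vm_id: str, firecracker_bin: str = "/usr/bin/firecracker",
                uid: int = 10000, gid: int = 10000,
                chroot_base: str = "/srv/jailer",
                mem_limit_bytes: int = 512 * 1024 * 1024,
                cpu_quota: str = "100000 100000") -> list[str]:
    """Build a jailer command line for one microVM (values are illustrative)."""
    return [
        "jailer",
        "--id", vm_id,                   # unique per sandbox; names its chroot and cgroup
        "--exec-file", firecracker_bin,  # binary exec'd once the jail is ready
        "--uid", str(uid),               # drop root: the VMM runs as an unprivileged user
        "--gid", str(gid),
        "--chroot-base-dir", chroot_base,
        "--cgroup-version", "2",
        # Host-side caps on the whole VMM process, guest memory included.
        "--cgroup", f"memory.max={mem_limit_bytes}",
        "--cgroup", f"cpu.max={cpu_quota}",  # "<quota> <period>" in microseconds
    ]


def chroot_root(vm_id: str, firecracker_bin: str = "/usr/bin/firecracker",
                chroot_base: str = "/srv/jailer") -> PurePosixPath:
    """Jail root used by the jailer: <base>/<exec file name>/<id>/root."""
    exec_name = PurePosixPath(firecracker_bin).name
    return PurePosixPath(chroot_base) / exec_name / vm_id / "root"
```

Because of the chroot, the kernel image and root filesystem have to be placed (or hard-linked) under `chroot_root(...)` before launch; once jailed, the VMM cannot see anything else on the host filesystem.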
A concrete example
Say a software agent is given a task: analyze a CSV file and write a test suite for the transformation function. The agent thinks for a moment, writes 40 lines of Python, and calls a tool named run_code.
At that point, a platform like E2B receives the request. Under the hood, E2B uses Firecracker microVMs, and each one serves as a complete, disposable computer for the agent. The microVM boots in roughly 150 milliseconds. Each sandbox gets its own kernel and network namespace, so a guest kernel vulnerability does not, by itself, give access to the host.
The Python runs. The test output comes back. The VM is destroyed.
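Nothing about the agent's code execution touched the host filesystem, the host network stack, or any other tenant's sandbox. Had the agent accidentally written a fork bomb, it would have exhausted only that microVM's vCPUs and memory. Had it tried to exfiltrate credentials via an outbound HTTP call, it could have reached only what existed inside the sandbox, never the host's secrets. Either way, the blast radius is one microVM, for as long as that microVM exists.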
Ephemeral is a security property, not just a cost optimization.
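On the platform side, "ephemeral" mostly comes down to making teardown unconditional. The sketch below shows a `run_code` tool handler written against a hypothetical `SandboxBackend` interface (not E2B's actual SDK): every call creates a fresh sandbox and destroys it in a `finally` block, so a crash or timeout cannot leave a VM behind. `FakeBackend` stands in for a real microVM provider so the flow can run locally; it provides no isolation at all.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RunResult:
    stdout: str
    exit_code: int


class SandboxBackend(Protocol):
    """Hypothetical interface a microVM provider would implement."""
    def create(self) -> str: ...
    def execute(self, sandbox_id: str, code: str, timeout_s: float) -> RunResult: ...
    def destroy(self, sandbox_id: str) -> None: ...


def run_code(backend: SandboxBackend, code: str, timeout_s: float = 30.0) -> RunResult:
    """Tool handler: one fresh sandbox per call, always destroyed afterwards."""
    sandbox_id = backend.create()
    try:
        return backend.execute(sandbox_id, code, timeout_s)
    finally:
        # Runs on success, exception, or timeout: nothing outlives the call.
        backend.destroy(sandbox_id)


@dataclass
class FakeBackend:
    """Local stand-in that only records lifecycle events; it does NOT isolate anything."""
    live: set = field(default_factory=set)
    events: list = field(default_factory=list)
    fail: bool = False  # when True, execute() simulates a guest crash
    counter: int = 0

    def create(self) -> str:
        self.counter += 1
        sandbox_id = f"vm-{self.counter}"
        self.live.add(sandbox_id)
        self.events.append(("create", sandbox_id))
        return sandbox_id

    def execute(self, sandbox_id: str, code: str, timeout_s: float) -> RunResult:
        # A real backend would run `code` in the guest and enforce timeout_s.
        self.events.append(("execute", sandbox_id))
        if self.fail:
            raise RuntimeError("guest crashed")
        return RunResult(stdout="ok\n", exit_code=0)

    def destroy(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)
        self.events.append(("destroy", sandbox_id))
```

The design choice worth copying is that the handler, not the agent, owns the lifecycle: the model never gets a handle it could forget to release.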
The stateful problem
The clean model - boot a fresh VM per request - breaks down when an agent needs to work across multiple turns on the same codebase. An agent might install dependencies on turn three and need them on turn seven.
Multi-turn agent sessions need filesystem state that persists across turns. An agent working on a Python project across ten turns has installed packages, written files, and accumulated intermediate outputs. Re-initializing the sandbox from scratch on every turn wastes 200-500ms on environment setup. Firecracker's snapshot-restore mechanism lets you pause a sandbox, save its memory and VM state to disk alongside its disk image, and resume it in 5-30ms.
This is the operational foundation behind agents that feel like they have a persistent workspace. The sandbox isn't actually running between turns; it's been checkpointed and thawed.
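A minimal sketch of that checkpoint cycle against Firecracker's snapshot API follows, assuming a recent Firecracker release (the `mem_backend` field replaced an older `mem_file_path` field on load). A Firecracker snapshot holds guest memory and VM/device state but not the disk, so the session's disk image must stay at the same path between turns. The `/var/lib/sandboxes` layout and helper names are hypothetical; the returned tuples are meant to be sent as HTTP requests over the VM's API socket.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SnapshotFiles:
    state: str   # VM and device state
    memory: str  # full guest memory contents


def snapshot_files(session_id: str, base: str = "/var/lib/sandboxes") -> SnapshotFiles:
    """Hypothetical on-disk layout: one directory per agent session."""
    root = PurePosixPath(base) / session_id
    return SnapshotFiles(state=str(root / "vm.snap"), memory=str(root / "vm.mem"))


def end_turn_requests(files: SnapshotFiles):
    """Freeze the running VM and write it out; the VMM process can then exit."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": files.state,
            "mem_file_path": files.memory,
        }),
    ]


def start_turn_requests(files: SnapshotFiles):
    """Sent to a *fresh*, unconfigured Firecracker process to restore and resume."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": files.state,
            "mem_backend": {"backend_type": "File", "backend_path": files.memory},
            "resume_vm": True,
        }),
    ]
```

Between turns nothing is running: the session exists only as these two files plus its disk image.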
The tradeoff between ephemeral and stateful sandboxes captures a core tension in the space. E2B assumes agents should start fresh every time, which is the cleanest security model because nothing persists and nothing can accumulate. Other platforms assume agents need continuity, which is more practical for coding workflows but means you inherit the complexity of managing long-lived state.
What this means for teams building on top of agents
Network isolation matters as much as filesystem isolation. An agent that can install arbitrary packages and make outbound network calls can be tricked into exfiltrating data - it doesn't need kernel access to cause damage.
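One common way to act on this is default-deny egress: the sandbox can reach only an explicit list of hosts, such as a package index, and everything else is refused. The sketch below is the policy check a hypothetical egress proxy in front of the sandbox might run. The allowlist entries are examples, and the check only means something if it is enforced outside the VM, where code inside the sandbox cannot bypass it.

```python
from urllib.parse import urlsplit

# Example allowlist: exact hosts, plus "*.suffix" entries that match subdomains only.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "*.githubusercontent.com"}


def egress_allowed(url: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Default-deny check for one outbound request from the sandbox."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False                       # refuse plaintext and other schemes
    host = (parts.hostname or "").lower()  # hostname strips port and user@ prefix
    if not host:
        return False
    for entry in allowed:
        if entry.startswith("*."):
            if host.endswith(entry[1:]):   # e.g. ".githubusercontent.com"
                return True
        elif host == entry:
            return True
    return False
```

Using `hostname` rather than the raw netloc matters: a URL like `https://pypi.org@attacker.example/` actually targets `attacker.example`. The enforcement point should also control DNS, since lookups are themselves an outbound channel.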
NVIDIA's security guidance for agentic workflows states that "application-level controls are insufficient" because once control passes to a subprocess, the application loses visibility. They recommend fully virtualized environments and emphasize that each dangerous action should require fresh user confirmation.
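The "fresh" part is easy to get subtly wrong: an approval cached for a whole session quietly becomes blanket permission. The sketch below is one hypothetical way to enforce it, not taken from NVIDIA's guidance: each approval is a single-use token bound to the exact action text, so it cannot be replayed for a different command or reused for the same one. The `confirm` callback stands in for whatever UI asks the user.

```python
import hashlib
import secrets
from typing import Callable, Dict, Optional


class ConfirmationGate:
    """Single-use approvals, each bound to the exact action that was approved."""

    def __init__(self, confirm: Callable[[str], bool]) -> None:
        self._confirm = confirm                 # asks the human; True means approve
        self._pending: Dict[str, str] = {}      # token -> digest of approved action

    @staticmethod
    def _digest(action: str) -> str:
        return hashlib.sha256(action.encode()).hexdigest()

    def request(self, action: str) -> Optional[str]:
        """Ask the user about this action; return a one-time token if approved."""
        if not self._confirm(action):
            return None
        token = secrets.token_hex(16)
        self._pending[token] = self._digest(action)
        return token

    def consume(self, token: str, action: str) -> bool:
        """Valid once, and only for its own action; any use spends the token."""
        digest = self._pending.pop(token, None)
        return digest is not None and secrets.compare_digest(digest, self._digest(action))
```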
The other practical concern is cold start latency. AI agents work at the speed of API calls. If a sandbox takes 10 seconds to start, the bottleneck in your agent workflow shifts from model inference to infrastructure. Pre-warmed sandbox pools solve this at the cost of idle resource spend.
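A minimal single-process sketch of such a pool follows. It assumes a hypothetical `boot` callable that starts a sandbox and returns a handle; refills run on a small thread pool, and a caller falls back to a cold boot when no warm sandbox is ready. The `target` size is the knob that sets how much idle spend you accept.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keeps `target` sandboxes booted or booting at all times."""

    def __init__(self, boot: Callable[[], T], target: int = 4) -> None:
        self._boot = boot
        self._ready: "queue.Queue[T]" = queue.Queue()
        self._refill = ThreadPoolExecutor(max_workers=2)
        for _ in range(target):                # pay the boot cost up front
            self._refill.submit(self._add_one)

    def _add_one(self) -> None:
        # A production pool would log and retry failed boots; here they are dropped.
        self._ready.put(self._boot())

    def acquire(self) -> T:
        try:
            sandbox = self._ready.get_nowait()  # warm path: no boot latency
        except queue.Empty:
            return self._boot()                 # cold path: refills already in flight
        self._refill.submit(self._add_one)      # replace the warm one we took
        return sandbox

    def close(self) -> list[T]:
        """Stop refilling and return idle sandboxes so the caller can destroy them."""
        self._refill.shutdown(wait=True)
        leftovers: list[T] = []
        while True:
            try:
                leftovers.append(self._ready.get_nowait())
            except queue.Empty:
                return leftovers
```

Refilling only on the warm path keeps "ready plus booting" equal to `target`, so a burst of cold boots does not inflate the pool afterwards.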
An AI teammate like Beagle operates inside your Slack or Teams workspace, where the threat model is different - it's reading messages, not executing arbitrary code. But the general principle holds: the right question to ask of any AI system is not "what can it do?" but "what can it reach if something goes wrong?"
The sandbox is the answer to the second question.