Open Source Internal Coding Agent: What Stripe Built First

A Slack mention. A Linear issue. Twenty minutes later, a draft PR. That is the workflow Stripe, Ramp, and Coinbase each built internally - without sharing code, without a common framework, without coordinating at all.

Over the past year, three of the most engineering-driven companies in fintech each built an internal coding agent: Stripe built Minions, Ramp built Inspect, and Coinbase built Cloudbot. Working independently, they still converged on remarkably similar architectures. That convergence is worth paying attention to, because it suggests these patterns survived contact with production rather than just a demo.

On March 17, LangChain open-sourced Open SWE - an MIT-licensed framework that packages those patterns so a team that hasn't spent a year building one doesn't have to start from zero.

What three companies independently decided an internal coding agent needs

Tasks run in dedicated cloud sandboxes with full permissions inside strict boundaries. This isolates the blast radius of any mistake from production systems while allowing agents to execute commands without approval prompts for each action. That second part matters: a coding agent that pauses every thirty seconds to ask permission is not much faster than a developer doing the work.

According to Stripe's engineering team, their agents have access to around 500 tools, but these are carefully selected and maintained rather than accumulated over time. Tool curation appears to matter more than tool quantity. Open SWE ships with about 15 tools covering shell execution, file operations, HTTP requests, and Git - a deliberately small set. Every additional tool widens the surface on which a model can pick, or hallucinate, the wrong one.
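The curation idea can be sketched as an explicit allowlist. This is a minimal illustration, not how Stripe or Open SWE implement it: `ToolRegistry` and the tool names `read_file` and `run_shell` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass
class ToolRegistry:
    """Hypothetical registry that exposes only explicitly approved tools."""

    allowed: FrozenSet[str]
    _tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        # Refuse anything that is not on the curated allowlist.
        if name not in self.allowed:
            raise ValueError(f"tool {name!r} is not on the allowlist")
        self._tools[name] = fn

    def call(self, name: str, *args, **kwargs) -> str:
        # An unknown name is surfaced as an error instead of being guessed at.
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        return self._tools[name](*args, **kwargs)

    def names(self) -> List[str]:
        # The sorted list is what would be advertised to the model.
        return sorted(self._tools)


# Illustrative tool names only.
registry = ToolRegistry(allowed=frozenset({"read_file", "run_shell"}))
registry.register("read_file", lambda path: f"<contents of {path}>")
```

Keeping the allowlist separate from registration means adding a tool is a reviewed change to one set, not a side effect of importing a module.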

All three systems integrate with Slack as a primary interface, meeting developers in their existing communication workflows rather than requiring context switches to new applications. Ramp's Inspect and Coinbase's Cloudbot both accept Linear comments as a trigger as well. The agent replies in-thread with status updates and PR links. The invocation surface is already where the work gets discussed.
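As a rough sketch of that invocation path, the function below turns a Slack mention into a task that remembers which thread to reply in. The event dictionary is a simplified assumption modeled on Slack's `app_mention` event (`type`, `text`, `channel`, `ts`, `thread_ts`); `AgentTask` and `task_from_slack_event` are hypothetical names, not part of Open SWE or any of the three internal systems.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Slack renders a user mention in message text as <@USERID>.
MENTION = re.compile(r"<@[A-Z0-9]+>")


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical task record handed to the agent."""

    source: str
    prompt: str
    reply_channel: str
    reply_thread: str


def task_from_slack_event(event: dict) -> Optional[AgentTask]:
    # Only react to explicit mentions of the bot.
    if event.get("type") != "app_mention":
        return None
    # Strip the mention markup so only the instruction remains.
    prompt = MENTION.sub("", event.get("text", "")).strip()
    if not prompt:
        return None
    # Reply in the existing thread if there is one, else start one on this message.
    thread = event.get("thread_ts") or event["ts"]
    return AgentTask("slack", prompt, event["channel"], thread)


example = task_from_slack_event({
    "type": "app_mention",
    "text": "<@U123ABC> fix the flaky login test",
    "channel": "C42",
    "ts": "1700000000.000100",
})
```

Carrying the channel and thread on the task is what lets status updates and the PR link land back where the request was made.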

Complex tasks get decomposed and delegated to specialized child agents. The main agent fans out independent subtasks to isolated subagents, each with its own context, middleware stack, todo list, and file operations. This is structurally different from a single-context loop: different parts of a task can run in parallel without overwriting each other's state.
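The fan-out pattern can be illustrated with nothing more than a thread pool. This is a toy sketch under stated assumptions: `SubagentContext`, `run_subagent`, and `fan_out` are hypothetical names, and the body of `run_subagent` stands in for real model and tool calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubagentContext:
    """Hypothetical per-subagent state; nothing here is shared between subagents."""

    subtask: str
    todos: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def run_subagent(subtask: str) -> SubagentContext:
    # Each call builds a fresh context, so parallel runs cannot touch each other's state.
    ctx = SubagentContext(subtask=subtask)
    ctx.todos.append(f"plan: {subtask}")
    ctx.notes.append(f"done: {subtask}")  # placeholder for real model/tool work
    return ctx


def fan_out(subtasks: List[str]) -> List[SubagentContext]:
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_subagent, subtasks))


results = fan_out(["update API client", "fix failing test", "bump changelog"])
```

The parent only sees each subagent's returned context, which is the property that keeps one subtask's scratch work out of another's prompt.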

What AGENTS.md actually is

Every coding agent running inside a repo needs the same thing a new engineer needs on day one: project conventions, the right test command, the things that are non-obvious about how this codebase works. AGENTS.md is the file that provides that.

AGENTS.md is a Markdown file placed at the root of a repository that provides AI coding agents with persistent, project-specific operational guidance: build commands, coding conventions, testing rules, and constraints the agent cannot infer from the codebase alone. Open SWE reads it from the sandbox before any work starts and injects it into the system prompt. It is the repo-level equivalent of Stripe's rule files: encoding conventions, testing requirements, and architectural decisions that every agent run should follow.
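The read-then-inject step might look roughly like the sketch below. It assumes the repository is available as a local directory and uses a hypothetical `build_system_prompt` helper and `BASE_PROMPT` string; Open SWE's own loading code runs against its sandbox and may differ.

```python
from pathlib import Path

# Hypothetical base prompt, for illustration only.
BASE_PROMPT = "You are a coding agent working in this repository."


def build_system_prompt(repo_root: Path, base: str = BASE_PROMPT) -> str:
    """Append the repo's AGENTS.md, if present, to the base system prompt."""
    agents_file = repo_root / "AGENTS.md"
    # A missing file is not an error: the agent simply runs without repo guidance.
    if not agents_file.is_file():
        return base
    guidance = agents_file.read_text(encoding="utf-8").strip()
    if not guidance:
        return base
    return f"{base}\n\n# Repository guidance (from AGENTS.md)\n{guidance}"
```

Calling `build_system_prompt(Path("."))` from a repository root returns the base prompt unchanged when no AGENTS.md exists, and the base prompt followed by the file's contents when it does.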

OpenAI helped pioneer the AGENTS.md format for Codex and, in December 2025, donated it to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, alongside Anthropic's donation of the Model Context Protocol (MCP) and Block's donation of Goose. The format is now read by Codex, OpenCode, Gemini CLI, Aider, Devin, and several others. It is as close to a cross-tool standard as the space has.

One thing to know before auto-generating one: LLM-generated context files can hurt performance. In 5 out of 8 tested settings, LLM-generated files reduced task success rates. Agents took 2.45 to 3.92 additional steps per task, and inference costs increased by 20% to 23%. The file should be human-written, short, and kept current. A stale AGENTS.md that describes last quarter's directory structure is actively worse than none.

What Open SWE actually gives you versus what you still build

Open SWE sits in a different category from the AI coding tools covered in most comparisons. Claude Code, GitHub Copilot, Gemini Code Assist, and Cursor are products you adopt. Open SWE is a framework you customize.

Open SWE defaults to Claude Opus 4 but supports any LLM provider. Organizations can configure different models for different subtasks.

Every major component is pluggable: you can swap between Modal, Daytona, Runloop, or LangSmith as sandbox providers. You can implement your own sandbox backend if you have internal infrastructure requirements.
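One way to picture a pluggable sandbox backend is a small interface that the agent calls without knowing which provider sits behind it. The sketch below is an assumption for illustration: `SandboxBackend`, `CommandResult`, and `LocalTempDirSandbox` are hypothetical names, not Open SWE's actual interface, and the toy backend provides no real isolation.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class CommandResult:
    exit_code: int
    output: str


class SandboxBackend(Protocol):
    """Hypothetical interface that any provider adapter would satisfy."""

    def run(self, command: List[str], timeout: float = 60.0) -> CommandResult: ...


class LocalTempDirSandbox:
    """Toy backend that runs commands in a throwaway directory. NOT real isolation."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()

    def run(self, command: List[str], timeout: float = 60.0) -> CommandResult:
        # Run without a shell, confined only by the working directory.
        proc = subprocess.run(
            command, cwd=self._dir.name, capture_output=True, text=True, timeout=timeout
        )
        return CommandResult(proc.returncode, proc.stdout + proc.stderr)

    def close(self) -> None:
        self._dir.cleanup()


def run_in_sandbox(backend: SandboxBackend, command: List[str]) -> CommandResult:
    # Agent code depends only on the protocol, so backends can be swapped freely.
    return backend.run(command)
```

An internal backend would implement the same `run` method against company infrastructure, and the calling code would not change.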

The honest gap: validation is the weakest layer. Stripe has three layers of validation plus CI plus automatic retry. Coinbase has agent councils. Open SWE relies on prompt-driven linting and testing and the open_pr_if_needed middleware. For production use, you will likely want to add more here.
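A team adding its own validation layer might start from something like the loop below: run every check, hand failures back to the agent, and retry a bounded number of times. This is a sketch of the general pattern, not Stripe's or Open SWE's implementation; `validate_with_retry`, the check functions, and the `fix` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], Tuple[bool, str]]  # returns (passed, message)


def validate_with_retry(
    checks: List[Tuple[str, Check]],
    fix: Callable[[str, str], None],
    max_attempts: int = 3,
) -> bool:
    """Run all checks; on failure, ask `fix` to repair and try again."""
    for attempt in range(1, max_attempts + 1):
        failures = []
        for name, check in checks:
            passed, message = check()
            if not passed:
                failures.append((name, message))
        if not failures:
            return True
        # Only attempt repairs if another validation pass will follow.
        if attempt < max_attempts:
            for name, message in failures:
                fix(name, message)
    return False


# Toy demonstration: lint fails once, the "agent" fixes it, the retry passes.
state: Dict[str, bool] = {"lint_clean": False}


def lint() -> Tuple[bool, str]:
    return state["lint_clean"], "unused import"


def tests() -> Tuple[bool, str]:
    return True, ""


def fix(name: str, message: str) -> None:
    if name == "lint":
        state["lint_clean"] = True


ok = validate_with_retry([("lint", lint), ("tests", tests)], fix)
```

Bounding the attempts matters: without a cap, an agent that cannot fix a failing check would loop until it exhausts its budget.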

Stripe uses a fork of Goose with AWS EC2 devboxes and three-layer validation. Ramp built on OpenCode with Modal containers and visual DOM verification. Coinbase built from scratch with agent councils and auto-merge capabilities. Open SWE gives you the skeleton those systems share. The org-specific layer - your internal APIs, your deployment hooks, your security gates - is still yours to write.

The framework represents LangChain's bet on a specific trajectory for AI-assisted development: autonomous, long-running agents that integrate with existing developer workflows rather than requiring new interfaces. This differs from the short, synchronous, in-IDE copilot model that dominated earlier AI coding tools.

That is the real shift. A teammate like Beagle that lives in Slack can surface a task; Open SWE is the thing that goes off and does it. The two ends of the loop - receiving a task and acting on it - are now both addressable with open tooling. What sits between them is still architecture work.

The hype to ignore and the signal to act on

The signal: three independent production deployments converging on the same four decisions is a meaningful prior. Isolated sandboxes, curated toolsets, async invocation from Slack or Linear, and subagent orchestration for parallel work - these are not theoretical. They are the things that stayed after everything else was cut.

The hype: "your internal coding agent in ten minutes." Not every task needs an autonomous agent. Spinning up a cloud sandbox and running a subagent graph to fix a two-line typo or apply a simple style update is overhead you do not need. The architecture earns its cost on tasks with real scope: multi-file changes, cross-service work, tickets that would otherwise sit in a queue for two days.

The hardest problem in agentic development is not the AI models - it is giving those models a safe, fast, isolated place to execute. Open SWE solves that problem in a way you can inspect, fork, and extend. That is worth more than the GitHub star count.

Open SWE is MIT-licensed and available at github.com/langchain-ai/open-swe. A hosted version is at swe.langchain.com and requires an Anthropic API key. The AGENTS.md format spec lives at agents.md.
