Treat Your CLAUDE.md Like a Production Config File

Two weeks ago Salesforce published what actually happened when it went all-in on agentic coding - not a demo, not a lab benchmark. A production engineering org at scale.

The headline numbers are hard to ignore. In April 2026, work items completed per developer were up 50.8% year over year. PRs merged per developer climbed 79%. And the company's ML-based Effective Output score - measuring value shipped, not volume - grew 151.3%.

None of those figures can be independently verified. Worth naming, and worth reading past anyway, because the mechanism is what's interesting.

The productivity gains didn't come from the model. They came from removing friction around the model.

The inflection point was a deliberate org-wide pivot to Claude Code as the primary agent, rolled out to all engineers - and then Salesforce removed all token limits. The stated goal: eliminate every piece of friction between engineers and the tools that make them faster. That's a governance decision, not a technical one. Most teams do the opposite: they grant access, leave the defaults tight, and wonder why adoption stalls.

The migration that actually tells the story

A product team needed to migrate 33 API endpoints to a new cloud-native architecture - a job the company estimates would have taken 231 person-days the traditional way. Using a rule-based framework built on Claude with markdown files and reference implementations, they completed it in 13 days.

That's roughly an 18x compression (231 / 13 ≈ 17.8). How?

Every round of PR feedback got folded back into the rule set, so accuracy improved continuously and outputs arrived near production-ready. The team let autonomous LLM loops run the build-fix-validate cycle without manual intervention, and parallelized migrations across isolated environments to generate multiple PRs simultaneously.
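To make the mechanism concrete, here is a minimal sketch of that loop. It is not Salesforce's framework: `RuleSet`, `build_fix_validate`, `fake_agent`, and `fake_validator` are hypothetical names, the "agent" is a stub rather than a model call, and the endpoint names are invented. It only illustrates the shape: validate, feed errors back, fold review feedback into the rules, then fan out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuleSet:
    """Hypothetical stand-in for the team's markdown rule files."""
    rules: list[str] = field(default_factory=list)

    def fold_in(self, feedback: list[str]) -> None:
        # Add each new piece of review feedback once, preserving order.
        for note in feedback:
            if note not in self.rules:
                self.rules.append(note)


def build_fix_validate(
    generate: Callable[[RuleSet, list[str]], set[str]],
    validate: Callable[[set[str]], list[str]],
    rules: RuleSet,
    max_rounds: int = 5,
) -> tuple[set[str], int]:
    """Loop until validation passes; return the candidate and rounds used."""
    errors: list[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(rules, errors)  # build (or fix, given errors)
        errors = validate(candidate)         # validate
        if not errors:
            return candidate, round_no
    raise RuntimeError(f"still failing after {max_rounds} rounds: {errors}")


REQUIRED = ("compiles", "has_tests")  # toy acceptance checks (assumed)


def fake_agent(rules: RuleSet, errors: list[str]) -> set[str]:
    # Stub for a model call: satisfies what the rules encode,
    # plus whatever the previous validation round complained about.
    return set(rules.rules) | set(errors)


def fake_validator(candidate: set[str]) -> list[str]:
    return [check for check in REQUIRED if check not in candidate]


rules = RuleSet()

# First migration: the rules are empty, so the loop needs a fix round.
_, first_rounds = build_fix_validate(fake_agent, fake_validator, rules)

# Review feedback is folded back into the rule set.
rules.fold_in(["compiles", "has_tests"])


def migrate(endpoint: str) -> tuple[str, int]:
    _, rounds = build_fix_validate(fake_agent, fake_validator, rules)
    return endpoint, rounds


# Later migrations run in parallel and read the improved rules.
with ThreadPoolExecutor(max_workers=3) as pool:
    later = list(pool.map(migrate, ["orders", "invoices", "refunds"]))

print(first_rounds)  # 2
print(later)         # [('orders', 1), ('invoices', 1), ('refunds', 1)]
```

The first run needs two rounds; once the feedback lives in the rules, later runs pass on the first attempt. That is the compounding effect in miniature. A real setup would also isolate each worker's environment, which threads alone do not do.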

The largest single PR delivered 21 endpoints with 100% test coverage. That's not vibe coding. That's a structured handoff with a reviewable artifact at the end.

What actually broke (and what didn't)

The skeptic's question is always: what's the quality tax?

Salesforce reported a 151% increase in effective output alongside a 5% drop in incidents. More PRs, fewer fires. The honest caveat from the engineering lead is that this held because of the governance scaffolding, not despite it.

He's upfront about what's still genuinely hard: context management in long agentic sessions is a skill engineers are still developing. The quality of CLAUDE.md files - persistent context configs that align Claude to a codebase - varies widely between teams, and that variance has a big impact on output.

Security also needs a rethink. When agents act on systems rather than just suggesting, the blast radius of a misconfigured tool gets much larger. Most teams running agentic workflows are still treating permissions the way they treat local scripts - loosely. That's not viable when the agent can push a PR.
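As a rough illustration of what "not loosely" could mean, the sketch below gates every agent action behind an explicit allowlist with an expiry, and records each decision. `Grant`, `authorize`, and the action names are hypothetical, invented for this example rather than taken from Claude Code or the Salesforce post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission grant for one agent session."""
    allowed_actions: frozenset[str]
    expires_at: datetime


def authorize(
    grant: Grant,
    action: str,
    now: datetime,
    audit_log: list[tuple[str, str, bool]],
) -> bool:
    # Allow only actions that are explicitly listed and not yet expired.
    allowed = action in grant.allowed_actions and now < grant.expires_at
    # Record every decision, including denials.
    audit_log.append((now.isoformat(), action, allowed))
    return allowed


issued = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = Grant(
    allowed_actions=frozenset({"read_repo", "open_pr"}),
    expires_at=issued + timedelta(minutes=30),
)
audit_log: list[tuple[str, str, bool]] = []

in_scope = authorize(grant, "open_pr", issued + timedelta(minutes=5), audit_log)
out_of_scope = authorize(grant, "delete_branch", issued + timedelta(minutes=5), audit_log)
expired = authorize(grant, "open_pr", issued + timedelta(minutes=45), audit_log)

print(in_scope, out_of_scope, expired)  # True False False
```

The point of the design is the default: anything not listed is denied, and even a listed action stops working once the grant lapses.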

Strong agentic workflows share three traits. Tasks are written like issues, not casual chats, with background, goal, acceptance criteria, and test commands explicit. Permissions are managed like production systems, not local scripts: least privilege, short-lived tokens, sandboxes, and audit logs. Results are reviewed like PRs from a junior engineer, not accepted as authoritative answers.
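The first trait is easy to sketch. Assuming a hypothetical `AgentTask` type (not part of any real tool), a task can refuse to go out until the four fields named above are filled in. The example content and the test command are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical issue-style task handed to a coding agent."""
    background: str
    goal: str
    acceptance_criteria: list[str]
    test_commands: list[str]

    def missing_fields(self) -> list[str]:
        # Report which of the four required parts are empty.
        missing = []
        if not self.background.strip():
            missing.append("background")
        if not self.goal.strip():
            missing.append("goal")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        if not self.test_commands:
            missing.append("test_commands")
        return missing

    def render(self) -> str:
        # Plain-text layout that can be pasted into an agent prompt.
        lines = ["Background:", self.background, "", "Goal:", self.goal, ""]
        lines.append("Acceptance criteria:")
        lines += [f"- {item}" for item in self.acceptance_criteria]
        lines += ["", "Test commands:"]
        lines += [f"$ {cmd}" for cmd in self.test_commands]
        return "\n".join(lines)


task = AgentTask(
    background="The orders endpoint still runs on the legacy stack.",
    goal="Migrate it to the new architecture without changing its response schema.",
    acceptance_criteria=[
        "Existing contract tests pass unchanged",
        "New code has unit tests",
    ],
    test_commands=["pytest tests/orders"],  # hypothetical path
)

vague = AgentTask(
    background="",
    goal="fix the orders thing",
    acceptance_criteria=[],
    test_commands=[],
)

print(task.missing_fields())   # []
print(vague.missing_fields())  # ['background', 'acceptance_criteria', 'test_commands']
print(task.render())
```

A casual chat message looks like `vague`: it has a goal and nothing the agent, or a reviewer, can check the result against.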

The organizational question nobody has answered

Then there's the talent pipeline problem. When agents handle more of the execution layer, how do junior engineers grow into senior engineers if AI is absorbing much of the entry-level work? What is the role of a designer or product manager in this new world? Salesforce is experimenting with one- and three-person units instead of traditional Scrum teams, but they're clear this is still early.

That's a real question, not a rhetorical one. The teams that are thinking about it now will be better positioned than those that defer it until the org chart has already shifted.

At Salesforce, Claude Code skills - packaged, reusable capabilities that encode team context, naming conventions, and workflow patterns - have become a new form of engineering artifact. Teams are building them, sharing them, and compounding on each other's work. That's the shift worth watching: not the headline velocity numbers, but the organizational muscle that produces them. An AI teammate like Beagle lives inside the same Slack threads where that institutional knowledge gets built and lost - and the teams that surface and encode that context, wherever it lives, are the ones that make the agent work.

The Salesforce post is worth reading in full. The numbers are the hook. The part about what's still hard is the actual signal.

