Copilot's New Billing Model and the Cheap Model Inside It

On June 1, GitHub ended the Copilot subscription as most teams knew it. All Copilot plans transitioned to usage-based billing, replacing premium request units with a monthly allotment of GitHub AI Credits calculated from token consumption - input, output, and cached - at the listed API rate for each model. Twenty-four hours later, at Build 2026, Microsoft announced the model it clearly had in mind when it designed that pricing structure.

The model is MAI-Code-1-Flash. It is a new Microsoft coding model built for fast, efficient assistance in everyday developer workflows, built end-to-end by Microsoft using clean and appropriately licensed data, and rolling out to GitHub Copilot individual users in Visual Studio Code in the model picker and under the default auto picker.

At just 5B parameters, it achieves 51% on SWE Bench Pro10, putting it closer to Haiku in size but cheaper in cost.

In the new token-based billing, MAI-Code-1-Flash is priced cheaper than Claude Haiku 4.5.

Those two facts, landing 24 hours apart, are not coincidence. They are a coordinated move.

What actually changed on June 1

The old system billed by request. One prompt cost one premium request unit, with a multiplier applied for fancier models. That made a quick chat and a long autonomous agent session functionally the same price, which GitHub eventually admitted was unsustainable.

Copilot has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions. Agentic usage is becoming the default, and it brings significantly higher compute and inference demands. A quick chat question and a multi-hour autonomous coding session can cost the user the same amount under the old model - and GitHub absorbed much of the escalating inference cost behind that usage.

Under the new system, when you use Copilot, the interaction consumes tokens - input tokens, output tokens, and cached tokens - and each token is priced based on the model used.

One AI credit equals $0.01 USD.

Copilot code review moved to an agentic architecture that runs on GitHub Actions, so reviewing a pull request with Copilot now counts against included Actions minutes at the same per-minute rates as any other Actions workflow. A PR review can now show up in two places on the bill.

The accounting model underneath has changed completely. Teams that never looked at billing dashboards may be surprised when a familiar workflow suddenly maps to a cost center.

The plan numbers, for reference: paid individual plans include 1,500 credits for Pro, 7,000 for Pro+, and 20,000 for Max.

Business and Enterprise receive pooled monthly allowances of 1,900 and 3,900 credits per user, with temporary promotional allowances of 3,000 and 7,000 running from June 1 to September 1, 2026. That promotional window is deliberate cushioning. GitHub is softening the landing so the first bills don't shock anyone - but watch what happens in October when the flex allotment shrinks back toward the base.

The model arithmetic now matters

Here is the part most coverage skipped. The billing change makes model selection a financial decision, not just a quality one.

GPT-5.5 output costs 24 times what GPT-5.4 nano output costs per token. A careless model picker is now a billing bug. That spread is enormous. A team running an agentic workflow - reading files, writing code, running tests, iterating on a PR - can burn credits at very different rates depending on which model the agent calls.

A Pro user with 1,500 credits can run about 136 medium GPT-5.5 agent steps at 11 credits each before additional usage kicks in. The same credit pool can run about 909 MAI-Code-1-Flash medium steps at 1.65 credits each.

That 6-to-1 ratio in agent steps per dollar is what Microsoft built MAI-Code-1-Flash to exploit. The model is designed for agentic coding in real developer environments, trained on the GitHub Copilot harness. It uses adaptive thinking - staying concise for simple requests and spending more reasoning budget on complex tasks - with strong instruction-following across single-turn and multi-turn scenarios.

This is not a general-purpose model that happens to be cheap. It was trained on the exact workflow it will run inside.

What engineering managers should do this week

Three things, in order of urgency.

Set budget controls before the promotional period ends. User-level budgets are now generally available for organizations and enterprises. Admins can set a universal budget for users or override for specific sets of users, and will receive email notifications as users approach their budgets. The default is open-ended spending. Flip it before September, not after.

Audit which workflows are agent-heavy. Features like agent mode and Copilot cloud agent involve multiple model calls within a single task. A nightly job that runs code review on every open PR, a developer who leaves an agent loop running overnight, or a CI step that calls a frontier model for each changed file - these are the patterns that will surface on the September bill. Find them now.

Take MAI-Code-1-Flash seriously as a default for routine tasks. Most agentic coding work - writing tests, chasing lint errors, resolving merge conflicts, drafting boilerplate - does not require a frontier model. MAI-Code-1-Flash is designed for high-quality coding help with better efficiency, and it outperforms Claude Haiku 4.5 with better price-to-performance across coding benchmarks. For the bulk of agentic sessions, routing to it instead of a frontier model stretches the credit pool considerably.

A tool like Beagle can flag when a Copilot cost alert fires in a Slack channel and surface it in the right engineering thread - but the structural decision is still yours: which model for which task, and what hard limit stops the runaway session.

The broader pattern here is worth naming plainly. This is more than a pricing tweak - it is a signal that agentic software development now has cloud economics attached. Every tool in this space is moving the same direction. Copilot, Cursor, and usage-based Claude Code plans are all converging on metered pricing, so which is cheapest now depends on your usage shape, not the sticker price.

Engineering teams that treated AI coding tools as fixed-cost subscriptions are now operating metered compute infrastructure. The habits that kept AWS bills under control - know your workloads, pick the right instance class, set spend alerts - apply here too.

What actually changed on June 1

The model arithmetic now matters

What engineering managers should do this week

Keep reading

Open SWE: The Internal Coding Agent Architecture That Keeps Converging

Does Claude Sonnet 5 Actually Upgrade Your Agentic Coding?

Stop Treating Agentic Coding as a Writing Problem