When Your AI Agent Bills by the Task, What Does a Task Actually Cost?

Two days ago, Microsoft flipped Copilot Cowork to general availability worldwide. After three months of Frontier preview, during which more than half of the Fortune 500 used it, billing has now started for everyone else: charges for Copilot Cowork began June 16. That's the actual news. Not the features - the pricing model underneath them.

This matters beyond Microsoft shops, because Cowork is the first large-scale example of something the whole industry is moving toward: agents priced not as a flat monthly seat fee, but as a running meter on every task they touch.

The model has changed

Until now, enterprise AI has been priced like SaaS. You buy a seat, users get access, usage is effectively unlimited within reason. Copilot Cowork breaks that contract. Cowork requires a Microsoft 365 Copilot user subscription license, but usage is billed separately through Copilot Credits. Task pricing is based on model use, context retrieval, tool calls, and runtime.

Metered usage-based AI agents behave like cloud compute. The more work you assign, the more you pay, and the cost scales with how hard each task runs.

That's a genuinely different thing to budget for. A seat license is a known line item at renewal. An agent that bills by what it does is a variable cost that can surprise you on a Tuesday morning.

What you're actually paying for

Each task's credit cost comes from four things: the models it runs, the context it pulls in, the tools it calls, and the runtime it needs. That means the same prompt can cost very different amounts depending on whether the agent has to dig through a large SharePoint corpus, call several plugins, or run for twelve minutes instead of thirty seconds.

The billing unit isn't "a question answered." It's a workflow executed - and workflows are not uniform.
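To make the four drivers concrete, here is a minimal sketch of how a per-task estimate might be put together. Every rate, field name, and function in it is a hypothetical placeholder invented for illustration; none of it reflects Microsoft's published Copilot Credits pricing.

```python
from dataclasses import dataclass

# ASSUMPTION: all rates are made-up placeholders, not real Copilot Credits prices.
MODEL_RATE_PER_1K_TOKENS = {"large": 4.0, "small": 1.0}
RETRIEVAL_RATE_PER_DOC = 0.5
TOOL_CALL_RATE = 2.0
RUNTIME_RATE_PER_MINUTE = 1.0


@dataclass
class TaskUsage:
    """Hypothetical usage record for one agent task."""
    model: str
    tokens: int
    docs_retrieved: int
    tool_calls: int
    runtime_seconds: float


def estimate_credits(u: TaskUsage) -> float:
    """Sum the four cost drivers: model use, context, tool calls, runtime."""
    model_cost = MODEL_RATE_PER_1K_TOKENS[u.model] * u.tokens / 1000
    context_cost = RETRIEVAL_RATE_PER_DOC * u.docs_retrieved
    tool_cost = TOOL_CALL_RATE * u.tool_calls
    runtime_cost = RUNTIME_RATE_PER_MINUTE * u.runtime_seconds / 60
    return model_cost + context_cost + tool_cost + runtime_cost


# The same prompt, resolved two different ways.
light = TaskUsage("large", tokens=2000, docs_retrieved=2,
                  tool_calls=0, runtime_seconds=30)
heavy = TaskUsage("large", tokens=40000, docs_retrieved=60,
                  tool_calls=5, runtime_seconds=720)

print(estimate_credits(light))  # 9.5
print(estimate_credits(heavy))  # 212.0
```

Under these invented rates the twelve-minute run costs over twenty times the thirty-second one. The exact ratio means nothing; the point is that the spread comes from how the task executes, not from the prompt.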

At launch, Copilot Cowork runs on Anthropic models, including Opus 4.8 and Sonnet 4.6. Microsoft said GPT 5.5 is available through Frontier, where you can pick the model per task with the model picker to manage cost. Cowork 1, Microsoft's own fine-tuned model, is expected in the weeks after GA and is built to handle everyday tasks at substantially lower cost. That model picker is also a cost lever.

In other words: knowing which model you're routing a task to is now a financial decision, not just a quality one.
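One way to picture routing as a cost decision is the sketch below: choose the cheapest model that still meets the capability a task needs. The catalog, tier numbers, and relative costs are assumptions made up for the example, and the model names are deliberately generic rather than real products.

```python
# ASSUMPTION: a hypothetical catalog; names, tiers and relative costs are invented.
MODELS = [
    {"name": "everyday-model", "relative_cost": 1, "tier": 1},
    {"name": "mid-model", "relative_cost": 3, "tier": 2},
    {"name": "frontier-model", "relative_cost": 10, "tier": 3},
]


def pick_model(required_tier: int) -> str:
    """Return the cheapest model whose tier is at least required_tier."""
    candidates = [m for m in MODELS if m["tier"] >= required_tier]
    if not candidates:
        raise ValueError(f"no model satisfies tier {required_tier}")
    return min(candidates, key=lambda m: m["relative_cost"])["name"]


print(pick_model(1))  # everyday-model
print(pick_model(3))  # frontier-model
```

The hard part in practice is deciding the required tier per task, which this sketch simply takes as an input.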

The governance gap this creates

The Logicalis 2026 Global CIO Report found that 62% of CIOs have compromised on AI governance because of limited knowledge. Yet 94% increased AI investment over the same year. That gap is tolerable when the cost is a flat annual line. It's not tolerable when a runaway agent accumulates credits the way a misconfigured cloud job burns compute.

Microsoft built guardrails into the GA release. Microsoft said administrators will be able to control when Cowork is enabled, who can access it, and how much can be spent. Cowork is "off by default," with admins able to set spending limits at the tenant, group, and user levels. There's also a Cost Management Dashboard in the admin center to set policies and monitor usage trends.

That's the right architecture. But it only helps if someone actually sets those limits before a team starts assigning heavy tasks. In most organizations right now, nobody owns that yet.
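The tenant, group, and user limits can be thought of as nested budgets that all have to clear before a task is charged. The sketch below illustrates that idea only. The `Budget` class and `try_charge` function are hypothetical names, and this is not how Microsoft's admin center is implemented or configured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Budget:
    """Hypothetical credit budget for one level (tenant, group, or user)."""
    limit: float
    spent: float = 0.0

    def remaining(self) -> float:
        return self.limit - self.spent


def try_charge(credits: float, budgets: Dict[str, Budget]) -> List[str]:
    """Charge every level only if all of them can absorb the cost.

    Returns the levels that blocked the charge (empty list means charged).
    """
    blocked = [level for level, b in budgets.items() if credits > b.remaining()]
    if not blocked:
        for b in budgets.values():
            b.spent += credits
    return blocked


# ASSUMPTION: limits are arbitrary illustrative numbers.
budgets = {"tenant": Budget(1000), "group": Budget(200), "user": Budget(50)}

first = try_charge(40, budgets)   # fits under all three caps
second = try_charge(20, budgets)  # user has only 10 credits left

print(first)   # []
print(second)  # ['user']
```

The tightest cap wins: the second task is refused at the user level even though the group and tenant budgets have plenty of room, which is why someone has to choose those numbers deliberately.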

Cowork proposes; it does not silently act. It surfaces record updates, drafted emails, and case resolutions for a human to approve, edit, or dismiss. That human-in-the-loop design limits some of the blast radius. But it doesn't limit the credit burn - a task that runs for several minutes and retrieves across dozens of documents costs the same whether the human ultimately approves or cancels it.

What this means for every team, not just Microsoft shops

Cowork is the first at this scale, but the pattern is already spreading. The bigger shift is commercial. Cowork is billed as a pay-as-you-go service through Copilot Credits, not bundled into the flat per-seat license. Anthropic, OpenAI, and others are all watching this rollout closely, because usage-based agentic billing is where the industry is heading regardless of vendor.

The practical implication for any team evaluating agents right now: the question "which agent should we use?" is incomplete without "and what's the cost per workflow, at our volume, with our data?" A well-scoped task that a teammate like Beagle runs in a Slack thread might cost a fraction of a heavy document synthesis job - but you won't know until you measure both.
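Measuring both can start as simply as grouping logged task costs by workflow and multiplying by expected volume. The sketch below assumes you can export a per-task credit figure. The log entries, workflow names, and monthly volumes are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# ASSUMPTION: hypothetical export of (workflow, credits) pairs; values are invented.
task_log = [
    ("slack_answer", 4), ("slack_answer", 6), ("slack_answer", 5),
    ("doc_synthesis", 180), ("doc_synthesis", 220),
]


def cost_per_workflow(log):
    """Average credits per run, grouped by workflow name."""
    by_workflow = defaultdict(list)
    for workflow, credits in log:
        by_workflow[workflow].append(credits)
    return {w: mean(c) for w, c in by_workflow.items()}


def projected_monthly(avg_cost, runs_per_month):
    """Average cost per run multiplied by expected monthly volume."""
    return {w: avg_cost[w] * runs_per_month[w] for w in avg_cost}


avg = cost_per_workflow(task_log)
monthly = projected_monthly(avg, {"slack_answer": 400, "doc_synthesis": 20})

print(avg)      # {'slack_answer': 5, 'doc_synthesis': 200}
print(monthly)  # {'slack_answer': 2000, 'doc_synthesis': 4000}
```

In this made-up data the heavy job is forty times more expensive per run, yet at volume the light workflow still accounts for a third of the spend. Per-task cost and per-month cost are different questions, and both need measuring.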

As organizations move from pilots to production, success depends less on which model tops a benchmark and more on governance, oversight, and thoughtful design. That was true before. It's truer now that the meter is running.

