AI coding cost worksheet

Measure AI coding agents by cost per accepted change.

Claude Code, Codex and router debates often compare subscriptions, tokens or benchmarks. For a real team, the better unit is the accepted change: the patch, config or bug fix that actually survives review.

Start with the 4-line brief

No keys, no repo access. A rough paragraph is fine; we just need enough context to decide whether the 100 EUR audit has a practical route to test.

1. providers/models:
2. rough monthly spend or token volume:
3. workflow type:
4. one expensive session or failure loop:
Email brief

Repo scan

Files, project rules, memories and broad searches loaded before the first useful edit.

Audit bucket

Planning

Architecture, task decomposition, risk calls and route choices that deserve higher reasoning.

Audit bucket

Edits

Mechanical implementation and patching work that may not need the strongest model every time.

Audit bucket

Output and retries

Terminal logs, failing commands and recovery loops that quietly become the expensive workload.

Audit bucket

Final review

Diff review, tests, security, behavior and merge confidence after the code is changed.

Audit bucket

Accepted-change worksheet

Task

One merged PR, accepted patch, shipped config, or resolved bug.

Human baseline

How long the same change would take without an agent.

Agent route

Claude Code, Codex, OpenRouter, local model, router, or mixed workflow.

Session buckets

Repo scan, planning, edits, output/retries, final review.

Accepted outcome

Merged, reverted, partially accepted, or abandoned.

True cost

Subscription pressure, API spend, retries, waiting time and human cleanup.

If you cannot fill this table for one task, switching brands is premature. First measure where the session actually spends value.

Check one expensive coding-agent session

No credentials. Send just enough to know if a 100 EUR audit can find a practical route.

Decision rule

Route the expensive bucket, not the entire workflow.

A plan upgrade can be rational if planning and review are the bottleneck. A router, cheaper model or local fallback can be rational if scan, mechanical edits or retries are the leak. The audit is deciding which is true from one real workflow.

Why measure cost per accepted change instead of tokens?

Tokens matter, but coding-agent work is judged by accepted code. A cheap session that produces no usable patch can be more expensive than a premium review that prevents a bad merge.

Does this replace Claude Code or Codex benchmarking?

No. It gives a workflow-level accounting layer. You still compare tools, but against the same real task buckets: scan, plan, edit, retry and review.

When should I switch models or providers?

Switch the bucket first. If repo scan or retries dominate, reduce context and route routine work cheaper. If planning or review dominates, the premium model may still be the right route.

When is a paid routing audit useful?

It is useful when a team has repeated quota pain, a visible LLM bill, unclear router choices, or expensive agent loops and needs a concrete routing policy for real workflows.

When this becomes a 100 EUR audit

If the accepted-change worksheet shows a real cost, quota or retry problem, send the 4-line brief. We turn it into a 48h route map: what stays premium, what moves cheaper, what context to stop sending and what fallback to test.

Start 4-line brief