Coding-agent cost analysis

The OpenClaw token story is extreme. The workflow lesson is not.

Reports around OpenClaw describe a roughly $1.3M monthly OpenAI token bill across a large Codex agent fleet. Smaller teams will not run that shape of workload, but they can hit the same cost pattern at a smaller scale: repeated context, expensive retries and one premium route doing every job.

Standing context

Rules, repo summaries, tool instructions and memory can become a fixed cost before the agent does useful work.
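
As a rough illustration of how standing context turns into a fixed cost, the sketch below multiplies a per-turn preamble by the number of turns in a session. The function name, the token counts and the price per million input tokens are hypothetical placeholders, not figures from the OpenClaw reports or from any provider's price list.

```python
def standing_context_cost(preamble_tokens: int, turns: int,
                          usd_per_million_input: float) -> float:
    """Cost of resending the same preamble (rules, repo summary, tool
    instructions, memory) on every turn of one session."""
    return preamble_tokens * turns * usd_per_million_input / 1_000_000


# Hypothetical numbers, for illustration only.
cost = standing_context_cost(preamble_tokens=12_000, turns=40,
                             usd_per_million_input=2.0)
print(f"{cost:.2f} USD per session spent on standing context")
```

Multiplying the result by sessions per day gives a first estimate of what the preamble alone costs before any task-specific tokens are counted.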

Repeated repo reads

A small user request can still ship large file or transcript context repeatedly across agent turns.
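
One way to spot and limit repeated reads is to track what has already been sent. The following is a minimal sketch, assuming your agent framework keeps earlier turns available (in the conversation or in a cache) so that unchanged files do not need to be resent. `ContextLedger` and `select_new` are illustrative names, not part of any real agent library.

```python
import hashlib


class ContextLedger:
    """Tracks which file contents have already been sent in this session."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # path -> hash of content last sent

    def select_new(self, files: dict[str, str]) -> dict[str, str]:
        """Return only files whose content is new or changed since last send."""
        fresh: dict[str, str] = {}
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self._seen.get(path) != digest:
                fresh[path] = content
                self._seen[path] = digest
        return fresh


ledger = ContextLedger()
first = ledger.select_new({"a.py": "x = 1"})
second = ledger.select_new({"a.py": "x = 1", "b.py": "y = 2"})
print(sorted(first), sorted(second))
```

On the second call only `b.py` is selected, because `a.py` is unchanged. Logging the size of what gets filtered out also gives a direct measure of how much context was being shipped repeatedly.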

Recovery loops

Failed commands, flaky tests and invalid tool calls often trigger expensive retries with more context than needed.
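
A recovery step can be made cheaper by labelling the failure and trimming the evidence before anything is sent to a model. The sketch below is one possible shape; the failure classes, the regular expressions and the 20-line log tail are assumptions to tune against your own toolchain.

```python
import re

# Hypothetical failure classes and patterns; adjust to your own tools.
PATTERNS = [
    ("invalid_tool_call", re.compile(r"unknown tool|invalid arguments", re.I)),
    ("test_failure", re.compile(r"\bFAILED\b|AssertionError")),
    ("command_error", re.compile(r"command not found|exit code [1-9]", re.I)),
]


def classify_failure(output: str) -> str:
    """Return the first matching failure label, or "unknown"."""
    for label, pattern in PATTERNS:
        if pattern.search(output):
            return label
    return "unknown"


def minimal_evidence(command: str, output: str, diff: str,
                     max_log_lines: int = 20) -> str:
    """Build a compact recovery prompt: label, command, log tail and diff."""
    tail = "\n".join(output.splitlines()[-max_log_lines:])
    return (
        f"failure: {classify_failure(output)}\n"
        f"command: {command}\n"
        f"log tail:\n{tail}\n"
        f"diff:\n{diff}"
    )
```

Failures that come back as `unknown` are the ambiguous cases, and those may be worth escalating rather than retrying on the same route.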

Mixed model roles

Planning, routine edits, recovery and final review rarely deserve the same model, context budget or retry policy.

Smaller-team route map

Turn the headline into an audit checklist

The practical question is not whether your bill looks like OpenClaw's. It is whether your agent loop has the same hidden shape.

Context loading

Cache or summarize repo state before the task starts.

Routine implementation

Use a cheaper coding route with a small retry budget.

Failure recovery

Classify the failure first, then send only the minimal evidence.

Final review

Use the strongest route on the final diff and test evidence.
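
The four routes above could be written down as an explicit policy table, so that model tier, context budget and retry budget are decided per role rather than inherited from one default. This is a sketch only: the tier labels, token budgets and retry counts are invented placeholders, and putting failure recovery on the cheaper tier is an assumption, not something the route map specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model_tier: str          # "cheap" or "premium"; labels, not real model names
    max_context_tokens: int  # hypothetical budget
    max_retries: int         # hypothetical budget


ROUTES = {
    "context_loading": Route("cheap", 8_000, 1),
    "routine_implementation": Route("cheap", 16_000, 2),
    "failure_recovery": Route("cheap", 6_000, 2),
    "final_review": Route("premium", 32_000, 0),
}


def pick_route(task_kind: str) -> Route:
    """Unrecognized work falls back to the premium route instead of
    being assumed cheap."""
    return ROUTES.get(task_kind, ROUTES["final_review"])


def may_retry(route: Route, attempts_so_far: int) -> bool:
    """True while the route's retry budget has not been used up."""
    return attempts_so_far < route.max_retries
```

Keeping the table in one place makes it easy to compare against real spend per route once the session data is split by role.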

The 5 checks I would run before changing providers

  1. Pick one expensive coding-agent session and split turns into context, routine edits, command output, recovery and final review.
  2. Estimate repeated input separately from new user intent; repeated input is often the quiet cost driver.
  3. Give recovery its own route: failed command, relevant diff and concise logs, not the whole previous transcript.
  4. Keep the strongest model for judgment-heavy review, architecture, security risk and ambiguous failures.
  5. Add a hard stop condition before autonomous retries turn uncertainty into spend.
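
Checks 1, 2 and 5 can be tried on a single exported session with very little code. The sketch below assumes each turn has already been tagged with a category and with separate counts for repeated and new input tokens. The record layout, the sample numbers and the stop thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical turn records from one agent session log.
turns = [
    {"category": "context", "repeated_input": 11_000, "new_input": 500, "output": 200},
    {"category": "routine_edit", "repeated_input": 11_000, "new_input": 1_500, "output": 900},
    {"category": "recovery", "repeated_input": 14_000, "new_input": 1_000, "output": 400},
    {"category": "final_review", "repeated_input": 3_000, "new_input": 2_000, "output": 600},
]

FIELDS = ("repeated_input", "new_input", "output")


def summarize(records):
    """Total tokens per category (check 1)."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for record in records:
        for field in FIELDS:
            totals[record["category"]][field] += record[field]
    return dict(totals)


def repeated_share(records):
    """Fraction of all input tokens that were repeated input (check 2)."""
    repeated = sum(r["repeated_input"] for r in records)
    total_input = repeated + sum(r["new_input"] for r in records)
    return repeated / total_input if total_input else 0.0


def should_stop(attempts, spent_usd, max_attempts=3, budget_usd=5.0):
    """Hard stop for autonomous retries (check 5); thresholds are placeholders."""
    return attempts >= max_attempts or spent_usd >= budget_usd


print(f"repeated input share: {repeated_share(turns):.0%}")
```

A high repeated-input share points at context loading and recovery as the first routes to inspect, before any provider change is considered.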

Check whether your agent loop is audit-worthy

No credentials. Send just enough to know if a 100 EUR audit can find a practical route.

What the 48h audit returns

A route map for your actual workflow: what stays premium, what can move cheaper, where context is leaking, and where retries or fallback policy should change before the next bill.

Sources and context

The reported OpenClaw numbers are useful as a stress test, not as a normal buyer benchmark. DealsForge uses them as a prompt to inspect route design, repeated input and retry policy in smaller coding-agent stacks.