Coding-agent cost analysis

The OpenClaw token story is extreme. The workflow lesson is not.

Reports around OpenClaw describe a roughly $1.3M monthly OpenAI token bill across a large Codex agent fleet. Smaller teams will not run that shape of workload, but they can hit the same cost pattern at a smaller scale: repeated context, expensive retries and one premium route doing every job.

Standing context

Rules, repo summaries, tool instructions and memory can become a fixed cost before the agent does useful work.

Repeated repo reads

Even a small user request can resend large file or transcript context on every agent turn.

Recovery loops

Failed commands, flaky tests and invalid tool calls often trigger expensive retries with more context than needed.

Mixed model roles

Planning, routine edits, recovery and final review rarely deserve the same model, context budget or retry policy.
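
To make the standing-context and repeated-read patterns concrete, here is a minimal arithmetic sketch. It assumes the standing context is re-sent in full on every turn with no caching; the function name and all token counts are hypothetical, not measurements from any real session.

```python
def session_input_tokens(standing_tokens: int, new_tokens_per_turn: list[int]) -> dict[str, int]:
    """Split a session's input tokens into repeated context and new input.

    Assumption: the standing context (rules, repo summary, tool
    instructions, memory) is re-sent in full on every turn, uncached.
    """
    turns = len(new_tokens_per_turn)
    repeated = standing_tokens * turns   # fixed cost paid once per turn
    fresh = sum(new_tokens_per_turn)     # actual new user intent
    return {"repeated": repeated, "new": fresh, "total": repeated + fresh}


# Hypothetical session: 8,000 tokens of standing context, four short turns.
example = session_input_tokens(
    standing_tokens=8_000,
    new_tokens_per_turn=[200, 150, 400, 100],
)
repeated_share = example["repeated"] / example["total"]
```

In this made-up session, 32,000 of the 32,850 input tokens (about 97%) are repeated context rather than new intent, which is the shape the four patterns above describe.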

Smaller-team route map

Turn the headline into an audit checklist

The practical question is not whether your bill looks like OpenClaw. It is whether your agent loop has the same hidden shape.

Context loading

Cache or summarize repo state before the task starts.

Routine implementation

Use a cheaper coding route with a small retry budget.

Failure recovery

Classify the failure first, then send only the minimal evidence.

Final review

Use the strongest route on the final diff and test evidence.
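
The four routes above could be written down as a small lookup table. This is only a sketch: the tier labels (`cheap`, `mid`, `premium`), the retry numbers and the `Route` type are illustrative assumptions, not real model names or a recommended configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model_tier: str   # hypothetical tier label, not a real model name
    max_retries: int  # assumed retry budget for this stage
    context: str      # what the stage is allowed to send


# Assumed mapping from workflow stage to route; tune to your own stack.
ROUTES = {
    "context_loading": Route("cheap", 0, "cached or summarized repo state"),
    "routine_implementation": Route("cheap", 1, "task plus relevant files"),
    "failure_recovery": Route("mid", 2, "failure class plus minimal evidence"),
    "final_review": Route("premium", 0, "final diff and test evidence"),
}


def pick_route(stage: str) -> Route:
    """Return the route for a stage; unknown stages raise KeyError on purpose."""
    return ROUTES[stage]
```

Raising on an unknown stage is a deliberate choice in this sketch: a silent default would hide exactly the "one premium route doing every job" pattern the audit is trying to surface.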

The 5 checks I would run before changing providers

1. Pick one expensive coding-agent session and split turns into context, routine edits, command output, recovery and final review.
2. Estimate repeated input separately from new user intent; repeated input is often the quiet cost driver.
3. Give recovery its own route: failed command, relevant diff and concise logs, not the whole previous transcript.
4. Keep the strongest model for judgment-heavy review, architecture, security risk and ambiguous failures.
5. Add a hard stop condition before autonomous retries turn uncertainty into spend.
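
Checks 1 and 5 can be sketched in a few lines of Python. The turn format (`category`, `input_tokens`), the `RetryBudget` class and its limits are hypothetical; a real agent log will need its own parsing and its own thresholds.

```python
from collections import defaultdict


def tokens_by_category(turns: list[dict]) -> dict[str, int]:
    """Sum input tokens per turn category for one logged session."""
    totals: dict[str, int] = defaultdict(int)
    for turn in turns:
        totals[turn["category"]] += turn["input_tokens"]
    return dict(totals)


class RetryBudget:
    """Hard stop for autonomous retries: cap both attempts and tokens."""

    def __init__(self, max_attempts: int, max_tokens: int) -> None:
        self.max_attempts = max_attempts
        self.max_tokens = max_tokens
        self.attempts = 0
        self.tokens = 0

    def record(self, tokens: int) -> None:
        """Register one retry and the input tokens it consumed."""
        self.attempts += 1
        self.tokens += tokens

    def should_stop(self) -> bool:
        """True once either limit is reached; the caller then escalates or halts."""
        return self.attempts >= self.max_attempts or self.tokens >= self.max_tokens


# Hypothetical session log, already labeled by hand.
session = [
    {"category": "context", "input_tokens": 12_000},
    {"category": "routine_edit", "input_tokens": 3_000},
    {"category": "recovery", "input_tokens": 9_000},
    {"category": "recovery", "input_tokens": 11_000},
    {"category": "final_review", "input_tokens": 4_000},
]
totals = tokens_by_category(session)
```

In this invented log, recovery accounts for 20,000 input tokens, more than any other category, which is the kind of signal that would justify giving recovery its own route and budget.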

What the 48h audit returns

A route map for your actual workflow: what stays premium, what can move cheaper, where context is leaking, and where retries or fallback policy should change before the next bill.

Sources and context

The reported OpenClaw numbers are useful as a stress test, not as a normal buyer benchmark. DealsForge uses them as a prompt to inspect route design, repeated input and retry policy in smaller coding-agent stacks.

Tom's Hardware report; Hacker News discussion; Reddit r/openclaw discussion