As generation gets cheap, value moves to verification

Share

Anthropic's CEO used part of his own developer conference to explain why writing the code is no longer the hard part.

His framing was Amdahl's Law: when you speed code generation up several times over, the work you did not speed up becomes the constraint. Review, security, and verifying the output is correct turn into the bottleneck that holds everything back. That is a striking thing to hear from the company whose business is selling you the model.

I have been digesting dozens of the technical talks from that conference, and this is the throughline across nearly all of them. The model has become the cheapest, most swappable part of the stack, and durable value is moving to everything around it.

The scaffolding you write to prop a model up is depreciating. Vercel's Guillermo Rauch described deleting a production auto-fix pipeline once a newer model made it redundant. monday.com found that moving between two Opus versions broke every system prompt they had tuned; a model upgrade is now a migration, not a drop-in. Reliability code has a half-life of months. The integration and data wiring that connects a model to your business compounds instead. Build the half the model cannot absorb.

Generation is no longer the constraint. Trust is. On the Bun runtime, an AI bot is now a larger contributor to the main branch than its creator, and the team raised the merge bar in response, because rejecting an agent's pull request carries no social cost. One enterprise running agents at production scale merges around 173 AI-written pull requests a day, and reached roughly 85 percent acceptance not with a bigger model but with a council of different models reviewing each other. That same enterprise cannot upgrade the model writing those PRs, because it lacks the evaluation setup to confirm a newer one is safe. The most advanced adopters are constrained by verification, not capability.

Durable advantage is moving to what accumulates. Doctolib reached near-total adoption through a skills marketplace and a curated environment, not a better model. Asana locates its moat in shared memory and multiplayer coordination. The regulated players say it in their own vocabulary: financial crime, quant signals, and legal work all point to governance and shared context as the unlock. Anthropic is absorbing the runtime itself through managed agents and persistent memory, which pushes the contested ground further down, into the curated memory layer that improves with use.

One caveat I keep in front of me: almost every production number above came from a vendor or its customer on the vendor's own stage. There is still almost no independent measurement of how these systems perform once they ship. The same showcase reporting high acceptance rates also included a team admitting its evaluations are still closer to vibes than method.

So the call, for anyone building or funding this: stop optimizing the generator. Put the budget on verification, context, and governance, and treat the model as something you migrate, not something you depend on. As generation gets cheap, value moves to whatever stays scarce. Right now that is the ability to trust the output, and the judgment about what not to build.


#AgenticAI #AIStrategy #AIEngineering #AIEvals #LLMOps

Read more