Dynamic workflows made it easy to attack my own design before building on it

Last week Claude Code shipped dynamic workflows. The first thing I pointed one at was the design decision I was least sure of, and verifying it took a prompt instead of a harness.

Dynamic workflows are easy to underrate. Claude Code can now write its own multi-agent harness on the fly, custom-built for the task in front of it, rather than routing everything through one general-purpose harness. The unlock is not that Claude can run many agents; it could already do that if you built the orchestration yourself. The unlock is that you no longer have to build it. You describe the shape of the work and Claude writes the harness to match.

That changes which tasks are worth verifying properly, which is where this story starts.

The decision I wanted to verify

I was architecting an agentic memory system, where an orchestrator routes every piece of incoming knowledge to the store where it belongs. Some knowledge belongs in agentic memory, the layer agents reason over and treat as live context. Some belongs in a conventional table, looked up deterministically and never interpreted. Route it to the wrong store and every agent that depends on the memory inherits the error, silently, at every turn.

That routing logic rested on a baseline I had built by hand: a trace of where each case lands today and which ones are misrouted. The baseline was one perspective, and several of its verdicts were judgment calls rather than facts, especially the cases where the routing hinged on a model's classification and was nondeterministic by construction. Building on a verdict that is wrong is the kind of mistake you do not discover until the agents are already relying on it.

Why this used to be skipped

The honest reason a check like this normally does not happen is cost. Standing up an independent multi-agent verification used to mean building a static orchestration yourself, with the Agent SDK or a claude -p script, and then maintaining it. A static harness has to cover every case, so it ends up generic, and for a one-off design decision the overhead rarely justified itself. So you build on the single-analyst baseline and hope the judgment calls were right.

Dynamic workflows remove that cost. I described the verification I wanted in plain language and Claude wrote a harness tailored to it, with no scaffolding for me to maintain afterward.

What the workflow did

Claude built the check as two phases, each a fan-out of parallel agents with its own clean context, joined at a barrier that merged their structured output.

The first phase was blind independent triangulation. Several agents traced the same corpus from scratch, each owning a slice, none of them shown my baseline verdicts. Because they had not seen the answer, agreement meant genuine triangulation rather than my own conclusion echoed back, and disagreement was just as useful, because it flagged the cases that were nondeterministic in the first place.

The second phase was adversarial refutation. Each headline conclusion went to a separate agent whose only instruction was to break it: find evidence that contradicts this, and report that you cannot only after a real search. The default was refuted, not confirmed. An agent told to attack a claim looks for the reason it is false, which is the search that tests the design; a confirming reader looks for reasons it is true and finds them. Every finding had to cite a specific location, so a confident-but-wrong agent could not hide behind plausible prose.

None of this is novel as a method. What was new is that I did not build any of it. The feature exists to counter the failure modes of long single-context work, and the one that mattered here is self-preferential bias, the tendency to prefer and defend your own findings when you grade them. Blinding the tracers and pointing the refuters at each claim is the structural fix, and Claude assembled it from a description.

The payoff is correction, not confirmation

The structural conclusions survived the attack, which was reassuring and almost beside the point. The value was in the corrections.

The refuters overturned several of my own verdicts, specifically the ones where I had been too harsh and labeled a defensible design decision a defect. The routing I had flagged as a bug went through a model whose prompt deliberately rejected the category I assumed it would land in, so the likely outcome was the correct one. The agents also surfaced bounds I had missed, including a store that was not merely unreachable but dead: rendered in the interface, allowed by the schema, and written by nothing. I went into implementation with a baseline I trusted because it had been attacked, not because I had asserted it.

Use it where it changes the answer

Dynamic workflows cost more tokens than a single agent and are meant for complex, high-value tasks. You can cap one with a token budget or ask for a quick version, but the real discipline is knowing when the shape fits. I used a workflow once, for the verification, and kept the build itself single-agent and direct. Implementation and debugging are linear and stateful: write a function, run the check, fix the error, run again. There is nothing to fan out, and the bugs surfaced one at a time from real inspection. More agents would have added coordination cost for no gain. Matching the machinery to the task is the point, and dynamic workflows make that match cheap to strike.

The harness became programmable

The deeper shift is that the harness, the scaffolding around the model, used to be the slow part to build and the thing you reused precisely because it was expensive to make. When the harness becomes something Claude writes per task, the cost of a custom verification, a custom research pass, or a custom refactor falls toward the cost of describing it. The highest-leverage place to point that capability is not generating more; generation is already cheap. It is verifying the decisions your agents will depend on, before you build on them, while changing your mind is still free.

#DynamicWorkflows #AgenticAI #ClaudeCode #AIEngineering #LLMOps #AIStrategy

Dynamic workflows made it easy to attack my own design before building on it

The decision I wanted to verify

Why this used to be skipped

What the workflow did

The payoff is correction, not confirmation

Use it where it changes the answer

The harness became programmable

Read more

Two Clocks

Entrench What Compounds, Rent What Decays

The four things you know about your own code

Velocity decay isn't a bug