Almost everyone working on agents is trying to make them smarter. I suspect the bigger problem is making them easier to challenge.
Humans and agents fail in remarkably similar ways. We stop running checks. Not because verification becomes impossible, but because it becomes expensive enough that we begin rationing it. At first we skip a few checks because they seem unnecessary. Later we skip them because there is no time. Eventually we forget which assumptions were verified and which were merely inherited from earlier assumptions.
A system stays grounded when its claims are cheap to verify. It drifts when they are not.
This reaches far beyond software. Science works because experiments can be repeated. Markets work because prices can be compared. Democracies work because claims can be challenged in public. Software works because tests can be rerun. Agents work because their outputs can be checked against something outside themselves.
Different domains, same mechanism. Cheap verification keeps a system tied to reality. Expensive verification cuts the line so gradually that the drift often goes unnoticed until something important breaks.
When that happens, we usually blame the wrong thing. We reach for explanations involving intelligence, discipline, incentives, or process. Sometimes those are the cause. More often the explanation is simpler: verification became expensive, so people stopped doing it.
Trust filled the space where verification used to happen.
I think trust is best understood as a loan against a check you skipped. Most of the time that loan works out fine. The problem is that it compounds. One unverified claim becomes the basis for another, which becomes the basis for another, until the original assumption disappears beneath layers of reasoning that nobody remembers validating.
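A rough way to see the compounding is to put a number on it. The sketch below is an illustration, not a measurement: it assumes each unverified claim holds independently with the same probability (95% here, a figure chosen only for the example), and the function name is hypothetical.

```python
def chance_chain_holds(per_claim_reliability: float, depth: int) -> float:
    """Probability that every claim in a chain of unverified claims is true.

    Assumes (for illustration only) that each claim holds independently
    with the same probability.
    """
    if not 0.0 <= per_claim_reliability <= 1.0:
        raise ValueError("per_claim_reliability must be between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return per_claim_reliability ** depth


if __name__ == "__main__":
    # 0.95 is an assumed reliability, not a figure from any real system.
    for depth in (1, 5, 10, 20):
        print(depth, round(chance_chain_holds(0.95, depth), 3))
    # 1 0.95
    # 5 0.774
    # 10 0.599
    # 20 0.358
```

Under those assumptions, twenty stacked claims that are each 95% likely to be right leave roughly a one-in-three chance that the whole stack holds. Real claims are not independent, so the exact numbers mean little. The shape of the curve is the point.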
Seen through that lens, many institutions are really systems for converting trust back into verification. Audits exist because trust eventually expires. Peer review exists because expertise alone is not enough. Courts exist because disagreements ultimately need evidence. Monitoring exists because systems rarely stay healthy simply because we believe they are.
The same pattern is appearing in agentic development, only faster.
An agent can produce claims at machine speed. It can generate code, summaries, explanations, plans, and analyses faster than any human. The half of the loop responsible for producing claims accelerated dramatically. The half responsible for verifying them barely moved.
So we started rationing checks.
We skim diffs. We trust passing indicators we did not investigate. We merge code we understand less than we would like to admit. The result is familiar. The codebase slowly becomes something we own without fully understanding. I have called this drift before. Agents simply accelerate the process.
The common response is to focus on making the agent smarter. I am not convinced that attacks the actual constraint.
A smarter agent produces more plausible claims. It produces them more quickly and with greater confidence. If verification remains expensive, the gap between what is asserted and what is verified only grows wider. Intelligence was never the bottleneck. The bottleneck was the cost of challenging the result.
That becomes obvious when the economics change.
Imagine a build that used to take nine minutes now returning in under thirty seconds. The immediate benefit is plain: less waiting. The more interesting change is behavioral.
You stop batching changes together. You run checks after smaller steps. You become more willing to validate assumptions because doing so no longer interrupts your flow. Agents can verify their own work more frequently as well, grounding each step against reality before building on it.
The people did not change. The agent did not change. The economics did.
That alone is often enough.
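The arithmetic behind that shift is simple enough to sketch. The snippet below assumes a person or agent is willing to spend one hour of a session waiting on checks. That budget and the function name are assumptions for illustration, and thirty seconds is used as the upper bound of "under thirty seconds."

```python
def checks_within_budget(check_seconds: int, budget_seconds: int) -> int:
    """How many complete checks fit into a fixed waiting budget."""
    if check_seconds <= 0:
        raise ValueError("check_seconds must be positive")
    if budget_seconds < 0:
        raise ValueError("budget_seconds must be non-negative")
    return budget_seconds // check_seconds


if __name__ == "__main__":
    budget = 60 * 60          # assumed: one hour of tolerated waiting
    slow_build = 9 * 60       # nine minutes, from the example above
    fast_build = 30           # upper bound of "under thirty seconds"

    print(checks_within_budget(slow_build, budget))  # 6
    print(checks_within_budget(fast_build, budget))  # 120
```

Six checks per hour forces batching. A hundred and twenty makes checking after every small step affordable, which is the behavioral change described above.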
One caution is worth mentioning because it determines whether any of this helps. Cheap verification only matters if the verification itself is meaningful. A fast test that proves nothing can be more dangerous than a slow test that proves something. It creates confidence without creating knowledge.
The goal is not simply to make verification cheap. The goal is to make meaningful verification cheap.
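The difference is easy to show in miniature. In the sketch below, `apply_discount` is a hypothetical function with a deliberate bug. Both checks run instantly, but only one of them compares the result against a value worked out independently.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test, with a deliberate bug."""
    # Bug: subtracts the raw percent instead of price * percent / 100.
    return price - percent


def vacuous_check() -> bool:
    """Fast, but only confirms that the call returns a number."""
    result = apply_discount(200.0, 10.0)
    return isinstance(result, float)


def meaningful_check() -> bool:
    """Equally fast, but compares against an independently known answer."""
    # 10% off 200 should be 180; the buggy function returns 190.
    return apply_discount(200.0, 10.0) == 180.0


if __name__ == "__main__":
    print("vacuous check passes:", vacuous_check())        # True
    print("meaningful check passes:", meaningful_check())  # False
```

The vacuous check goes green on broken code. It costs almost nothing to run and buys nothing except confidence.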
This is one reason I find build infrastructure more interesting than it first appears.
Much of the discussion focuses on speed. Faster builds, faster tests, faster feedback. Those things matter, but mostly because they change behavior.
When verification becomes cheap enough to feel free, people and agents naturally do more of it. The speedup is visible. The behavioral shift is harder to measure, but far more important.
In software, we already know how to verify many claims. Run the build. Run the tests. Deploy into something resembling reality. The challenge is rarely that we lack a mechanism for verification. The challenge is deciding whether the answer is worth waiting for.
Once verification becomes cheap enough, that calculation changes. Running a check stops feeling like a ceremony and starts feeling like the obvious next step. The feedback loop tightens. Assumptions give way to evidence.
Which leads to a question that reaches well beyond software.
What happens to a field when verifying its claims becomes cheap enough to do constantly? How many compromises that we currently accept as unavoidable are really just artifacts of expensive verification? How many things we describe as limits of knowledge are actually limits of checking?
That's one of the problems we're working on at Avrea.
Hannu Varjoranta writes about trust, verification, and the systems that keep both intact. He builds Spegling and is a founding engineer at Avrea. Previously: "The drift, the surrender, and the architecture they are asking for" and "Under the Shared Sky."