The verification tax
AI makes generating output almost free. But every AI output still needs checking — and checking doesn't scale with compute. The verification tax is the hidden cost most businesses ignore when deploying AI.

Every organisation deploying AI is discovering the same uncomfortable truth: generating output is now cheap and fast, but verifying that output remains expensive, slow, and stubbornly human. The gap between creation speed and verification speed is where most of the promised value of AI quietly disappears. Call it the verification tax.
The economics are counterintuitive. AI doesn't eliminate work. It moves it downstream. What used to be a creation bottleneck becomes a verification bottleneck, and verification doesn't scale with compute.
The 40-point perception gap
METR ran a randomised controlled trial with 16 experienced open-source developers completing 246 tasks, each task randomly assigned to allow or forbid AI tools (Cursor Pro with Claude 3.5/3.7 Sonnet). With AI, developers were 19% slower. Yet those same developers estimated AI had made them 20% faster. That is a nearly 40-percentage-point gap between perception and reality.
Not an isolated finding. Faros AI analysed telemetry from over 10,000 developers across 1,255 teams. High AI adoption correlated with 21% more tasks completed and roughly twice as many PRs merged. Sounds good until you look at the other side of the ledger: PR review time increased by 91%. Bugs per developer rose 9%. Average PR size grew by 154%. And no significant correlation existed between AI adoption and company-level improvement. The individual productivity gains were absorbed entirely by downstream bottlenecks.
The most telling statistic: AI-generated pull requests wait 4.6 times longer for human review than human-written code. The bottleneck in software development has shifted from writing code to reading it.
Cognitive surrender
The perception gap is not a measurement quirk. It reflects something deeper about how humans interact with AI output.
Wharton researchers Gideon Nave and Steven Shaw ran three preregistered studies with 1,372 participants across 9,593 trials. People followed AI output on more than half of all questions. They accepted correct AI answers 93% of the time, which is fine. But they also followed wrong AI answers 80% of the time. Access to AI boosted their confidence by nearly 12 percentage points even when their answers were incorrect.
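To see what those rates do to overall accuracy, here is a back-of-the-envelope model. The two acceptance rates come from the studies; the AI accuracy and unaided-human accuracy figures are assumptions plugged in purely for illustration.

```python
# Back-of-the-envelope model of deferring to AI output. The follow
# rates come from Nave and Shaw; the accuracy figures are assumptions.

def blended_accuracy(ai_accuracy, human_accuracy,
                     p_follow_correct=0.93, p_follow_wrong=0.80):
    """Expected accuracy of a human answering with AI output in view."""
    # AI right: following it scores; overriding falls back to baseline.
    right = ai_accuracy * (p_follow_correct
                           + (1 - p_follow_correct) * human_accuracy)
    # AI wrong: following it fails; overriding falls back to baseline.
    wrong = (1 - ai_accuracy) * (1 - p_follow_wrong) * human_accuracy
    return right + wrong

# Assumed: AI correct 85% of the time, unaided human 75%.
print(blended_accuracy(0.85, 0.75))                       # ~0.86
print(blended_accuracy(0.85, 0.75, p_follow_wrong=0.20))  # ~0.93
```

The exact numbers matter less than the asymmetry: near-total compliance with wrong answers means the human adds almost nothing beyond the machine's own accuracy, while a sceptical reviewer who overrides most wrong answers recovers real value.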
Nave and Shaw call this cognitive surrender, framing it as a "System 3" that overrides both intuition (System 1) and deliberation (System 2). The mind outsources judgement to the machine, then feels more confident about the result. It is not laziness. It is a genuine cognitive shift, and it operates below conscious awareness.
The consequences are already materialising in professions where verification should be second nature. A database tracking AI hallucination court cases has identified at least 206 instances of lawyers submitting AI-generated filings with fabricated case citations. Fines range from $1,000 to $5,000 per incident, with professional sanctions on top. These are trained adversarial reasoners in a system designed for scrutiny, and they still trusted output they should have checked.
The economics of inattention
The verification tax compounds over time, and the mechanism is perverse. Separate Wharton research by Hamsa Bastani and Gérard Cachon shows that as AI becomes more reliable, the economic incentives for human oversight actually deteriorate. When AI errors are rare, humans must spend effort reviewing output that is almost always correct. Sustained attention becomes economically irrational. The better the AI gets, the less rational it is to check its work, and the more catastrophic the uncaught errors become when they arrive.
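The mechanism is easy to formalise. Checking a single output is rational only when the expected cost of a missed error exceeds the cost of the check, and that inequality flips as the error rate falls. A toy sketch, with illustrative numbers:

```python
# Per-item oversight is rational only when the expected error cost
# beats the cost of checking. All numbers are illustrative assumptions.

def checking_is_rational(p_error, cost_of_error, cost_of_check):
    return p_error * cost_of_error > cost_of_check

COST_OF_CHECK = 5.0     # say, minutes of reviewer attention
COST_OF_ERROR = 120.0   # downstream cost of an error that slips through

for p_error in (0.20, 0.05, 0.01):
    print(p_error, checking_is_rational(p_error, COST_OF_ERROR, COST_OF_CHECK))
# 0.20 True  -- errors common enough that review pays for itself
# 0.05 True  -- marginal: 0.05 * 120 = 6, just above the cost of checking
# 0.01 False -- review now costs more than it saves per item, even though
#               the rare uncaught error is just as damaging when it lands
```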
A rough rule has emerged across industries: for every 10 hours of productivity gained through AI, organisations pay back approximately 4 hours correcting low-quality output. That is a 40% effective tax rate on AI-generated productivity. And licensing fees represent only 20–30% of total AI deployment cost. The other 70–80% hides in data preparation, integration, testing, change management, and verification overhead.
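For concreteness, the same arithmetic in a few lines. The tax rate and the licensing share come from the figures above; the licence fee is a hypothetical number plugged in to show the multiplier:

```python
# Net productivity after the verification tax, plus the fully loaded
# cost of deployment. The licence fee is a hypothetical figure.

hours_gained = 10.0
hours_correcting = 4.0
effective_tax = hours_correcting / hours_gained   # 0.4 -> a 40% tax rate
net_hours = hours_gained - hours_correcting       # only 6 hours banked

licence_share = 0.25     # assume licensing is ~25% of total cost
licence_fee = 100_000    # hypothetical annual licence spend
total_cost = licence_fee / licence_share          # 400,000: the real bill
```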
Amdahl's Law for organisations
There is a clean mathematical framing for why this keeps happening. Amdahl's Law states that the maximum speedup of a system is limited by the fraction of it that cannot be improved. If AI makes generation 10x faster but verification stays at human speed, and verification represents roughly 40% of the total workflow, the maximum theoretical speedup is about 2.2x. Not 10x. Not 5x. Barely above 2x. Even infinitely fast generation would cap out at 2.5x.
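The calculation, as a quick sketch. The 40% verification share is carried over from the rule of thumb above; the function itself is just Amdahl's formula:

```python
# Amdahl's Law: if a fraction f of the workflow is accelerated by a
# factor s, the overall speedup is 1 / ((1 - f) + f / s).

def amdahl(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# Verification is ~40% of the workflow and does not speed up, so the
# accelerated fraction is the other 60%.
print(amdahl(0.6, 10))            # ~2.17 -- generation 10x faster
print(amdahl(0.6, 1_000))         # ~2.50 -- generation 1,000x faster
print(amdahl(0.6, float("inf")))  # 2.5 -- the hard ceiling: 1 / 0.4
```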
Most businesses deploying AI are optimising the wrong part of the process. They accelerate creation, which was never the binding constraint, while leaving verification and approval untouched. The organisational equivalent: a factory where the assembly line runs at machine speed but quality control moves at walking pace. The output piles up. The value doesn't ship.
The lean manufacturing corollary applies directly. Work in progress without throughput is waste. Scaling AI without solving verification is building inventory, not shipping product.
Who captures the value
The companies that will extract real returns from AI are not those generating the most output. They are the ones building the fastest, most reliable verification layer.
Three patterns separate the winners:
Automated evaluation layers. LLM-as-judge architectures, code-based quality gates, and multi-agent review systems that catch errors at machine speed rather than human speed. The verification loop runs alongside generation, not after it. A minimal sketch of this pattern, folded together with the triage idea below, follows this list.
Embedded verification. Restructured workflows where checking is woven into the creation process rather than appended as a separate step. The old model of create-then-review breaks when creation runs 10x faster. Verification has to become a property of the pipeline, not a phase in the waterfall.
Deliberate triage. Not all AI output needs the same level of scrutiny. Bastani and Cachon's research contains the key insight: AI works best when it is "predictably strong at some tasks and predictably weak at others." When humans can anticipate where AI is likely to fail, oversight becomes targeted rather than diffuse. The verification tax drops because you stop checking everything and start checking the right things.
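What do these patterns look like wired together? A minimal sketch, assuming a calibrated risk model and some automated evaluator. Every name below is hypothetical scaffolding, not a real library:

```python
from dataclasses import dataclass

@dataclass
class Output:
    content: str
    task_type: str

# Task types where this (hypothetical) deployment is predictably weak.
HIGH_RISK_TASKS = {"legal_citation", "financial_figure", "dosage"}

def predicted_risk(output: Output) -> float:
    # Placeholder: in practice, a calibrated model of known failure modes.
    return 0.9 if output.task_type in HIGH_RISK_TASKS else 0.2

def llm_judge(output: Output) -> bool:
    # Placeholder for the automated evaluation layer: an LLM-as-judge
    # call, code-based quality gates, a test suite. True means it passes.
    return "TODO" not in output.content

def route(output: Output) -> str:
    if predicted_risk(output) > 0.7:
        return "human_review"   # predictably weak: targeted human scrutiny
    if llm_judge(output):
        return "ship"           # verified at machine speed
    return "human_review"       # gate failed: escalate the exception
```

The branch order is the point: output from predictably weak task types skips the machine gate and goes straight to targeted human scrutiny, which is exactly where the predictability insight pays off.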
The moat nobody is talking about
The generation layer is commoditising fast. Every vendor has access to the same foundation models. The competitive advantage in the AI era belongs to whoever builds the best operational plumbing, not whoever has the flashiest generation capabilities.
Bastani and Cachon's finding deserves more attention than it gets: the value of AI is highest when its failure modes are predictable. An AI that is brilliant 95% of the time and wrong in unpredictable ways is less valuable than one that is solid 90% of the time with well-understood weaknesses. Predictability lets you build verification systems around the failure modes. Unpredictability forces expensive, diffuse, human-speed oversight on everything.
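A contrived comparison makes this concrete. Assume 1,000 outputs, a unit cost to check each one, and a much larger cost for an uncaught error; every figure below is invented to show the mechanism, not measured from any deployment:

```python
cost_check = 1.0    # cost to verify one output
cost_miss = 50.0    # cost of an error nobody catches
n = 1000            # outputs per period

# System A: 95% accurate, fails unpredictably -> every output gets checked.
a_cost = n * cost_check                                    # 1000.0

# System B: 90% accurate, but failures concentrate in a known 20% slice.
# Check only that slice; assume a 1% residual error rate slips through
# outside it.
b_cost = (0.2 * n) * cost_check + (0.01 * n) * cost_miss   # 200 + 500 = 700.0
```

The "worse" model wins because its errors are findable.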
The perception gap makes all of this harder to fix. Developers think they are faster. Organisations report productivity gains. Confidence rises. But the bugs are up, the review queues are longer, and company-level metrics have not moved. The verification tax is invisible precisely because cognitive surrender prevents the people paying it from recognising it.
I think the organisations that solve verification first will do more than save costs. They will change the shape of what AI can do. Right now, most AI applications are constrained to domains where errors are cheap: drafting, summarising, brainstorming. The first organisations to build verification infrastructure that catches errors at generation speed will be the first to deploy AI where errors are expensive: medicine, law, finance, infrastructure. That is where the real value sits, locked behind a verification problem that no amount of prompt engineering will solve.