The Surprising Return of Manual QA

22 Jun, 2026

human-check

How the most disrespected role in software quietly became everyone's actual job

For about a decade we ran a quiet campaign to abolish the manual QA. Nobody announced it from the stage at the all-hands, but the intent was unmistakable. Shift left. Automate everything. Build the test pyramid. If a living human had to open the app and click a button to confirm it worked, that was treated as a defect in the process - a smell, evidence that someone upstream had failed to write enough Selenium tests. The manual tester was the first hire to be questioned and the first line item to be cut. In polite company we called the role "not very technical".

Then the machines learned to write the code, and in one of the better unscripted jokes of our industry, the manual QA strolled back in through the front door. Only this time it did not arrive as a person with a badge and a job title. It arrived as you. As me. As the principal engineer with fourteen years of scar tissue now leaning into a monitor, clicking the same form for the third time, whispering "no, that is genuinely not what I asked for".

We automated the typing. We forgot to automate the reading. As the old corny joke goes - "Didn't you read the sign?" "No! I am a writer, not a reader..." And it turns out the reading was the job all along.

The work did not vanish - it relocated

Here is the part nobody put on a slide - automation never deletes labor, it moves it somewhere less convenient.

When compilers got good, the rare skill stopped being the careful hand-assembly of registers and became system design. When the cloud made a server a one-line request, the rare skill stopped being racking hardware and became architecture, and then became the dark art of not getting a surprise invoice. Every wave of automation eats the middle - the production step, the typing, the grind - and squeezes the surviving value out toward the two ends. Upstream into specification, where you decide what should exist. Downstream into verification, where you decide whether the thing that now exists is any good.

Agentic AI is simply the most theatrical version of this we have ever seen. It eats the middle so fast and so cheerfully that the two ends slam together. You are now a thin slice of human judgment pressed between a spec you must defend and an output you must inspect, while a tireless and supremely confident intern does the part in the middle that used to be your entire career.

My four days as a hostile witness

Last month I built a line-of-business app. Not an algorithmically clever thing - nobody is publishing a paper about it. It carried what I would call manual complexity, the kind that does not come from hard math but from real people doing real and deeply fiddly things in a real business. Statuses that mean different things on Tuesdays. Approvals that loop. Edge cases that are technically "the whole point of the department." That sort of thing.

It took four days, and I was not even working on it exclusively. I went spec-first, and those four days include build automation, integrations and some help from a friend on a weird post-deployment issue - turns out Docker is Docker everywhere but Ubuntu is not Alpine :), but that is a different story, maybe warranting another post. And here is the confession - almost none of those four days were spent watching code being written.

What did I actually do? I answered questions. Endless, pointed, slightly exhausting questions about the spec. Should a cancelled record be recoverable, and for how long, and by whom, and does the audit trail care. Then I got grilled - and I mean grilled - by an automated grill-me skill whose entire purpose is to interrogate me before a single feature is built. It cross-examined my features, my workflows, my architecture, my choice of infrastructure. It found the soft spots in my thinking with the unbothered persistence of a tax auditor. I experienced a machine whose job was to make me uncomfortable, so that a different machine could build the right thing instead of the thing I had vaguely gestured at.

And when the code did arrive, I verified. I clicked. I read diffs. I poked the exact edges the grill-me skill had warned me about. The generation itself was close to free. The wall-clock cost of those four days was specification and verification - deciding what, and then confirming whether.

The labor did not disappear. It just stopped looking like engineering and started looking like testimony.

Choose your stack for the loop you can no longer escape

Here is the practical part, and it is the one most teams are getting wrong because old instincts die slowly.

For years we selected technology to optimize how fast a human could write and maintain it. Reasonable, when writing was the bottleneck. It is not the bottleneck anymore. The bottleneck is now the verification loop - the time between the agent making a change and a human being able to look at the result and say yes or no. Every second of that loop is a second the agent is not moving forward, or worse, a second in which it has confidently produced three more changes you have not checked yet.

So select for visibility. The new rule of thumb is brutally simple - if you cannot see and use the result of an agentic change almost immediately, you have chosen the wrong tech.

In practice that pushes you toward a few opinionated preferences.

Prefer stacks with instant feedback - hot reload, live preview, a running app you watch update as the agent edits it. The result should appear in front of you in seconds, not after a build-and-deploy ritual.
Treat the compiler and the type system as free, tireless verifiers. A strict type checker catches a whole genre of confident AI nonsense before you ever have to look at it. Let the machine grill the machine.
Favor visible, declarative state over clever hidden state. If you can see what the system believes is true, you can verify it. If the truth is buried in a cache three services deep, you cannot verify it, and you cannot trust it either.
Keep the path from change to running result short and local. A ten-minute CI pipeline (on a smaller system) as your only feedback mechanism is now an active liability, because the agents have lapped you four times by the time it goes red.
Demand human-readable outputs and diffs. Verification is reading. Optimize for readability the way we used to optimize for runtime.
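
To make a few of these preferences concrete, here is a minimal sketch in Python. It is not the app from this post. The statuses, the ALLOWED table, and the names Record, transition, render_rules and rules_diff are all hypothetical, invented for illustration. The idea is that workflow rules kept as plain data are visible state, type annotations give a strict checker (mypy or pyright, say) something to grill, and a rendered diff of the rules gives the human something short to read when an agent changes them.

```python
from dataclasses import dataclass
from enum import Enum
import difflib


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    CANCELLED = "cancelled"


# Hypothetical workflow rules, kept as data so a reviewer can read them at a glance.
# Assumption for this sketch: a cancelled record can be recovered back to draft.
ALLOWED: dict[Status, frozenset[Status]] = {
    Status.DRAFT: frozenset({Status.SUBMITTED, Status.CANCELLED}),
    Status.SUBMITTED: frozenset({Status.APPROVED, Status.DRAFT, Status.CANCELLED}),
    Status.APPROVED: frozenset({Status.CANCELLED}),
    Status.CANCELLED: frozenset({Status.DRAFT}),
}


@dataclass(frozen=True)
class Record:
    id: int
    status: Status


def transition(record: Record, target: Status) -> Record:
    """Return a new record in the target status, or raise if the table forbids it."""
    if target not in ALLOWED[record.status]:
        raise ValueError(f"{record.status.value} -> {target.value} is not allowed")
    return Record(record.id, target)


def render_rules(rules: dict[Status, frozenset[Status]]) -> list[str]:
    """Render the rule table as one sorted, human-readable line per status."""
    return [
        f"{src.value} -> {', '.join(sorted(t.value for t in targets)) or '(none)'}"
        for src, targets in rules.items()
    ]


def rules_diff(old: dict[Status, frozenset[Status]],
               new: dict[Status, frozenset[Status]]) -> str:
    """Unified diff of two rule tables, meant to be read by a person."""
    return "\n".join(
        difflib.unified_diff(
            render_rules(old), render_rules(new),
            fromfile="before", tofile="after", lineterm="",
        )
    )


if __name__ == "__main__":
    # Suppose an agent quietly made cancelled records unrecoverable.
    changed = dict(ALLOWED)
    changed[Status.CANCELLED] = frozenset()
    print(rules_diff(ALLOWED, changed))
```

Running the file prints a small unified diff in which the cancelled line loses its route back to draft. That is the kind of change that is easy to miss in a wall of generated code and hard to miss in a four-line rule table.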

None of this is exotic. It is mostly the old advice about tight feedback loops, except the stakes have inverted. The feedback loop used to be a nicety that made developers a little happier. Now it is the rate-limiting reaction of the entire process, and your architecture should be designed around protecting it.

The punchline

We spent ten years trying to make the manual QA obsolete, and we succeeded so completely that the role escaped the org chart and infected everyone. The skill that now separates good engineers from expensive ones is not the ability to produce - the intern in the box produces faster than any of us. It is the ability to specify without flinching and to verify without trusting. To know exactly what you want, and to know, with cold and slightly paranoid precision, when the confident machine has handed you something subtly wrong.

Trust but verify was always a comforting phrase. In the agentic era we have quietly dropped the first half. It is just verify now. Trust is now a configuration option in the harness.

The boring news is that taste, skepticism, and the patience to read carefully are suddenly the most valuable things you own. The good news is roughly the same sentence. The manual QA is back. Try to be a good one.