16 Senior Devs Used AI to Code. They Thought It Made Them 20% Faster. It Made Them 19% Slower.
Let me start with the number that gave me chills.
METR ran a randomized controlled trial with 16 senior open-source developers, people with many years behind them, doing real tasks on projects they’d maintained for an average of five years. Each task was randomly assigned to either allow AI tools or forbid them. With AI allowed, the work took 19% longer. A little slower wouldn’t be surprising on its own. The real problem is the perception gap: beforehand, these people predicted AI would make them 24% faster, and after they’d finished, after they’d personally lived through being slower, they still believed they’d gone 20% faster. Their gut and the stopwatch were off by nearly 40 percentage points, and the sign was flipped.
I kept coming back to it afterward: why do people get this so wrong, and get it wrong on the work they know best?
My own experience writing things with AI explains most of it. You type one sentence and a screenful of code appears. That instant is genuinely satisfying. Your fingers barely moved, and the thought in your head is “that fast already.” But that’s just the opening of the whole thing. Next you have to read what it wrote, judge whether it’s right, run it, and then discover it has written some plausible-but-wrong logic in a particularly tidy, particularly correct-looking way, and you spend another twenty minutes digging out the thing that “looks right but isn’t.” That first hit of satisfaction gets logged as “fast.” Those twenty minutes of wrestling afterward don’t get counted as “writing code.” They get counted as “debugging,” or “I’m just off today.” What AI saves is the physical effort of typing. What it adds is the mental effort of verifying. And people are acutely sensitive to saving physical effort and numb to spending extra mental effort. That’s exactly where the gut and the stopwatch part ways.
There’s also a premise that’s easy to skip past: these 16 people were working in code they’d been steeping in for five years. That’s precisely the situation where AI helps least, and is most likely to actively get in the way, because you already understand the system better than any model does. Half its suggestions are just re-guessing things you’d long since worked out, and you still have to spend time confirming it didn’t guess wrong. Change the setting and the conclusion might flip: send me into a totally unfamiliar framework, have me write a pile of boilerplate, or spin up a small tool from scratch, and AI is probably genuinely faster. So this study isn’t saying “AI is useless.” It’s saying AI’s speed is extremely situation-dependent, and your gut can’t tell which situation you’re in.
Here’s why I, as someone who works in product, care about this one in particular. Almost every AI-related decision in our line of work right now rests on the same sentence underneath: it makes us faster. Whether to add budget for tools, whether to hire two fewer people, whether we can cram one more feature into the quarter, how to answer when the boss asks “how much did AI speed us up”: all of it leans on that sentence. The whole 2026 wave of AI layoffs is sold with the same productivity story. But what this study says is that even the people doing the work with their own hands can’t accurately judge whether they got faster. So the budgets, the roadmaps, and the layoffs built on that judgment are sitting on loose ground. What makes it worse is that verifying it is genuinely hard, because the first method I’d reach for is to go ask the team “did AI help,” and that’s exactly the data source I shouldn’t trust.
So over the past six months I’ve done one fairly concrete thing: I struck “felt way faster” from the list of evidence. When anyone says it now, myself included, I follow up first: where can you see it? Did this iteration take a few days less than the last one? Are there more production bugs or fewer? Did rework go up? That chunk AI wrote: how many times did we go back and change it? If there are numbers, I believe it. If there aren’t, I treat it as a gut feeling and hold it in doubt. I also stopped asking the vague “is AI useful” and switched to “on which piece of the work is it useful.” Autocomplete, looking up an unfamiliar API, starting a new project: probably yes. Touching the old system of ours that’s been running for years: I assume by default it’ll slow us down, unless someone can produce a counterexample that changes my mind.
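For anyone who wants to make “where can you see it” mechanical, here is a rough sketch of the comparison I have in mind. Everything in it is hypothetical: the `Iteration` fields, the `looks_faster` rule, and the sample numbers are invented for illustration and are not data from any real team. Two iterations are also far too few to prove anything, so treat this as the shape of the question and not as a measurement method.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    """Hypothetical per-iteration record; the field names are assumptions."""
    name: str
    days: float            # calendar days from start to ship
    production_bugs: int   # bugs reported after release
    rework_changes: int    # later changes to code shipped in this iteration


def compare(before: Iteration, after: Iteration) -> dict[str, float]:
    """Return the change in each metric (after minus before).

    Negative values mean the number went down.
    """
    return {
        "days": after.days - before.days,
        "production_bugs": after.production_bugs - before.production_bugs,
        "rework_changes": after.rework_changes - before.rework_changes,
    }


def looks_faster(delta: dict[str, float]) -> bool:
    """Count it as evidence of a speedup only if the iteration got shorter
    and neither bugs nor rework went up to pay for it."""
    return (
        delta["days"] < 0
        and delta["production_bugs"] <= 0
        and delta["rework_changes"] <= 0
    )


# Made-up numbers, purely illustrative.
without_ai = Iteration("sprint 14", days=10, production_bugs=3, rework_changes=4)
with_ai = Iteration("sprint 15", days=8, production_bugs=5, rework_changes=9)

delta = compare(without_ai, with_ai)
print(delta)
print("evidence of speedup:", looks_faster(delta))
```

In this invented example the iteration shipped two days sooner, which is the part that would “feel faster,” but bugs and rework both went up, so the check refuses to call it a speedup.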