Michael Lawrence's gitconnected piece on AI eroding developer skill lands a real anecdote: he refactored a method with an LLM, reviewed the diff, tests passed, and the change had quietly dropped a null check. He notes the 2020 version of himself would have caught it in twelve seconds. Twenty-five years of pattern recognition was rusting in months.
The diagnosis is right. The implicit fix is nostalgia. Nostalgia doesn't work.
This post argues the fix is the eval suite and the paired shipping session. The skill you cannot afford to lose has to live somewhere the model cannot cheat. That somewhere is the eval, not the developer's head.
The short version
Michael Lawrence's argument that AI is quietly degrading developer craft is correct in mechanism. When the LLM produces code that compiles and passes tests, the human reviewer's pattern recognition has nothing to push against and atrophies in months. The skill rusts fastest for the specific patterns the LLM handles reliably. The piece stops short of a workable fix. Slowing down or hand-writing more code is nostalgia, and economic pressure will overrule it. The actual fix is to externalize the rigor into the eval suite (a five-row template that catches the null-check class of regression) and to rebuild the pattern-recognition muscle through paired shipping (one driver and one rider, where the rider's job is to catch what the model missed). Skill drift is real. The defense is structural.
For the broader argument about eval discipline, see "The Five-Row Eval Template That Replaced My PRD" and the handbook chapter "The Eval Is the Spec." For paired shipping as a replacement for AI office hours, see "Kill the AI Office Hours."
What Lawrence is right about
Three points worth crediting.
The mechanism of skill erosion is real and fast. When an LLM produces compiling, test-passing code, the part of the developer's brain that used to pattern-match for bugs has nothing to do. The pattern recognition is not generic. It is built from years of seeing the same class of mistake. When the input stops resembling those mistakes (because the LLM doesn't make them in the same shape), the recognition system gets weaker.
This is not "AI is making developers lazy." It is biology. Skills you don't use get weaker. The 2020 version of the developer was running thousands of these pattern matches per week. The 2026 version is running fewer. The muscle atrophies for the same reason any muscle atrophies.
The null-check anecdote is the right anecdote. Null safety is exactly the class of bug where the model's confident-and-wrong failure mode is most dangerous. The code compiles. The tests pass on happy paths. The regression only surfaces in production when a specific input shape hits the code path. By the time the bug shows up, you have shipped, the LLM has moved on, and the human reviewer who would have caught it five years ago is rusty.
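To make the failure mode concrete, here is a minimal sketch in Python. It is not Lawrence's code: the function names (`display_name_before`, `display_name_after`) and the single happy-path test are hypothetical, invented for illustration. Both versions pass the only test the suite has, and only a `None` input tells them apart.

```python
from typing import Callable, Optional


def display_name_before(user: Optional[dict]) -> str:
    """Hypothetical original: guards against a missing user record."""
    if user is None:
        return "anonymous"
    return user["name"].strip().title()


def display_name_after(user: Optional[dict]) -> str:
    """Hypothetical 'refactor': tidier, but the None guard is gone."""
    return user["name"].strip().title()


def happy_path_test(fn: Callable[[Optional[dict]], str]) -> bool:
    """The only kind of test in this imagined suite: one well-formed input."""
    return fn({"name": "  ada lovelace "}) == "Ada Lovelace"


# Both versions pass the happy-path test, so the diff looks safe.
print(happy_path_test(display_name_before))
print(happy_path_test(display_name_after))

# Only the None input separates them.
print(display_name_before(None))
try:
    display_name_after(None)
except TypeError as exc:
    print(f"regression: {exc}")
```

Nothing in the diff or the test run flags the difference. The reviewer's eye was the only check, and that is the check that is rusting.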
The same pattern shows up in off-by-one errors, race conditions, schema validation gaps, error-handling holes, security boundary mistakes, and any logic that depends on knowing the deep history of the codebase. The model is fastest in exactly the layer where it is least reliable.
Twenty-five years of experience can erode in months. This is the part that should scare senior engineers. The atrophy isn't proportional to your experience. It's faster for the patterns you used to handle on autopilot, because autopilot is the first thing to go.
Lawrence's diagnosis is sound. The piece is worth reading just for the framing.
Why the implicit fix won't work
Lawrence stops at the diagnosis, but the implicit conclusion most readers will draw is something like: slow down, hand-write more code, stop trusting the LLM, get back to first principles.
That fix will not hold. Three reasons.
Economic pressure. A team that hand-writes code in 2026 is moving 3x to 5x slower than the team that ships with agents. The slower team gets out-shipped. The skill-erosion problem doesn't go away. It just gets outsourced to a competitor.
The skill you're protecting is the wrong skill. The pattern recognition that catches a missed null check is valuable. It is also externalizable. You don't need to keep it in your head if you can encode it in a test, an eval, or a linter. The senior engineers who outperform in 2026 have most of their pattern recognition externalized, not internalized. Their heads are freed up for system design, not for the null-check pattern they could have written as a test.
Nostalgia is a status move, not a fix. "The 2020 version of me would have caught it" is true and emotionally satisfying. It does not produce a working code review for next Tuesday. The honest response to skill erosion is to externalize the skill so it survives the erosion, not to lament that the muscle is going.
The structural fix
Two parts. Both are well-tested.
Part one: the eval suite.
The skill you cannot afford to lose has to live somewhere outside your head. The null-check class of regression is exactly the kind of thing the five-row eval template (see The Five-Row Eval Template) catches.
Concrete example. The eval row for the null-check regression:
- Behavior: function returns a defined value for any input
- Input: null, undefined, empty string, zero, negative numbers
- Expected: function returns the documented default instead of crashing
- Scorer: deterministic check (assertion library)
- Threshold: 100% on the null-input set
Build this row once. Run it on every diff. The eval is the externalized version of the 12-second pattern recognition Lawrence is losing. The pattern recognition is gone. The safety net is not.
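As a rough sketch, the row above could be encoded like this in Python. Everything here is an assumption for illustration: the `EvalRow` shape, the `score` helper, and the two `*_quantity` functions under test are hypothetical, not part of any published template or library. Python has no `undefined`, so `None` stands in for both null and undefined.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

DEFAULT_QUANTITY = 1  # the "documented default" in this made-up example


@dataclass(frozen=True)
class EvalRow:
    behavior: str
    inputs: Sequence[Any]
    expected: Any
    threshold: float  # fraction of inputs that must pass


def score(row: EvalRow, fn: Callable[[Any], Any]) -> float:
    """Deterministic scorer: share of inputs for which fn returns row.expected."""
    passed = 0
    for value in row.inputs:
        try:
            if fn(value) == row.expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed input
    return passed / len(row.inputs)


def safe_quantity(raw: Any) -> int:
    """Hypothetical function under test, with the guard intact."""
    if raw is None or raw == "":
        return DEFAULT_QUANTITY
    value = int(raw)
    return value if value > 0 else DEFAULT_QUANTITY


def refactored_quantity(raw: Any) -> int:
    """The same function after a refactor that dropped the None/empty guard."""
    value = int(raw)
    return value if value > 0 else DEFAULT_QUANTITY


NULL_ROW = EvalRow(
    behavior="returns the documented default for null-like input",
    inputs=(None, "", 0, -3),
    expected=DEFAULT_QUANTITY,
    threshold=1.0,
)

for fn in (safe_quantity, refactored_quantity):
    rate = score(NULL_ROW, fn)
    verdict = "PASS" if rate >= NULL_ROW.threshold else "FAIL"
    print(f"{verdict} {fn.__name__}: {rate:.0%}")
```

The guarded version clears the 100% threshold. The refactor that dropped the guard crashes on `None` and on the empty string, scores 50%, and fails the row without anyone having to spot it in the diff.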
Most of the skill drift Lawrence worries about can be converted into eval rows. The conversion takes hours, not days. The payoff is permanent.
Part two: paired shipping.
Some skills do not externalize cleanly. System design judgment. Knowing when to refactor vs. when to ship. Reading the actual intent behind a customer report. Those skills require live, deliberate practice.
Paired shipping rebuilds those skills. One driver, one rider. The driver writes the prompt and accepts the LLM's output. The rider's job is explicitly to catch what the driver missed. Ninety minutes. One real feature shipped.
The rider role is the externalization of the dual-process review that used to live inside one developer's head. The driver's role is the fast generation. The rider's role is the pattern recognition. Separating the roles is what protects the rider's pattern recognition from atrophy.
I outlined paired shipping as a replacement for AI office hours in "Kill the AI Office Hours." It applies here too. The format is the same. The function is the same. The reason it works for skill drift is that the rider is doing the work the model can't, and is doing it deliberately.
What goes away vs. what to defend
Not all skill drift is bad. Some of it is the right thing happening.
Let the following rust. Boilerplate code generation. Framework scaffolding. Glue code. Test fixtures. Documentation comments. The first draft of any class file. These are the things the model handles reliably and that you don't need to keep in your head.
Defend the following. System design across multiple services. Eval design for AI-driven features. Security review at the model boundary. Debugging in production when the LLM's confidence is high and the behavior is wrong. Reading customer reports and translating to test cases. These are the patterns the model is worst at, and they are the patterns where the rust costs you the most.
The mistake is letting all your skills rust equally. The senior engineers who do best in 2026 are the ones who let the right things rust and put structural defense around the things they cannot afford to lose.
What to do this week
Pick three patterns you used to handle fast that you've been letting Claude or Cursor handle.
For each one, write an eval row. Behavior, input, expected, scorer, threshold. Maybe an hour total.
Wire the evals into your CI. If your team doesn't run evals in CI yet, run them locally and commit the script.
Then schedule one paired shipping session this week. Pick a junior dev as the rider. Ship something real. Notice what you (as the driver) miss and what they (as the rider) catch. That's the muscle you're rebuilding.
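The first three steps above (three patterns, one eval row each, a script CI can call) might look something like the sketch below. It is illustrative only: the `EvalRow` tuple, the `run_evals` runner, and the three functions under test (`last_n`, `first_word`, `parse_port`) are hypothetical stand-ins for whatever patterns you pick.

```python
from typing import Any, Callable, NamedTuple


class EvalRow(NamedTuple):
    behavior: str            # what must stay true
    fn: Callable[..., Any]   # function under test
    cases: tuple             # (args, expected) pairs
    threshold: float         # required pass rate, 0.0 to 1.0


def last_n(items: list, n: int) -> list:
    """Final n items; n == 0 is the off-by-one trap (items[-0:] is the whole list)."""
    return items[-n:] if n > 0 else []


def first_word(text):
    """First whitespace-separated word, or '' for None or blank input."""
    if not text or not text.strip():
        return ""
    return text.split()[0]


def parse_port(raw):
    """Port number from a string, or None when the value is not a valid port."""
    try:
        port = int(raw)
    except (TypeError, ValueError):
        return None
    return port if 0 < port < 65536 else None


def pass_rate(row: EvalRow) -> float:
    """Deterministic scorer: share of cases where fn(*args) equals expected."""
    hits = 0
    for args, expected in row.cases:
        try:
            if row.fn(*args) == expected:
                hits += 1
        except Exception:
            pass  # an exception is a failed case, not a crashed run
    return hits / len(row.cases)


def run_evals(rows) -> int:
    """Print one line per row; return 0 if every row meets its threshold, else 1."""
    exit_code = 0
    for row in rows:
        rate = pass_rate(row)
        ok = rate >= row.threshold
        print(f"{'PASS' if ok else 'FAIL'} {rate:.0%} {row.behavior}")
        if not ok:
            exit_code = 1
    return exit_code


ROWS = [
    EvalRow("last_n handles the n == 0 boundary", last_n,
            ((([1, 2, 3], 0), []), (([1, 2, 3], 1), [3]), (([1, 2, 3], 3), [1, 2, 3])), 1.0),
    EvalRow("first_word tolerates None and blank input", first_word,
            (((None,), ""), (("",), ""), (("   ",), ""), (("hello world",), "hello")), 1.0),
    EvalRow("parse_port rejects malformed and out-of-range values", parse_port,
            (((None,), None), (("abc",), None), (("70000",), None), (("8080",), 8080)), 1.0),
]

exit_code = run_evals(ROWS)
# In a committed script, finish with `raise SystemExit(exit_code)` so CI fails on a red row.
```

Returning an exit code instead of raising keeps the runner usable both locally and in CI: the same file can be run by hand today and called from a pipeline step later.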
Lawrence is right that AI is eroding developer craft. The defense is not nostalgia. The defense is structural. Eval the bug class. Pair the judgment call. Let the boilerplate rust.
The five-row eval template, the paired shipping session format, and the senior-engineer skill-defense checklist are in the toolkit. For the broader operating model, see the handbook chapter "The PM Agent Stack."
Sources: Michael Lawrence, "AI Isn't Replacing Developers. It's Doing Something Worse." (Level Up Coding / gitconnected), Hamel Husain on evals, Simon Willison on testing LLMs, Eugene Yan on the eval mindset, Claude Code.
Frequently asked
What is Michael Lawrence's argument?
That AI is not taking developers' jobs but is quietly degrading the skill that made them valuable. His personal anecdote: he 'refactored' a method using Claude, reviewed the diff, tests passed, and the change had quietly dropped a null check. He notes that the 2020 version of himself would have caught it in twelve seconds. Twenty-five years of pattern recognition was rusting in months.
Is the skill-erosion thesis correct?
Mostly yes. The mechanism is real. When an LLM produces code that compiles and passes tests, the human reviewer's pattern recognition has nothing to push against and starts to atrophy. The atrophy is fast (months, not years) for the specific skills that the LLM does most reliably. Lawrence's diagnosis is correct.
Where does the piece stop short?
The implicit conclusion is that the fix is to slow down, hand-write more code, or push back against AI tooling. That is nostalgia. It will not work because the economic pressure is too strong. The actual fix is to put the rigor somewhere the AI cannot cheat: the eval suite and the paired shipping session.
How does eval discipline prevent skill erosion?
Because evals are testable claims about behavior. A null-check failure is exactly the kind of thing a five-row eval would catch (one row, scorer is deterministic, threshold is 100% on null inputs). The eval is the externalization of the pattern recognition the human is losing. Build the eval once, run it forever, sleep at night.
What is paired shipping and why does it help?
Two developers (or a dev and a PM) ship one feature together in a 90-minute session. The driver writes the prompt, the rider catches the regression. The rider role exists specifically because human pattern recognition is still load-bearing for catching what models miss. Paired shipping rebuilds the muscle that solo prompting erodes.
Are there cases where the skill erosion is unavoidable and fine?
Yes. Boilerplate code, framework scaffolding, test fixtures, internal tooling. Skill erosion on these is actually a feature, not a bug. The skill you don't need anymore can rust. The skill you need (system design, eval writing, debugging at the model boundary, security review) is the one to defend. The mistake is letting all skills erode equally.
What should a senior engineer do this week?
Pick the three patterns you most relied on five years ago that you've been letting Claude or Cursor handle. Build an eval for each one. Run the eval on every diff. The eval externalizes the skill you're worried about losing. You still lose the speed, but you keep the safety net, and the safety net is what was actually valuable.
