Discovery · Falk Gottlob · 8 min read

Survivorship Bias in the Age of AI: Now You Can Interview the Planes That Didn't Come Back

Wald's WWII bomber lesson still runs product management. AI both worsens the bias and, for the first time, makes the missing data affordable to study. Here's how I'm using agents to fix it in 2026.

survivorship bias · AI agents · discovery · churn · decision making · product management

The short version

Survivorship bias is the single most common error in product management. It's Abraham Wald's WWII insight: the bombers coming back with bullet holes were the survivors, so you should armor the parts of the plane without holes, because that's where the fatal hits landed. I wrote about it in 2024 with the manual playbook: exit surveys, churn interviews, post-mortems on dead features. Two years later, the bias has gotten worse in some ways (LLMs are trained on survivors; RAG retrieves survivors; every AI workflow optimizer re-fits to people who completed the workflow) and, for the first time, much more fixable in others (agents make it cheap to interview every churned user, scan for every non-event, and study every feature that flopped). The new PM job is to point a small fleet of agents at what's missing. The bombers that came back were interesting. In 2026, you can finally afford to ask the ones that didn't.

A quick recap, because the picture is worth more than my paragraph.

[Image: WWII bomber silhouette with red dots clustered on the wings, fuselage, and tail. The classic Abraham Wald survivorship-bias illustration showing where returning planes were hit.]

Red dots show where returning bombers were hit. The instinct: armor those spots. Wald's correction: those are the survivable hits. Armor the rest. That's where the planes that didn't come back were shot. Image by McGeddon on Wikimedia Commons, CC BY-SA 4.0.

In WWII, the US military studied returning bombers, mapped the bullet holes, and almost reinforced the wings and tail. A statistician named Abraham Wald stopped them. The planes they were looking at were the ones that made it home. The planes hit in the engine, cockpit, and hydraulics never came back. The data was filtered by survival. The hits in the hole-free areas were the lethal ones.

In product management, the same filter runs every day. You hear from active users, not churned ones. You measure features that get clicked, not features people gave up trying to find. You analyze competitors who won, not the ones who tried the same thing and failed. You celebrate A/B tests that shipped, not the ones that quietly lost. Every dataset in your dashboard has already passed through the gauntlet of your current product. The data is the bombers that came back.

I've written the long version of where this shows up in PM work. The short version is that your most engaged users, your most-clicked features, and your winning competitors are interesting, but they're not where the answers live. The answers are in the dead.

What changed in 2026

When I first wrote about this, the bottleneck was cost. You couldn't afford to interview every churned user. You couldn't run a post-mortem on every dead feature. You couldn't watch every drop-off in every flow. So we picked our battles, did a handful of churn calls a quarter, and called it good. Most teams skipped the work entirely.

Two things changed.

First, AI made the bias worse in three quiet ways. This is the part I think most teams have missed.

LLMs are trained on what got published. A research synthesis agent run on top of your customer interviews will quietly over-weight what was articulated and documented, and under-weight what was lost in the silence of "I just stopped logging in." If your synthesis prompt asks for "top themes," the top themes will reliably be themes from users who stayed long enough to give you the interview.

RAG and "ask your data" tools retrieve what got logged. Your events are logged because your active users triggered them. Your support tickets exist because someone cared enough to file them. The users who never figured out the product don't generate retrievable artifacts. An LLM with a vector store full of survivors will produce confident, articulate summaries of the world according to the survivors.

Every workflow optimization tool re-fits to people who completed the workflow. AI funnel optimizers, AI conversion optimizers, AI personalization engines: all trained on the same filter. Their reward signal is "did the user finish?" The users who didn't finish are training-data noise to be filtered out. You're now reinforcing the wrong parts of the plane with vastly more horsepower than Wald's analysts ever had.

The default behavior of every AI tool I've used in product work is to re-state what's already in the dataset, more eloquently. Without an explicit hunt for missing data, you get a faster, more confident wrong answer.
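
To make "explicit hunt" concrete, here's the difference in the ask itself. A minimal sketch in Python, assuming your synthesis agent takes a plain prompt string; the cohort labels and wording are mine, not any particular tool's API.

```python
# Illustrative only: the default synthesis ask vs. an explicit hunt for missing data.
# Both are plain prompt strings; adapt the cohort labels to your own data.

DEFAULT_ASK = "Here are this quarter's customer interviews. Summarize the top themes."

MISSING_DATA_ASK = """Here are this quarter's customer interviews, split into two sets:
(A) users who are still active, (B) users who churned or never activated.

1. List themes that appear ONLY in set B.
2. List questions these transcripts cannot answer, and name the cohort that is missing.
3. For every theme, say which cohort it comes from and how many voices support it."""
```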

Second, AI dissolved the cost barrier on actually studying the dead. This is the part I'm excited about.

The bottleneck was never the analysis. PMs have always known they should talk to churned users. The bottleneck was that the work was expensive, awkward, and slow. An exit interview takes thirty minutes to schedule and forty-five to run. Multiply that by every churned account and it becomes an annual budget conversation, and the work doesn't happen.

Agents collapse that cost. Not to zero, but close enough that the budget conversation stops being the gate.

Where I'm pointing agents at the missing data

Here's what's running in my current setup. Each one is built around a single question: what is this filter hiding from me?

An exit-interview agent on every churn event. When a customer downgrades, cancels, or hits 30 days of no activity, the agent reaches out within the hour with a short, real-sounding ask. It runs the conversation, transcribes, classifies, and feeds the result into the same backlog the active-user research lives in. Volume is roughly 50x what a human-only team could run, and the signal is very different from what active users are telling you.
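
Mechanically, the trigger is simple. A minimal sketch, assuming a 30-day inactivity window; the three helpers are hypothetical stand-ins for your outreach tool, your LLM classification pass, and your research backlog.

```python
# A minimal sketch of the churn trigger, not the production agent.
from dataclasses import dataclass
from datetime import datetime, timedelta

def send_exit_interview(email: str) -> str:
    """Stand-in: send the short, real-sounding ask and return the transcript."""
    return f"(transcript of conversation with {email})"

def classify_transcript(transcript: str) -> list[str]:
    """Stand-in: LLM pass that tags the transcript with churn-reason themes."""
    return ["onboarding friction"]

def push_to_backlog(email: str, themes: list[str]) -> None:
    """Stand-in: file the result next to the active-user research."""
    print(f"{email}: {themes}")

@dataclass
class Account:
    email: str
    status: str            # "active", "downgraded", "cancelled"
    last_active: datetime

def is_churn_event(account: Account, now: datetime) -> bool:
    inactive_30_days = now - account.last_active > timedelta(days=30)
    return account.status in ("downgraded", "cancelled") or inactive_30_days

def handle_account(account: Account, now: datetime) -> None:
    if not is_churn_event(account, now):
        return
    transcript = send_exit_interview(account.email)   # within the hour, while it's fresh
    themes = classify_transcript(transcript)
    push_to_backlog(account.email, themes)

# Example: a cancelled account gets the outreach.
handle_account(Account("lost@example.com", "cancelled", datetime(2026, 1, 2)),
               now=datetime(2026, 2, 1))
```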

A non-event watcher. Most analytics dashboards show what happened. I have an agent that scans for what didn't: features below a usage threshold, flows with abnormal drop-off, users who logged in but didn't do the thing the session was supposed to enable. The output is a weekly digest of negative space. (This is essentially the feature adoption tracking agent plus the red flag detection agent wired together.)
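
The scan itself is almost embarrassingly simple. A rough sketch, assuming you can export per-feature weekly user counts; the field names and the 2% threshold are mine, so tune them to your product.

```python
# A rough sketch of the weekly negative-space scan over feature usage.
def negative_space_digest(feature_users: dict[str, int],
                          weekly_active_users: int,
                          threshold: float = 0.02) -> list[str]:
    """List features touched by fewer than `threshold` of weekly actives, quietest first."""
    lines = []
    for feature, users in sorted(feature_users.items(), key=lambda kv: kv[1]):
        share = users / weekly_active_users
        if share < threshold:
            lines.append(f"{feature}: {users} users ({share:.1%} of WAU)")
    return lines

# Example digest: two features nobody finds, one that's healthy.
for line in negative_space_digest(
    {"bulk_export": 14, "sso_setup": 3, "saved_views": 4200},
    weekly_active_users=5000,
):
    print(line)
```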

A failed-experiment post-mortem. Every A/B test that lost gets summarized into "what was the hypothesis, what actually happened, what does this tell us about the underlying assumption." Wins get the same treatment, but the losing-variant pile is where my team and I learn the most. Most teams stop reading the file when the test loses. The agent doesn't.
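
The per-test prompt doesn't need to be clever. A sketch, assuming your experiment tool can export a losing test as a handful of fields; the field names and example data are illustrative, not from a real test.

```python
# A sketch of the post-mortem ask for a single losing experiment.
def postmortem_prompt(test: dict) -> str:
    return (
        f"Experiment: {test['name']}\n"
        f"Hypothesis: {test['hypothesis']}\n"
        f"Result: {test['result']}\n\n"
        "1. What actually happened, in one paragraph?\n"
        "2. Which underlying assumption does this result contradict?\n"
        "3. What should we stop believing about our users because of it?"
    )

# Hypothetical example input, purely for illustration.
print(postmortem_prompt({
    "name": "onboarding-checklist-v2",
    "hypothesis": "A guided checklist lifts week-1 activation by 10%",
    "result": "Activation flat, support tickets up 8%",
}))
```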

A churned-cohort segmentation pass. A customer segmentation agent builds cohorts of users you almost never look at: signed up but never activated, activated but stopped within two weeks, evaluated and bounced. These groups are usually invisible in standard segmentation because they don't have enough event volume to be statistically interesting. Agents are happy to dig through low-signal groups, because their time is cheap.
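
Two of those three cohorts fall out of timestamps you almost certainly already have. A rough cut, assuming a users export with signup, activation, and last-active dates; field names are mine, and the "evaluated and bounced" cohort needs trial data I've left out here.

```python
# A rough cut at the churned cohorts from three timestamps per user.
from datetime import datetime, timedelta
from typing import Optional

def churn_cohort(signed_up: datetime,
                 activated: Optional[datetime],
                 last_active: datetime) -> str:
    if activated is None:
        return "signed up, never activated"
    if last_active - activated <= timedelta(days=14):
        return "activated, gone within two weeks"
    return "everyone else"

# Example: one user who never activated, one who left in the first two weeks.
print(churn_cohort(datetime(2026, 1, 5), None, datetime(2026, 1, 5)))
print(churn_cohort(datetime(2026, 1, 5), datetime(2026, 1, 7), datetime(2026, 1, 15)))
```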

A competitor graveyard tracker. Public failures, sunset products, layoffs, pivots. Most competitive intel agents track who's winning. Mine also tracks who's losing, and why. The losers tell you which assumptions were wrong in the category, which is more useful than which assumptions were right for the one survivor you're now copying.

None of these are complicated. They're all variants of the same instruction: go find the data we don't have, and bring it back. The reason they didn't exist two years ago is that the cost-per-instance was too high. The reason they exist now is that the cost dropped by about two orders of magnitude.

What's still a PM job, even with agents

The judgment to aim the agents is still a human call. The default behavior of every AI tool is to study the survivors more thoroughly, because that's where the data is loudest. Pointing an agent at the silence (the cohorts that don't trigger events, the features nobody clicks, the experiments that lost) takes a PM who has internalized Wald's lesson and is willing to spend the agent's time on negative space.

I think the next generation of senior PMs will be defined by this. Not by how good their roadmap looks, not by how clean their PRDs are, but by whether they pointed their agent fleet at the right unanswered questions. The bombers that came back are easy to study. The PMs who win will be the ones who built the agent fleet to study the ones that didn't.

Pick one thing to try this week

Pick the most painful missing-data question in your product. If you're not sure, default to: why are people churning in the first two weeks, and what do they say if I ask them? Wire one agent to handle the outreach, the conversation, and the synthesis. Don't aim for sophistication. Aim for getting fifty answers instead of two.

The bombers that came back were interesting. The ones that didn't have the answers. For the first time, you can afford to go look.


Related reading: the original Wald-bomber framing, the new operating model that assumes an agent fleet, your AI agent fleet.

Sources: Abraham Wald, "A Method of Estimating Plane Vulnerability Based on Damage of Survivors" (1943). Bomber illustration by McGeddon on Wikimedia Commons, CC BY-SA 4.0.


Frequently asked

What is survivorship bias in product management?

Drawing conclusions from the users, features, tests, and competitors that survived your current filters, while ignoring the ones that didn't. Your most engaged users, your most-clicked features, and your winning competitors are 'the bombers that came back.' The PMs who only listen to them keep reinforcing the wrong parts of the plane.

How does AI make survivorship bias worse?

Three ways. (1) LLMs are trained on content that got published and indexed, so a research synthesis agent will quietly over-weight what was documented and under-weight what was lost. (2) RAG and 'ask your data' tools retrieve what's logged, which is overwhelmingly the survivors. (3) Every workflow optimization tool re-fits to people who completed the workflow. Without an explicit hunt for the missing data, AI just gives you a faster, more confident wrong answer.

How can AI agents actually fix survivorship bias?

Agents make it cheap to study the dead. A churn-interview agent can do exit conversations at scale. A red flag detection agent can scan for non-events. A feature adoption agent can flag what's not getting used, not just what is. A customer segmentation agent can build a cohort of users you have almost no data on. The bottleneck was never the analysis. It was the cost of going to look. Agents collapse that cost.

What's the first thing a PM should do this week?

Pick one agent and aim it at a missing-data question. Easiest start: a feature adoption agent that emails you a weekly list of features below a usage threshold, ranked by who's churning around them. You'll find at least one feature that's hurting you that you didn't know about.

Does AI ever reduce survivorship bias by itself?

Only when you point it at the right question. The default behavior of every AI tool I've used is to re-state what's already in the dataset, more eloquently. The judgment of 'go find the data that isn't here' is still a PM job. The agent just makes the going cheap.

About the author

Falk Gottlob

Product Executive · Founder, Falkster.AI

Thirty years shipping product at Microsoft Research, Adobe, Salesforce (Marketing Cloud / Quip / Slack), and several startups including one $6.5B exit and one acquired by Microsoft. Now CPO at Smartcat and founder of Falkster.AI, writing this notebook from the boardroom, not the keyboard.
