Discovery · Falk Gottlob · 8 min read

Survivorship Bias in the Age of AI: Now You Can Interview the Planes That Didn't Come Back

Wald's WWII bomber lesson still runs product management. AI both worsens the bias and, for the first time, makes the missing data affordable to study. Here's how I'm using agents to fix it in 2026.

survivorship bias · AI agents · discovery · churn · decision making · product management

The short version

Survivorship bias is the single most common error in product management. It's Abraham Wald's WWII insight: the bombers coming back with bullet holes were the survivors, so you should armor the parts of the plane without holes, because that's where the fatal hits landed. I wrote about it in 2024 with the manual playbook: exit surveys, churn interviews, post-mortems on dead features. Two years later, the bias has gotten worse in some ways (LLMs are trained on survivors; RAG retrieves survivors; every AI workflow optimizer re-fits to people who completed the workflow) and, for the first time, much more fixable in others (agents make it cheap to interview every churned user, scan for every non-event, and study every feature that flopped). The new PM job is to point a small fleet of agents at what's missing. The bombers that came back were interesting. In 2026, you can finally afford to ask the ones that didn't.

A quick recap, because the picture is worth more than my paragraph.

[Image: WWII bomber silhouette with red dots clustered on the wings, fuselage, and tail. The classic Abraham Wald survivorship-bias illustration showing where returning planes were hit.]

Red dots show where returning bombers were hit. The instinct: armor those spots. Wald's correction: those are the survivable hits. Armor the rest. That's where the planes that didn't come back were shot. Image by McGeddon on Wikimedia Commons, CC BY-SA 4.0.

In WWII, the US military studied returning bombers, mapped the bullet holes, and almost reinforced the wings and tail. A statistician named Abraham Wald stopped them. The planes they were looking at were the ones that made it home. The planes hit in the engine, cockpit, and hydraulics never came back. The data was filtered by survival. The hits in the hole-free areas were the lethal ones.

In product management, the same filter runs every day. You hear from active users, not churned ones. You measure features that get clicked, not features people gave up trying to find. You analyze competitors who won, not the ones who tried the same thing and failed. You celebrate A/B tests that shipped, not the ones that quietly lost. Every dataset in your dashboard has already passed through the gauntlet of your current product. The data is the bombers that came back.

I've written the long version of where this shows up in PM work. The short version is that your most engaged users, your most-clicked features, and your winning competitors are interesting, but they're not where the answers live. The answers are in the dead.

What changed in 2026

When I first wrote about this, the bottleneck was cost. You couldn't afford to interview every churned user. You couldn't run a post-mortem on every dead feature. You couldn't watch every drop-off in every flow. So we picked our battles, did a handful of churn calls a quarter, and called it good. Most teams skipped the work entirely.

Two things changed.

First, AI made the bias worse in three quiet ways. This is the part I think most teams have missed.

LLMs are trained on what got published. A research synthesis agent run on top of your customer interviews will quietly over-weight what was articulated and documented, and under-weight what was lost in the silence of "I just stopped logging in." If your synthesis prompt asks for "top themes," the top themes will reliably be themes from users who stayed long enough to give you the interview.

RAG and "ask your data" tools retrieve what got logged. Your events are logged because your active users triggered them. Your support tickets exist because someone cared enough to file them. The users who never figured out the product don't generate retrievable artifacts. An LLM with a vector store full of survivors will produce confident, articulate summaries of the world according to the survivors.

Every workflow optimization tool re-fits to people who completed the workflow. AI funnel optimizers, AI conversion optimizers, AI personalization engines: all trained on the same filter. Their reward signal is "did the user finish?" The users who didn't finish are training-data noise to be filtered out. You're now reinforcing the wrong parts of the plane with vastly more horsepower than Wald's analysts ever had.

The default behavior of every AI tool I've used in product work is to re-state what's already in the dataset, more eloquently. Without an explicit hunt for missing data, you get a faster, more confident wrong answer.
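
To make "explicit hunt" concrete, here's the difference in the ask itself. A minimal sketch in Python, assuming your synthesis agent takes a plain prompt string; the cohort labels and wording are mine, not any particular tool's API.

```python
# Illustrative only: the default synthesis ask vs. an explicit hunt for missing data.
# Both are plain prompt strings; adapt the cohort labels to your own data.

DEFAULT_ASK = "Here are this quarter's customer interviews. Summarize the top themes."

MISSING_DATA_ASK = """Here are this quarter's customer interviews, split into two sets:
(A) users who are still active, (B) users who churned or never activated.

1. List themes that appear ONLY in set B.
2. List questions these transcripts cannot answer, and name the cohort that is missing.
3. For every theme, say which cohort it comes from and how many voices support it."""
```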

Second, AI dissolved the cost barrier on actually studying the dead. This is the part I'm excited about.

The bottleneck was never the analysis. PMs have always known they should talk to churned users. The bottleneck was that the work was expensive, awkward, and slow. An exit interview takes thirty minutes to schedule and forty-five to run. Multiply that by every churned account and it becomes an annual budget conversation, and the work doesn't happen.

Agents collapse that cost. Not to zero, but close enough that the budget conversation stops being the gate.

Where I'm pointing agents at the missing data

Here's what's running in my current setup. Each one is built around a single question: what is this filter hiding from me?

An exit-interview agent on every churn event. When a customer downgrades, cancels, or hits 30 days of no activity, the agent reaches out within the hour with a short, real-sounding ask. It runs the conversation, transcribes, classifies, and feeds the result into the same backlog the active-user research lives in. Volume is roughly 50x what a human-only team could run, and the signal is very different from what active users are telling you.
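
Mechanically, the trigger is simple. A minimal sketch, assuming a 30-day inactivity window; the three helpers are hypothetical stand-ins for your outreach tool, your LLM classification pass, and your research backlog.

```python
# A minimal sketch of the churn trigger, not the production agent.
from dataclasses import dataclass
from datetime import datetime, timedelta

def send_exit_interview(email: str) -> str:
    """Stand-in: send the short, real-sounding ask and return the transcript."""
    return f"(transcript of conversation with {email})"

def classify_transcript(transcript: str) -> list[str]:
    """Stand-in: LLM pass that tags the transcript with churn-reason themes."""
    return ["onboarding friction"]

def push_to_backlog(email: str, themes: list[str]) -> None:
    """Stand-in: file the result next to the active-user research."""
    print(f"{email}: {themes}")

@dataclass
class Account:
    email: str
    status: str            # "active", "downgraded", "cancelled"
    last_active: datetime

def is_churn_event(account: Account, now: datetime) -> bool:
    inactive_30_days = now - account.last_active > timedelta(days=30)
    return account.status in ("downgraded", "cancelled") or inactive_30_days

def handle_account(account: Account, now: datetime) -> None:
    if not is_churn_event(account, now):
        return
    transcript = send_exit_interview(account.email)   # within the hour, while it's fresh
    themes = classify_transcript(transcript)
    push_to_backlog(account.email, themes)

# Example: a cancelled account gets the outreach.
handle_account(Account("lost@example.com", "cancelled", datetime(2026, 1, 2)),
               now=datetime(2026, 2, 1))
```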

A non-event watcher. Most analytics dashboards show what happened. I have an agent that scans for what didn't: features below a usage threshold, flows with abnormal drop-off, users who logged in but didn't do the thing the session was supposed to enable. The output is a weekly digest of negative space. (This is essentially the feature adoption tracking agent plus the red flag detection agent wired together.)
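
The scan itself is almost embarrassingly simple. A rough sketch, assuming you can export per-feature weekly user counts; the field names and the 2% threshold are mine, so tune them to your product.

```python
# A rough sketch of the weekly negative-space scan over feature usage.
def negative_space_digest(feature_users: dict[str, int],
                          weekly_active_users: int,
                          threshold: float = 0.02) -> list[str]:
    """List features touched by fewer than `threshold` of weekly actives, quietest first."""
    lines = []
    for feature, users in sorted(feature_users.items(), key=lambda kv: kv[1]):
        share = users / weekly_active_users
        if share < threshold:
            lines.append(f"{feature}: {users} users ({share:.1%} of WAU)")
    return lines

# Example digest: two features nobody finds, one that's healthy.
for line in negative_space_digest(
    {"bulk_export": 14, "sso_setup": 3, "saved_views": 4200},
    weekly_active_users=5000,
):
    print(line)
```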

A failed-experiment post-mortem. Every A/B test that lost gets summarized into "what was the hypothesis, what actually happened, what does this tell us about the underlying assumption." Wins get the same treatment, but the losing-variant pile is where my team and I learn the most. Most teams stop reading the file when the test loses. The agent doesn't.
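
The per-test prompt doesn't need to be clever. A sketch, assuming your experiment tool can export a losing test as a handful of fields; the field names and example data are illustrative, not from a real test.

```python
# A sketch of the post-mortem ask for a single losing experiment.
def postmortem_prompt(test: dict) -> str:
    return (
        f"Experiment: {test['name']}\n"
        f"Hypothesis: {test['hypothesis']}\n"
        f"Result: {test['result']}\n\n"
        "1. What actually happened, in one paragraph?\n"
        "2. Which underlying assumption does this result contradict?\n"
        "3. What should we stop believing about our users because of it?"
    )

# Hypothetical example input, purely for illustration.
print(postmortem_prompt({
    "name": "onboarding-checklist-v2",
    "hypothesis": "A guided checklist lifts week-1 activation by 10%",
    "result": "Activation flat, support tickets up 8%",
}))
```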

A churned-cohort segmentation pass. A customer segmentation agent builds cohorts of users you almost never look at: signed up but never activated, activated but stopped within two weeks, evaluated and bounced. These groups are usually invisible in standard segmentation because they don't have enough event volume to be statistically interesting. Agents are happy to dig through low-signal groups, because their time is cheap.
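
Two of those three cohorts fall out of timestamps you almost certainly already have. A rough cut, assuming a users export with signup, activation, and last-active dates; field names are mine, and the "evaluated and bounced" cohort needs trial data I've left out here.

```python
# A rough cut at the churned cohorts from three timestamps per user.
from datetime import datetime, timedelta
from typing import Optional

def churn_cohort(signed_up: datetime,
                 activated: Optional[datetime],
                 last_active: datetime) -> str:
    if activated is None:
        return "signed up, never activated"
    if last_active - activated <= timedelta(days=14):
        return "activated, gone within two weeks"
    return "everyone else"

# Example: one user who never activated, one who left in the first two weeks.
print(churn_cohort(datetime(2026, 1, 5), None, datetime(2026, 1, 5)))
print(churn_cohort(datetime(2026, 1, 5), datetime(2026, 1, 7), datetime(2026, 1, 15)))
```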

A competitor graveyard tracker. Public failures, sunset products, layoffs, pivots. Most competitive intel agents track who's winning. Mine also tracks who's losing, and why. The losers tell you which assumptions were wrong in the category, which is more useful than which assumptions were right for the one survivor you're now copying.

None of these are complicated. They're all variants of the same instruction: go find the data we don't have, and bring it back. The reason they didn't exist two years ago is that the cost-per-instance was too high. The reason they exist now is that the cost dropped by about two orders of magnitude.

What's still a PM job, even with agents

The judgment to aim the agents is still a human call. The default behavior of every AI tool is to study the survivors more thoroughly, because that's where the data is loudest. Pointing an agent at the silence (the cohorts that don't trigger events, the features nobody clicks, the experiments that lost) takes a PM who has internalized Wald's lesson and is willing to spend the agent's time on negative space.

I think the next generation of senior PMs will be defined by this. Not by how good their roadmap looks, not by how clean their PRDs are, but by whether they pointed their agent fleet at the right unanswered questions. The bombers that came back are easy to study. The PMs who win will be the ones who built the agent fleet to study the ones that didn't.

Pick one thing to try this week

Pick the most painful missing-data question in your product. If you're not sure, default to: why are people churning in the first two weeks, and what do they say if I ask them? Wire one agent to handle the outreach, the conversation, and the synthesis. Don't aim for sophistication. Aim for getting fifty answers instead of two.

The bombers that came back were interesting. The ones that didn't have the answers. For the first time, you can afford to go look.


Related reading: the original Wald-bomber framing, the new operating model that assumes an agent fleet, your AI agent fleet.

Sources: Abraham Wald, "A Method of Estimating Plane Vulnerability Based on Damage of Survivors" (1943). Bomber illustration by McGeddon on Wikimedia Commons, CC BY-SA 4.0.


Frequently asked

What is survivorship bias in product management?

Drawing conclusions from the users, features, tests, and competitors that survived your current filters, while ignoring the ones that didn't. Your most engaged users, your most-clicked features, and your winning competitors are 'the bombers that came back.' The PMs who only listen to them keep reinforcing the wrong parts of the plane.

How does AI make survivorship bias worse?

Three ways. (1) LLMs are trained on content that got published and indexed, so a research synthesis agent will quietly over-weight what was documented and under-weight what was lost. (2) RAG and 'ask your data' tools retrieve what's logged, which is overwhelmingly the survivors. (3) Every workflow optimization tool re-fits to people who completed the workflow. Without an explicit hunt for the missing data, AI just gives you a faster, more confident wrong answer.

How can AI agents actually fix survivorship bias?

Agents make it cheap to study the dead. A churn-interview agent can do exit conversations at scale. A red flag detection agent can scan for non-events. A feature adoption agent can flag what's not getting used, not just what is. A customer segmentation agent can build a cohort of users you have almost no data on. The bottleneck was never the analysis. It was the cost of going to look. Agents collapse that cost.

What's the first thing a PM should do this week?

Pick one agent and aim it at a missing-data question. Easiest start: a feature adoption agent that emails you a weekly list of features below a usage threshold, ranked by who's churning around them. You'll find at least one feature that's hurting you that you didn't know about.

Does AI ever reduce survivorship bias by itself?

Only when you point it at the right question. The default behavior of every AI tool I've used is to re-state what's already in the dataset, more eloquently. The judgment of 'go find the data that isn't here' is still a PM job. The agent just makes the going cheap.

About the author

Falk Gottlob

Product Executive · Founder, Falkster.AI

Thirty years shipping product at Microsoft Research, Adobe, Salesforce (Marketing Cloud / Quip / Slack), and several startups including one $6.5B exit and one acquired by Microsoft. Now CPO at Smartcat and founder of Falkster.AI, writing this notebook from the boardroom, not the keyboard.
