Discovery · Falk Gottlob · 9 min read

Continuous Discovery Doesn't Scale for AI-Native Products

Teresa Torres' continuous discovery is the right answer for human-centric SaaS and the wrong answer for agent products. The punctuated discovery alternative.

Tags: discovery · continuous discovery · AI native · Teresa Torres · counter-canon · evals

[FIG. 00 · Composite of theses 01–03 — per-seat ARR, outcome ARR, GM trough. Falkster · CPO Edition, Issue 01 · 2026]

I have a respectful argument with Teresa Torres. I want to lay it out clearly because I think it matters for product teams shipping AI-native products, and because Teresa's frame deserves engagement, not dismissal.

Teresa's Continuous Discovery Habits shaped how a generation of PMs think about learning from customers. The core argument is simple and powerful: discovery is not a phase, it's a habit. Commit to one to two hours per week of customer interviews, forever. Build an Opportunity Solution Tree. Make discovery part of the rhythm. Don't let it become a project that ends.

This is right for the products her frame was built for. It is wrong for AI-native products. And the field hasn't done this update yet.

The short version

Continuous discovery assumes a deterministic feedback loop: customer expresses pain → team builds feature → adoption signals validate or falsify. Agent products break that loop in three places: the agent generates behaviors users don't anticipate, causality is fuzzy in non-deterministic systems, and the weekly-interview cadence is too slow for a product that changes daily. On top of that, the customer interview captures only a fraction of the product surface (the rest is agent quality, latency, drift, escalation handling).

The right model for AI-native products is punctuated discovery: intense three-week sprints sandwiched between eight-week build phases. The sprints include customer interviews, but at a 40% allocation. The other 60% goes to the agent and its ecosystem: running the agent against real workloads, watching where it drifts, building evals, observing edge behaviors, and scanning the competitive surface.

This is a respectful argument with the canon. Both Teresa's model and this one are correct, for different products. Most teams shipping agent products are still running the human-centric model and missing 60% of what they're building.

The three places the deterministic loop breaks

Break 1: Agent behavior is not customer-anticipated

In a deterministic SaaS product, the customer can describe their pain, and that description maps to features that would address the pain. "I want a faster way to filter invoices by date." Clear pain. Clear solution. Build the filter.

In an agent product, the agent generates behaviors the customer didn't anticipate. The customer doesn't say "I want the agent to better handle the case where my invoice number includes a slash." The customer says "the agent broke." When you investigate, the bug is in a corner the customer never thought to mention because they didn't know it was a corner.

Customer interviews still tell you about the product's job-to-be-done. They tell you almost nothing about the product's actual surface, because the surface is generated dynamically. You have to discover the surface yourself, by running the agent against real workloads and watching where it drifts.
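
What does "watching where it drifts" look like in practice? A minimal sketch, assuming a `run_agent` hook into your own stack: replay recorded real-world inputs through the agent and flag outputs that fall away from a trusted baseline. The workload records, the lexical scorer, and the threshold are all illustrative placeholders, not a prescribed harness.

```python
# Minimal surface-discovery sketch: replay recorded real inputs through the
# agent and flag cases that drift from a known-good baseline.
from difflib import SequenceMatcher

def run_agent(task: str) -> str:
    """Placeholder: call your agent here (API, local runtime, etc.)."""
    return f"handled: {task}"

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity; swap in a semantic or rubric-based scorer."""
    return SequenceMatcher(None, a, b).ratio()

# Recorded workloads: real inputs paired with a baseline output captured
# during a period when the agent was known to behave well.
workloads = [
    {"task": "file invoice 2024/118", "baseline": "handled: file invoice 2024/118"},
    {"task": "refund order #A-991",  "baseline": "refund issued for order A-991"},
]

DRIFT_THRESHOLD = 0.8  # tune per product; below this, a human investigates

for record in workloads:
    output = run_agent(record["task"])
    score = similarity(output, record["baseline"])
    if score < DRIFT_THRESHOLD:
        # Flagged cases are the discovery backlog: corners of the surface
        # no customer interview would have surfaced.
        print(f"DRIFT ({score:.2f}): {record['task']!r} -> {output!r}")
```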

Break 2: Causality is fuzzy

The OST works because in a deterministic system you can trace from outcome (customer adopted, retained, expanded) back to the opportunity that drove it, and from the opportunity back to the solution that addressed it. Causality flows.

In an agent product, the same input gives different outputs across runs. The customer's experience this week differs from their experience next week even on the same workflow. Adoption signals are noisier. Retention causes are more diffuse. The OST collapses because the tree assumes causality at branch points where causality is fuzzy.

You can still build something OST-shaped for an agent product, but it's a different tree. The branches are agent behaviors, not customer opportunities. The leaves are evals, not features. The root is outcome quality, not customer pain.
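
That inverted tree is concrete enough to sketch as a data structure. The node names and fields below are illustrative assumptions, not a canonical schema; the point is that the leaves carry measured pass rates instead of feature ideas.

```python
# Sketch of an OST-shaped structure for an agent product: root = outcome
# quality, branches = agent behaviors, leaves = evals with pass rates.
from dataclasses import dataclass, field

@dataclass
class EvalLeaf:
    name: str          # e.g. "invoice IDs containing slashes"
    pass_rate: float   # latest measured pass rate on the eval set

@dataclass
class BehaviorBranch:
    behavior: str                        # an observed agent behavior
    evals: list[EvalLeaf] = field(default_factory=list)

@dataclass
class OutcomeRoot:
    outcome: str                         # outcome quality, not customer pain
    branches: list[BehaviorBranch] = field(default_factory=list)

    def weakest_leaves(self, n: int = 3) -> list[EvalLeaf]:
        """Lowest pass rates point at the next discovery targets."""
        leaves = [leaf for b in self.branches for leaf in b.evals]
        return sorted(leaves, key=lambda leaf: leaf.pass_rate)[:n]

tree = OutcomeRoot(
    outcome="invoices filed correctly without escalation",
    branches=[
        BehaviorBranch("parses invoice identifiers", [
            EvalLeaf("plain numeric IDs", 0.99),
            EvalLeaf("IDs containing slashes", 0.62),
        ]),
        BehaviorBranch("escalates ambiguous cases", [
            EvalLeaf("missing vendor field", 0.88),
        ]),
    ],
)
print(tree.weakest_leaves(1))  # the slash-ID eval: the weakest corner
```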

Break 3: The cadence is wrong

Continuous discovery's one-to-two hours per week is tuned for the speed at which a SaaS product changes. Releases ship every six weeks. Customer needs evolve over months. Weekly interviews keep up.

Agent products change daily. The model drifts. Prompts evolve. New capabilities show up because the underlying foundation model improved. Customer expectations shift weekly because the customer is using ten other AI products that are all moving. The same one-to-two hours per week of customer interviews is too slow.

The right cadence is bursty: intense discovery during build cycles, less frequent during stabilization, intense again when something material shifts. Continuous discovery's even cadence smooths over moments when discovery should be loud.

What I run instead

Punctuated discovery. Three rules.

Rule 1: Discovery is bounded and intense, not continuous

Each cycle has two phases. Eight weeks of focused building. Three weeks of focused discovery. Repeat.

During the build phase, the team is heads-down on the work that came out of the last discovery sprint. There's no background discovery activity. The team isn't doing one or two customer interviews per week as a habit. They're shipping.

During the discovery sprint, the team is heads-down on learning. No feature work. No PRDs. Three weeks of pure investigation, ending in a commit document.

This is louder than continuous discovery and less frequent. It feels riskier (what if we miss a customer signal during the build phase?). In practice, you don't miss the signals: the next discovery sprint is intense enough to catch them in retrospect, and customer support and signal-listening systems can flag emergencies during the build phase.

The advantage is focus. Three weeks of full attention on discovery produces ten times the insight of three months of one-hour-per-week sessions, because most of those one-hour sessions are interrupted by build work and the learning never gets synthesized in depth.

Rule 2: 60% of discovery is agent and ecosystem work, not customer interviews

In a discovery sprint for an agent product, allocate roughly:

  • 40% to customer interviews. Jobs-to-be-done conversations, willingness-to-pay tests, dispute pattern analysis, observation sessions.
  • 40% to agent testing. Running the agent against real workloads. Building evals. Observing edge behaviors. Stress-testing with adversarial inputs. Comparing performance across model versions (a sketch of this last activity follows this list).
  • 20% to competitive and ecosystem surface. What did competitors release? What did the underlying foundation models add? What did customers' other AI tools do that changed expectations?
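
As referenced in the agent-testing item, here is a compact sketch of the last activity on that list: running a fixed eval set against two model versions and comparing pass rates. `run_agent`, the model names, and the eval predicates are hypothetical stand-ins for your own harness (the v2 regression is simulated to show the mechanism).

```python
# Sketch: run a fixed eval set against two model versions, compare pass rates.

def run_agent(task: str, model: str) -> str:
    """Placeholder agent call; the v2 slash-mangling simulates a regression."""
    if model == "model-v2":
        task = task.replace("/", " ")
    return f"[{model}] handled: {task}"

# Each eval is an input plus a predicate deciding pass/fail. Writing the
# predicates is discovery work: encoding what "handled correctly" means.
evals = [
    ("file invoice 2024/118", lambda out: "2024/118" in out),
    ("refund order #A-991",   lambda out: "A-991" in out),
]

def pass_rate(model: str) -> float:
    passed = sum(1 for task, check in evals if check(run_agent(task, model)))
    return passed / len(evals)

for model in ("model-v1", "model-v2"):
    print(f"{model}: {pass_rate(model):.0%}")  # v1: 100%, v2: 50%

# A pass-rate gap across versions is a discovery finding in its own right:
# it goes into Section 1 of the commit document (see Rule 3).
```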

Most teams I observe spend 90% on customer interviews and 10% on agent testing, because customer interviews are familiar and agent testing is new. They miss most of the actual product surface.

The 60/40 inversion (agent + ecosystem at 60%, customer at 40%) is not a downgrade of customer voice. It's a recognition that the customer is now one of three signals that matter for product quality, not the only one.

Rule 3: Discovery ends in a commit document

Every discovery sprint ends with a written commit document. Three sections.

Section 1: what we learned. Five to ten findings, each one paragraph long and sourced. Includes both customer findings and agent findings.

Section 2: what we'll test. Three to five hypotheses, each with the eval or experiment that would validate or falsify it.

Section 3: what we'll ship. Two or three commitments for the next eight-week build cycle, with success criteria.
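
For teams that want the structure enforced rather than remembered, here is a minimal sketch of the commit document as a checked data type. The counts come straight from the three sections above; the field names are illustrative, not a prescribed template.

```python
# Sketch: the commit document as a structure whose shape matches the rules.
from dataclasses import dataclass

@dataclass
class Finding:
    summary: str   # one paragraph
    source: str    # interview, eval run, competitor release, ...

@dataclass
class Hypothesis:
    claim: str
    test: str      # the eval or experiment that validates or falsifies it

@dataclass
class Commitment:
    deliverable: str
    success_criteria: str

@dataclass
class CommitDocument:
    learned: list[Finding]        # Section 1: what we learned
    will_test: list[Hypothesis]   # Section 2: what we'll test
    will_ship: list[Commitment]   # Section 3: what we'll ship

    def validate(self) -> None:
        assert 5 <= len(self.learned) <= 10, "5-10 sourced findings"
        assert 3 <= len(self.will_test) <= 5, "3-5 testable hypotheses"
        assert 2 <= len(self.will_ship) <= 3, "2-3 commitments with criteria"
```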

The document is the artifact that makes the next eight weeks of building disciplined. Without it, the discovery sprint becomes "interesting conversations" that don't translate into shipped product. With it, the build phase has a clear contract.

I borrow this from how research-heavy engineering teams operate (the eight-week sprint with a written postmortem). It's not novel. It's just an honest application of focused work cycles to the discovery problem.

What this changes about the team

Continuous discovery puts the PM at the center. The PM does the interviews. The PM builds the OST. The PM owns discovery as a habit.

Punctuated discovery decentralizes. Customer interviews are one strand. Agent testing is another strand, owned by an agent specialist or a senior PM/engineer. Competitive intelligence is a third strand, owned by product marketing or a dedicated agent (see agent-competitive-intel). The PM's job is to integrate the three strands into the commit document, not to own all the work.

This requires a different team shape. You need an agent quality function (whether a person or a documented practice). You need eval infrastructure. You need someone with the time and discipline to test the agent against real workloads, not just review customer transcripts.

Most teams trying to ship agent products don't have this shape. They're staffed for traditional SaaS discovery and trying to learn agent behavior on the side. The result is an OST with fewer branches than the actual product surface, and a team that ships into the dark.

The respectful argument with Teresa

Teresa Torres' continuous discovery is one of the most useful product frameworks of the last decade. It rescued a generation of PMs from the over-spec'd, over-prioritized, under-tested mess of pre-2018 product management. It made customer voice the heartbeat of the product team. I've used it. I've taught it. I owe Teresa the credit.

The argument here is not that continuous discovery is wrong. It's that continuous discovery was designed for human-centric SaaS, and AI-native products are a different category. Continuous discovery for SaaS, punctuated discovery for agents. Both correct. Different products.

I would expect Teresa, given her own intellectual honesty, to be the first to update the model when AI-native products demand the update. The 2026 conversation in product management hasn't yet had this argument out loud. I'm offering it as a contribution to that conversation.

If I'm wrong, the answer should be specific: which of the three breaks (anticipation, causality, cadence) is incorrect, and why? If I'm right, the field needs to update its discovery curriculum, and the people teaching continuous discovery to AI-product PMs need to teach the punctuated variant alongside it.

What to try this week

Run a thought experiment on your last quarter. Write down:

  • How many hours did your team spend on customer interviews?
  • How many hours did your team spend on agent testing (running the agent against real inputs, building evals, observing edge cases)?
  • What's the ratio?

If the ratio is 90/10 or more skewed toward interviews (mostly customer, little agent), and you're shipping an agent product, you have a discovery model mismatch. You're using the right frame for the wrong product.

The fix is not to do more interviews. The fix is to allocate engineering time to agent testing during the next discovery sprint, with the same seriousness you'd allocate to customer interviews. Three weeks. 40% allocation. Written commit document at the end.

Try it for one cycle. The team's mental model of the product surface will change visibly.

That change is what makes punctuated discovery work for agent products in a way continuous discovery, by structure, can't.


The Punctuated Discovery Sprint Template is at /toolkit/punctuated-discovery-sprint-template. The companion handbook chapter on direction metrics that pairs with this discovery model is at /handbook/direction-metrics. Continuous discovery for traditional SaaS still applies and is covered in /handbook/continuous-discovery-autopilot.


Frequently asked

What is continuous discovery?

Teresa Torres' framework: PMs commit to 1-2 hours of customer interviews per week, every week, forever. Discovery is a habit, not a phase. The output is the Opportunity Solution Tree, which maps customer pains to opportunities to solutions. It has been the dominant discovery model in product management since 2018.

Why doesn't it scale for AI-native products?

Three reasons. (1) Agent products generate novel behaviors users don't anticipate, so customer interviews miss most of the actual product surface. (2) Causality is fuzzy in non-deterministic systems, so the OST collapses. (3) The right discovery cadence for agent products is much faster than weekly customer interviews can support.

What is punctuated discovery?

Three-week intense discovery sprints sandwiched between 8-week build phases. Each sprint includes customer interviews, agent testing, eval building, and competitive surface analysis. The cadence is louder, less frequent, and more impactful than continuous discovery's background hum.

What is the 60/40 ratio?

60% on the agent and its ecosystem: 40% agent testing (running the agent against real workloads, watching where it drifts, building evals, observing edge behavior) plus 20% competitive and ecosystem surface. The remaining 40% is customer interviews (still important for jobs-to-be-done, willingness to pay, dispute patterns). For traditional SaaS, the ratio inverts. The split matters because most teams allocate 90% to interviews and miss the agent surface entirely.

Is this an attack on Teresa Torres?

No. Continuous discovery is the right model for human-centric SaaS, and Teresa's frame held up across thousands of teams. The argument here is that AI-native products are a different category that requires a different model. Both can be true. Most of the canon hasn't done this update yet.

What are the three rules of punctuated discovery?

(1) Discovery is intense and bounded, not continuous. Three weeks of focused work. (2) Discovery includes the agent itself as a subject, not just the customer. Eval-building is discovery work. (3) Discovery ends with a written commit document: what we learned, what we'll test, what we'll ship. The doc is the artifact that makes the next eight weeks of building disciplined.
