Discovery · Falk Gottlob · 13 min read

Customer Discovery When Your Customer Is an Agent

Teresa Torres' continuous discovery playbook assumes your customer is human. When the customer is an agent, six methods replace the weekly interview: agent telemetry, failure-mode interviews, capability audits, and more.

Tags: customer discovery, AI agents, Teresa Torres, continuous discovery, agent telemetry, MCP, AI-native PM

Teresa Torres wrote the book on continuous discovery, and I'm a fan. Interview a user every week. Build an opportunity solution tree. Test assumptions. The canonical playbook is solid, and most product teams under-use it, not over-use it.

The playbook has a hidden assumption, though. It assumes your customer is a human who can be interviewed.

In 2026, a growing share of what actually interacts with your product is not human. It's an agent: a Claude instance, a GPT-powered workflow, a custom orchestrator built by your customer's engineering team. The agent reads your API, uses your tools, consumes your content, and calls your endpoints. It doesn't have opinions. It doesn't get frustrated. It doesn't answer interview questions.

And yet, the usage matters. If your product is being heavily used by agents, you have a customer discovery problem the canonical playbook can't solve. This post is about what replaces it.

I don't think anyone has a clean answer to this yet. What I have are the methods I've been using with teams this year. Some of it is working. Some of it I expect to revise within six months. I'm publishing the working version so others can build on it.

The short version

Teresa Torres' canonical playbook (weekly interviews, Opportunity Solution Trees, assumption testing) assumes a human customer. In 2026, an increasing share of your product's users are agents: a Claude or GPT instance, an MCP-wired workflow, a custom orchestrator. Agents don't answer interviews. The replacement is six methods: agent telemetry as primary signal, failure-mode interviews with the humans who deployed the agent, capability audits, pattern mining in public developer communities, an agent benchmark suite, and outcome interviews with the end users behind the agent. Smallest first step: tag every API call by agent vs human, then run a 30-minute failure-mode interview with the human behind the agent at each of your three most agent-heavy customers.

For the canonical discovery playbook this builds on top of, see Continuous Discovery on Autopilot. For the eval drift signal feeding method 3, see The Eval-First Product Org. For the broader inbound replacement, see Kill the Feature Request Queue.

The four ways agents are using your product

To do discovery with agents, it helps to know what they're actually doing with your product. There are four patterns, and most products see some mix of them.

Pattern 1: The tool-user agent

The agent is using your product as a tool in an MCP or similar framework. It calls your API, parses the response, and decides what to do next. The end user behind the agent is a human who set up the agent, but the agent is the one actually interacting with your product.

Examples: a coding agent using your documentation API to look something up. A research agent using your data product as a source. A workflow agent using your notification system to alert users.

Pattern 2: The end-to-end replacement agent

The agent is not a tool user. It is the customer. The human behind it is at a higher level of abstraction. They set up the agent to handle a category of work, and the agent runs that work on your product without the human ever logging in.

Examples: an accounts payable agent that logs into your billing platform, pays invoices, and reconciles accounts. A customer support agent that handles tickets in your help desk platform. A lead qualification agent that works your CRM.

Pattern 3: The intermediary agent

The agent sits between the human and your product, translating, summarizing, or reformatting. The human interacts with the agent, which interacts with your product. The human's experience of your product is entirely mediated.

Examples: a meeting assistant that reads your calendar and meeting notes. A research assistant that reads your articles and synthesizes for the user. A voice interface that lets the user "talk to" your product through an intermediary.

Pattern 4: The hybrid agent-human session

The agent and the human collaborate inside a session. The human gives direction, the agent executes, the human reviews, the agent revises. Your product is being used by a combined entity where the agent and the human share the work.

Examples: an analyst working with a research agent to produce a report using your data. A designer working with a generation agent to produce variants using your design tool. A developer working with a coding agent to ship features using your platform.

Each of these patterns requires different discovery methods. None of them are well-served by "interview a user every week."

Six methods that work

Here are the six methods I've been using. They combine traditional discovery, agent-specific telemetry, and some new practices that feel right but haven't been proven out across many companies yet.

Method 1: Agent telemetry as primary signal

The single most useful thing you can do is instrument what agents are doing in your product at high fidelity. You want to be able to answer:

  • Which endpoints are being hit, in what sequence, by what kind of agent?
  • What's the distribution of request patterns across different agent types?
  • Where are agents hitting errors, rate limits, or ambiguous responses?
  • What's the tail of "unexpected usage" that doesn't match any documented pattern?

In 2026, most API products capture some of this; almost no SaaS UIs that agents drive capture any of it. If you suspect agents are using your product, instrument the patterns you think they'd leave behind: rapid sequential requests, atypical navigation, high tool-use frequency, specific user agent strings.

This telemetry is your interview-every-week equivalent. The agents are "telling you" what's working and what's not, every time they use your product. You just have to listen.
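Here is a minimal sketch of that listening, assuming you already log one record per call. It flags sessions that look agent-driven using two of the signals above (user agent strings and rapid sequential requests), then counts which endpoint pairs those sessions hit in sequence. The `Call` record, the hint strings, and the rate threshold are all illustrative assumptions, not a recommended detection rule.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical hints; replace with the client strings you actually see in your logs.
AGENT_UA_HINTS = ("python-requests", "curl/", "node-fetch")


@dataclass
class Call:
    session_id: str
    user_agent: str
    endpoint: str
    timestamp: float  # seconds


def looks_like_agent(calls: list[Call], max_human_rate: float = 2.0) -> bool:
    """Heuristic: automation-style user agent, or a sustained burst of requests."""
    if not calls:
        return False
    if any(hint in c.user_agent.lower() for c in calls for hint in AGENT_UA_HINTS):
        return True
    times = sorted(c.timestamp for c in calls)
    span = times[-1] - times[0]
    # Assumed threshold: five or more calls averaging above max_human_rate per second.
    return len(times) >= 5 and span > 0 and (len(times) - 1) / span > max_human_rate


def agent_endpoint_pairs(calls: list[Call]) -> Counter:
    """Count consecutive endpoint pairs within sessions that look agent-driven."""
    sessions: dict[str, list[Call]] = {}
    for c in calls:
        sessions.setdefault(c.session_id, []).append(c)
    pairs: Counter = Counter()
    for session_calls in sessions.values():
        if looks_like_agent(session_calls):
            ordered = sorted(session_calls, key=lambda c: c.timestamp)
            endpoints = [c.endpoint for c in ordered]
            pairs.update(zip(endpoints, endpoints[1:]))
    return pairs
```

Calling `agent_endpoint_pairs(calls).most_common(10)` gives a first look at the sequences agents actually run. Anything on that list that matches no documented workflow belongs to the "unexpected usage" tail.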

Method 2: Failure-mode interviews with agent operators

You can't interview the agent. You can interview the human who set the agent up.

I call this a failure-mode interview because the most productive ones focus on where the agent didn't work well and what the operator did about it. Traditional user interviews ask "what are you trying to do." Agent operator interviews ask "what did the agent try to do, where did it fail, and what did you have to do to route around the failure."

A structure that's been working for me:

  1. Context-setting. What agent did you build? What does it do? What workflow does it run?
  2. Working surface area. Where does it interact with our product most? What's it good at?
  3. Failure surface area. Where does it fail? What do you have to manually correct? What do you tell it to avoid?
  4. Workarounds. What's the ugliest hack you've built to get around something our product does?
  5. Removal test. If we changed one thing tomorrow, what would most help your agent succeed?

Thirty minutes per operator. Five operators per quarter, minimum. The signal density is higher than a traditional user interview, because the operator has already done the abstraction work of turning their fuzzy needs into a specific agent configuration.

Method 3: Capability audits

For agent-driven products, you need to regularly audit what an agent can and can't do with your product. This is different from traditional usability testing. Usability testing asks "is this easy for a human?" A capability audit asks "is this coherent enough that an agent can reliably use it?"

A capability audit looks like this:

  1. Pick a workflow an agent might try to execute in your product.
  2. Describe the workflow to Claude or GPT with the same context an integrating customer would have.
  3. Let the model try to execute the workflow against your actual API or UI (using a sandbox account).
  4. Log every failure, every ambiguity, every retry, every hallucination.
  5. Classify: was the failure because of our product, our documentation, or the model's limits?

Run this monthly. Your product team learns where your documentation is too thin, where your error messages are too vague, where your UI relies on visual cues an agent can't parse, and where your workflow has implicit steps that aren't machine-readable.
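Steps 4 and 5 are easier to keep honest with a fixed record shape for each finding. The sketch below is one way to do that, assuming the three cause categories from step 5. The field names and the `summarize` helper are mine, not part of any standard audit format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    PRODUCT = "product"
    DOCUMENTATION = "documentation"
    MODEL_LIMIT = "model_limit"


@dataclass
class Finding:
    workflow: str
    step: str
    kind: str  # "failure", "ambiguity", "retry", or "hallucination"
    cause: Cause
    note: str = ""


def summarize(findings: list[Finding]) -> dict[str, int]:
    """Count findings per cause, including causes with zero findings."""
    counts = Counter(f.cause.value for f in findings)
    return {c.value: counts.get(c.value, 0) for c in Cause}
```

A month where most findings land under `documentation` calls for a different fix than a month dominated by `product`, which is the point of classifying at all.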

This is one of the highest-leverage activities I know for products that have agent users. In my experience, two hours of audit surfaces more problems than a week of user interviews.

Method 4: Pattern mining in open-source and Discord

The third-party agent ecosystem is public. Developers are building agents that use your product, and they're talking about it. GitHub issues on MCP servers. Discord servers for agent frameworks. Reddit threads. Twitter posts.

Your team should be mining this conversation weekly. Not as a substitute for direct customer research, but as a source of patterns you didn't know existed. What are people trying to build with your product? What are they hacking around? What integrations are they wishing for?

A minimum-viable version of this: set up a daily or weekly agent that searches the major forums for mentions of your product name, your API, or common error messages, and surfaces what it finds. Twenty minutes of human review per week. You'll learn things you wouldn't learn any other way.
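The filtering half of that agent can be very small. This sketch assumes the posts have already been collected into dictionaries with a `text` field; how you fetch them from each forum is out of scope here. The field names and example terms are hypothetical.

```python
def find_mentions(posts: list[dict], terms: list[str]) -> list[dict]:
    """Return posts mentioning any term (case-insensitive), with the matched terms attached."""
    hits = []
    for post in posts:
        text = post["text"].lower()
        matched = [t for t in terms if t.lower() in text]
        if matched:
            hits.append({**post, "matched": matched})
    return hits


# Hypothetical watch list: product name, API name, and a common error message.
WATCH_TERMS = ["acme api", "rate_limit_exceeded"]
```

Plain substring matching produces false positives for short product names, so treat the output as a review queue for the weekly human pass, not as a finished report.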

Method 5: The agent benchmark suite

For products heavily used by agents, build a standing benchmark of agent tasks and measure your product's performance on them quarterly.

The benchmark is a set of 20 to 50 specific tasks: "find the customer with the most recent invoice over $10K." "Update a ticket with a note and change its status." "Export a chart from this dashboard as a PDF." Tasks that represent the real usage you see in telemetry.

Each quarter, run the benchmark. Score: can the agent do it without retries? Can it do it at acceptable latency? Can it do it at acceptable cost? Does it do it correctly?

Track the scores over time. A declining benchmark score is a signal that your product is drifting away from agent-usability. A rising score is a signal that your investments in API quality, documentation, and UI structure are paying off.
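One way to turn the four scoring questions into a single tracked number is a strict pass rate: a task passes only if it is correct, needs no retries, and stays under latency and cost ceilings. The sketch below assumes that rule; the thresholds are placeholders you would set from your own telemetry.

```python
from dataclasses import dataclass

# Assumed ceilings; pick values that reflect what your customers tolerate.
MAX_LATENCY_S = 30.0
MAX_COST_USD = 0.50


@dataclass
class TaskResult:
    task: str
    correct: bool
    retries: int
    latency_s: float
    cost_usd: float


def passes(r: TaskResult) -> bool:
    """A task passes only if it clears all four checks."""
    return (
        r.correct
        and r.retries == 0
        and r.latency_s <= MAX_LATENCY_S
        and r.cost_usd <= MAX_COST_USD
    )


def benchmark_score(results: list[TaskResult]) -> float:
    """Fraction of tasks that pass; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(passes(r) for r in results) / len(results)


def drift(previous: list[TaskResult], current: list[TaskResult]) -> float:
    """Positive means the product became more agent-usable since the last run."""
    return benchmark_score(current) - benchmark_score(previous)
```

A single pass rate hides which check failed, so keep the per-task results alongside the headline score.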

This is one of the most useful metrics a product org can own in 2026. I've seen teams put it on the same dashboard as their human usability scores.

Method 6: Discovery with end-user humans, through their agents

The human whose work is being automated by an agent is still a customer. They are harder to reach, because they don't directly use your product. But their experience of the outcomes still matters.

When you want to understand how a procurement team is experiencing the results of an agent that uses your invoicing system, you interview the procurement team about the outcomes, not about the product. "Are the invoices being processed correctly? Where do issues show up? How do you know when the agent did something wrong?"

This is a discovery interview at one level of abstraction higher than traditional user research. You're not asking about the product. You're asking about the quality of the work being done by software that uses the product. The answers are more about outcomes and failures than about features and experience.

The methods that don't work anymore

A few canonical methods need to be retired or heavily adapted for agent-driven products.

Usability tests with think-aloud protocols. An agent doesn't think aloud. Usability testing a human proxy who's simulating agent usage produces misleading results. Skip it for agent-facing surfaces.

Feature prioritization surveys sent to end users. If the end user never sees your product directly, a feature survey is asking them to evaluate an abstraction they don't have context for. Send feature prioritization to agent operators, not end users.

The classic "a day in the life" diary study. Useful for humans. Noise for agents. Agents don't have days. They have tasks.

NPS scores from end users. The end user's NPS is heavily colored by the agent's performance, which is only partly about your product. The score tells you less than you think.

A practical starting point

If you're running product at a company where agents are a growing share of usage, here's a 90-day starting plan.

  1. This week: instrument agent telemetry. Pick five signals that indicate agent usage. Get them on a dashboard. Start watching.
  2. Next week: book five failure-mode interviews. Identify five customers who've set up agents that use your product. Book 30 minutes each. Run the structure above. Write up the patterns.
  3. Within 30 days: run your first capability audit. Pick three representative workflows. Run them against Claude. Log every issue. Share internally.
  4. Within 60 days: set up the benchmark suite. 20 tasks, quarterly measurement, first baseline.
  5. Within 90 days: stand up the pattern mining. Daily or weekly agent that watches forums and surfaces what's being said.
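Steps 1 and 2 connect: once calls are tagged agent or human, the same data tells you whom to interview. A rough sketch, assuming each call is reduced to a `(customer_id, is_agent)` pair; how `is_agent` is decided is up to your own signals.

```python
from collections import defaultdict


def agent_share_by_customer(calls: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of each customer's calls that were tagged as agent-driven."""
    totals: dict[str, int] = defaultdict(int)
    agent: dict[str, int] = defaultdict(int)
    for customer, is_agent in calls:
        totals[customer] += 1
        if is_agent:
            agent[customer] += 1
    return {c: agent[c] / totals[c] for c in totals}


def interview_candidates(calls: list[tuple[str, bool]], limit: int = 5) -> list[str]:
    """Customers with the highest agent share, skipping those with no agent calls."""
    shares = agent_share_by_customer(calls)
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return [customer for customer, share in ranked[:limit] if share > 0]
```

Ranking by share rather than raw volume favors customers whose usage is mostly agent-driven, even if they are small accounts. Swap in volume if you would rather talk to the heaviest users.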

At the end of 90 days, you have a discovery practice built for the actual shape of your 2026 customer base. The canonical human-interview practice still runs in parallel for the parts of your product that are still human-used. But you've now got signal from the part of your usage that was previously invisible.

One more thing

I want to flag a real risk. Many product teams will read "agents are using our product" and jump to optimization. "Let's make our product better for agents." This is sometimes right and sometimes wrong.

It's right when your customer values the agent's usage enough to pay for it, or when optimizing for agent usage also makes things better for humans. It's wrong when you optimize for agent-friendliness at the expense of the humans who are still your primary customer base.

The discovery methods above aren't optimization methods. They're understanding methods. You do them to know what's happening. The decision about whether and how to optimize comes after.

In some cases, the right answer is to build a separate agent-facing product or a specific agent API, priced differently, documented differently, sold to a different buyer. In other cases, the right answer is to keep your product focused on humans and let the agents do what they can with the existing surfaces.

Discovery first. Decision second. The canonical playbook had that order right. It still does.


The agent operator interview guide, the capability audit template, and the benchmark suite starter kit are all on the toolkit at falkster.com/toolkit.


Frequently asked

Why does Teresa Torres' continuous discovery playbook break when the customer is an agent?

Because every method in the playbook assumes a human you can interview. Agents don't have opinions, don't get frustrated, don't answer interview questions, and don't pattern-match on usability. The Opportunity Solution Tree assumes a customer who articulates needs in language. An agent expresses needs through API calls, retry behavior, and failure modes. The shape of the signal is fundamentally different.

What are the four ways agents use your product?

Tool-user agents (calling your API as part of a multi-step workflow, e.g., a coding agent looking something up via your docs API). End-to-end replacement agents (the agent IS the customer; the human is at a higher level of abstraction, e.g., an AP agent that pays invoices through your billing platform). Intermediary agents (sitting between human and product, translating; e.g., a meeting assistant). And hybrid agent-human sessions (agent and human collaborating inside a session, e.g., analyst plus research agent). Each requires different discovery methods.

What replaces the weekly customer interview when the customer is an agent?

Six methods. Agent telemetry as primary signal (instrumented at high fidelity). Failure-mode interviews with the humans who deployed the agent (where did it fail, what did you work around). Capability audits (let a model attempt a real workflow in a sandbox and log every failure). Pattern mining in open-source and Discord (what developers say publicly about building on your product). The agent benchmark suite (20 to 50 tasks, scored quarterly). And discovery with end-user humans, through their agents (interview them about outcomes, not about the product).

How do you instrument agent telemetry as a primary discovery signal?

At high fidelity. You want to be able to answer: which endpoints does the agent hit, in what order, with what frequency. Which calls retry, which calls fail and what does the agent do next. Which tool calls cluster together (suggesting a multi-step workflow). Which sessions get abandoned mid-flight. The telemetry is your customer interview transcript when the customer can't talk.

What is a failure-mode interview and why is it different from a usability test?

A failure-mode interview is with the human who deployed the agent, not the end user. Three questions: where did the agent surprise you (positive or negative); where did you have to intervene; what would you change about how it interacts with our product. This surfaces the gap between what the human intended and what the agent did, which is where most discovery gold lives in 2026.

Should we still do human interviews if our product is agent-heavy?

Yes, but with the human-who-deployed-the-agent, not the end user. The deployer is the new customer in agent-heavy products. They make the buying decision, they configure the agent's tools and constraints, they live with the consequences when it misbehaves. Interview them like Torres would interview an end user, but ask different questions.

What's the smallest first step toward agent-aware discovery?

Two things in the next two weeks. One: tag every API call by whether it's coming from an agent or a human (User-Agent header is a start, but you'll need richer signal). Two: identify three customers whose use of your product is at least 30% agent-driven, and run a 30-minute failure-mode interview with the human who deployed the agent. The first conversation usually resets your roadmap.

About the author

Falk Gottlob

Product Executive · Founder, Falkster.AI

Thirty years shipping product at Microsoft Research, Adobe, Salesforce (Marketing Cloud / Quip / Slack), and several startups including one $6.5B exit and one acquired by Microsoft. Now CPO at Smartcat and founder of Falkster.AI, writing this notebook from the boardroom, not the keyboard.
