
SERIES · THE PM AGENT STACK · PART 2 OF 5
One connected system, not a set of independent tools. Read in order. The recipes here only land if the framing is in place.
- 1. Overview: the destination, the gap, the bridge
- 2. Discovery agent stack ← you are here
- 3. Build agent stack
- 4. Measure agent stack
- 5. The PM Agent Stack handbook chapter
The short version
This post is the concrete how-to for the Discover stage of the PM operating system. It is part 2 of 5 of the PM Agent Stack series. If you have not read the overview yet, start there. It sets up the destination (an enterprise-wide AI brain), the gap (why most companies are 6 to 18 months away), and why the bridge (this stack) is what to build today.
Discovery is the stage where most PMs feel the most leverage from an agent stack, and the stage where the smallest install gets you the most done. Seven repos, roughly an hour to install, and the daily rhythm changes within the first week. The full recipe is below, including three specific workflows: real-time interview synthesis, weekly theme tracking, and signal scanning across sources you don't have time to read.
This is not a tour of features. It is a working list of what to install, in what order, and exactly how to use each piece on real PM work. Skim the install order, pick one of the three recipes, run it tomorrow.
The five PM workflows this stack actually handles
I'm naming them up front so you can decide whether the stack is worth building. If none of these are problems you have, skip the post.
Real-time interview synthesis. You drop a transcript in (or stream it during the call), the agent returns a structured note with themes, quotable evidence, and disconfirming signals. The Friday afternoon synthesis ritual goes away.
Cross-interview pattern detection. The agent reads the last N interviews and surfaces patterns the tags in your research tool didn't anticipate. The "I noticed X across the last three calls" insight that you would miss without it.
Signal scanning across sources. Support tickets, G2 reviews, App Store reviews, Reddit, your support inbox. The agent reads them on a schedule and flags new themes. You stop pretending you have time to read your support inbox.
Hypothesis-driven interview prep. Before each interview, the agent reads the prior research and proposes three hypotheses to probe. You walk in with sharper questions. The interview produces sharper answers.
Opportunity-solution tree maintenance. The agent reads your living OST and proposes additions, merges, and deletions based on the latest evidence. The tree stops decaying between formal review sessions.
If three of those five would meaningfully change your week, the stack is worth building.
The seven repos to install for Discovery
I'll list the repos in install order. Don't install all seven at once. Install the first three. Use them for a week. Then add the next four if the friction is still there.
1. anthropics/claude-code. The base. npm install -g @anthropic-ai/claude-code. Skip if you already have it.
2. anthropics/skills plus the customer-call-notes skill. The official skills repo gets you the docx, pdf, and pptx skills. The customer-call-notes skill (ships with the falkster.ai cowork mode setup, also lives on GitHub) is the one that does the real synthesis work.
3. github/github-mcp-server. The MCP that lets the agent reference your issues. Why is this a discovery tool? Because half the value of an interview is connecting the customer's pain to the issue you already had open about it. The agent does that connecting for you.
4. obra/superpowers. The brainstorm-spec-plan-TDD-review workflow. For discovery, the brainstorm slash command is the one that earns its install. Use it before designing an assumption test.
5. thedotmack/claude-mem. Long-term memory via compression. Discovery without memory is groundhog day. The agent forgetting last week's interview themes when you start this week's session defeats the entire compounding effect. Install before you feel the pain, not after.
6. microsoft/playwright-mcp. Browser automation. For discovery, Playwright lets the agent read public reviews, scrape forum threads, and pull from sources that don't have APIs. Use sparingly and respect site terms of service.
7. One subagent collection (I use wshobson/agents). Install three subagents specifically: a research synthesizer, a devil's advocate, and a customer-empathy critic. Run them against your synthesized themes. The devil's advocate alone catches half my motivated reasoning.
That's the seven. Total install time on a clean machine: 60 to 90 minutes. Time it gives back each week: substantial.
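The install order can be sketched as one script. Only the npm command comes from this post; the clone destinations and the `claude mcp add` arguments are assumptions, so check each repo's README for the real steps. The script defaults to a dry run that prints the plan without executing anything:

```shell
#!/usr/bin/env sh
# Install-order sketch. DRY_RUN=1 (the default here) only prints the plan;
# set DRY_RUN=0 to actually execute. Clone paths and MCP arguments below
# are assumptions, not verified commands.
DRY_RUN="${DRY_RUN:-1}"
PLAN=""
run() {
  PLAN="${PLAN}+ $*
"
  printf '+ %s\n' "$*"
  [ "$DRY_RUN" = "1" ] || "$@"
}

run npm install -g @anthropic-ai/claude-code                                  # 1. the base
run git clone https://github.com/anthropics/skills ~/.pm-stack/skills         # 2. skills repo
run claude mcp add github -- docker run -i --rm ghcr.io/github/github-mcp-server  # 3. (args are an assumption)
run git clone https://github.com/obra/superpowers ~/.pm-stack/superpowers     # 4. workflow
run git clone https://github.com/thedotmack/claude-mem ~/.pm-stack/claude-mem # 5. memory
run claude mcp add playwright -- npx -y @playwright/mcp@latest                # 6. browser (args are an assumption)
run git clone https://github.com/wshobson/agents ~/.pm-stack/agents           # 7. subagents
```

Run it once with the default dry run, read the plan, then rerun pieces by hand as each README directs.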
Recipe one: real-time interview synthesis
This is the recipe I run most often. It's also the one that produces the largest visible change in week one.
Setup, before the call. Make sure Otter (or your transcription tool of choice) is recording. Have Claude Code open in your terminal with the customer-call-notes skill installed.
During the call. After the customer has been talking for ten or fifteen minutes, paste the running transcript into Claude Code with one prompt: "Summarize the patterns you see so far and propose three follow-up questions." Read the output during a natural pause. The output is for you, not for the customer. Use it to ask sharper questions in the second half of the call.
After the call. Paste the full transcript with one prompt: "Run the customer-call-notes skill on this. Compare to the last five customer-call-notes outputs in my notes folder. What is new and what is repeating?" The agent returns the structured note plus a delta against prior notes.
Save the output to your research repository. That repository becomes input for the next session. The compounding starts on day three or four when "compared to last week" surfaces patterns you would have missed.
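The after-call step can be scripted so the note lands in the research folder automatically. A minimal sketch, assuming Claude Code's non-interactive print mode (`claude -p`); the transcript filename and the "acme" customer slug are placeholders:

```shell
#!/usr/bin/env sh
# After-call sketch: run the synthesis prompt over a transcript and save
# the note where the weekly job can find it. Filenames are placeholders.
mkdir -p research/calls
TRANSCRIPT="transcript-acme.txt"               # placeholder: today's Otter export
NOTE="research/calls/$(date +%F)-acme.md"      # placeholder customer slug

PROMPT="Run the customer-call-notes skill on this. Compare to the last five \
customer-call-notes outputs in my notes folder. What is new and what is repeating?"

# Real use: claude -p "$PROMPT" < "$TRANSCRIPT" > "$NOTE"
# Shown as a printout here so the sketch runs without the CLI installed.
printf 'claude -p "%s" < %s > %s\n' "$PROMPT" "$TRANSCRIPT" "$NOTE"
```

The date-prefixed filename is what makes the notes diff-able in order later.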
What the customer-call-notes skill is doing under the hood: parsing the transcript into speakers, identifying problem statements (vs. solution statements), pulling quotable evidence, flagging assumptions, and structuring everything in a format that's diff-able against prior notes. The skill is twenty pages of careful prompt engineering. The output looks like a two-page note.
Recipe two: weekly theme tracking on autopilot
Once a week, the agent reads everything new since last week's review and reports themes.
Setup. Create a folder structure that the agent can read: research/calls/{date}-{customer}.md, research/tickets/{date}.md, research/reviews/{date}.md. Create a MEMORY.md index pointing to the folder. Configure claude-mem to compress and persist between sessions.
The weekly job. Set a recurring slash command (hooks/scheduled tasks, or just a Friday morning calendar reminder to run it manually) with this prompt: "Read all files in research/ added since {last_run_date}. Compare to the themes in research/themes.md. Report: 1) Themes that strengthened. 2) Themes that weakened. 3) New themes not yet in the file. 4) Potential weak signals worth probing in next week's interviews. Update research/themes.md with the deltas."
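The setup and the weekly job together fit in one small script. A sketch under assumptions: the folder layout is the one from the setup step, while the `.last_run` marker file and the final `claude -p` call are my additions, not from the post:

```shell
#!/usr/bin/env sh
# Weekly-job sketch: find everything added since the last run (via a marker
# file's mtime), assemble the theme-delta prompt, and reset the marker.
mkdir -p research/calls research/tickets research/reviews
touch research/themes.md

# Everything added since the last run; themes.md itself is excluded.
if [ -f research/.last_run ]; then
  NEW_FILES=$(find research -name '*.md' ! -name themes.md -newer research/.last_run)
else
  NEW_FILES=$(find research -name '*.md' ! -name themes.md)
fi

PROMPT="Read these files: ${NEW_FILES}. Compare to the themes in \
research/themes.md. Report: 1) Themes that strengthened. 2) Themes that \
weakened. 3) New themes not yet in the file. 4) Potential weak signals worth \
probing in next week's interviews. Update research/themes.md with the deltas."

# Real use: claude -p "$PROMPT"
touch research/.last_run    # reset the marker for next week
printf '%s\n' "$PROMPT"
```

The marker file replaces `{last_run_date}` bookkeeping: `find -newer` does the "since last week" filtering for you.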
The output is two pages of structured prose. The most valuable section is consistently the "potential weak signals" one, because it surfaces things that wouldn't have crossed the threshold of any single interview but show up across three or four.
Run this for four weeks before judging. The cumulative effect is what matters. Single-week output looks neat but underwhelming. Month-three output looks like a research function.
Recipe three: signal scanning across sources you don't have time to read
This is the recipe that buys you the most calendar back.
Setup. Pick three to five public sources that talk about your product or category. G2 review page, your subreddit, an industry forum, App Store reviews, your competitors' support docs. For each source, write a one-paragraph "what to look for" prompt that names the products, the categories, and the signal types worth flagging.
The hook. Schedule a daily Playwright job that fetches new content from each source, runs it through the agent with the source-specific prompt, and writes a one-page digest to your inbox or Slack. Do not read every digest. Read the ones the agent flags as containing new patterns.
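The hook can be sketched as a loop over per-source prompt files. Everything here is an assumption for illustration: the source names, the prompt-file contents, and the commented-out `claude` call; in real use the fetch happens through the Playwright MCP inside Claude Code, and the schedule is a cron line such as `0 7 * * * /path/to/daily-scan.sh`:

```shell
#!/usr/bin/env sh
# Daily-scan sketch: one "what to look for" prompt file per source, one
# digest file per day. Source names and file contents are placeholders.
mkdir -p research/sources research/digests

cat > research/sources/g2.md <<'EOF'
Source: G2 reviews for <your product>. Flag churn-risk language, repeated
feature requests, and competitor names mentioned alongside ours.
EOF

DIGEST="research/digests/$(date +%F).md"
: > "$DIGEST"
for prompt_file in research/sources/*.md; do
  source_name=$(basename "$prompt_file" .md)
  # Real use, roughly:
  #   claude -p "Via Playwright, fetch today's new content for this source,
  #   then apply: $(cat "$prompt_file")" >> "$DIGEST"
  printf '## %s: digest placeholder\n' "$source_name" >> "$DIGEST"
done
echo "Wrote $DIGEST"
```

One digest file per day keeps the output skimmable and makes "read only the flagged ones" a five-minute habit.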
The honest truth. Two of the five sources will produce noise for the first week. Tune the prompts. By week three, two of the sources are producing one or two genuine insights per week, two are producing noise that you've learned to skim, and one has been retired. That's the right ratio. Five sources monitored continuously by an agent beats five sources you swear you'll read and never do.
A note on terms of service. Public review pages and forums have terms. Read them. Don't hammer the sources; one fetch per source per day is plenty. Don't republish content you scraped; use it as input to your synthesis only.
What stays human in discovery
The agent does not replace customer interviews. The agent does not decide what to discover. The agent does not write the opportunity-solution tree. The agent does not pick which assumption to test next.
What the agent does is take the mechanical 80% of discovery off your plate so the 20% that requires judgment, taste, and direct human contact gets your full attention. The PM still listens. The PM still notices the pause that meant something. The PM still walks the stakeholders through the synthesis and owns the call.
If you're tempted to outsource the listening to the agent, stop. The agent is an amplifier of your listening, not a replacement for it. Continuous discovery is a practice, and the practice still requires you in the room.
What to do this week
Pick one of the three recipes. Set up the matching subset of repos. Run the recipe on real PM work, not a test transcript. The point of the stack is to feel the leverage on the work you actually do.
If you're new to all of this, start with recipe one and the first three repos. If you've been using Claude Code for a while and want a step change, recipe three with all seven repos is the highest leverage move available.
The Friday afternoon synthesis ritual is the one to break first. Most PMs don't realize it's a ceiling until they go past it.
Build a Prototype-First Agent Stack
From PRD to working demo in a day, with TDD and security review baked in. Eight repos, three recipes. The post that changes how engineering reacts to your work.
Sources: anthropics/claude-code, anthropics/skills, github/github-mcp-server, obra/superpowers, thedotmack/claude-mem, microsoft/playwright-mcp, wshobson/agents. The full taxonomy and the credit to Divyanshi Sharma's Instagram carousel of the Claude ecosystem are in the handbook chapter.
Frequently asked
What does a discovery agent stack actually do?
Three things. It turns interview transcripts into structured notes within minutes of the call ending. It scans support tickets, public reviews, and product analytics for emerging themes you would otherwise miss. It surfaces patterns across weeks of evidence so you walk into the next interview with sharper hypotheses. The compound effect is that customer signal stops being a Friday ritual and becomes an always-on input.
How is this different from just using Otter, Dovetail, or EnjoyHQ?
Those tools store and tag. They do not synthesize. The discovery agent stack reads across stored evidence, finds patterns the tags didn't anticipate, and proposes the next research move. It complements the storage tools rather than replacing them. I still use Otter for transcription. The agent reads what Otter produced.
Do I need to be technical to set this up?
Less than the README pages suggest. The discovery stack is mostly skills (markdown files) and one or two MCP connectors. The hardest part is getting Claude Code working in your terminal. Once that's running, the rest is reading installation commands and pasting them. A non-technical PM can stand this stack up in an afternoon.
Where does the customer data live and is it safe?
It lives where you put it. By default Claude Code reads from your local filesystem. The MCP connectors talk to systems you authorize. Nothing leaves your environment unless you explicitly call a tool that sends it somewhere. Treat customer transcripts the same way you treat them today: redact PII before storing, follow your company's data handling policy. The agent doesn't change those rules.
How long until this earns its keep?
Two weeks for me. Day five was when interview synthesis stopped being a Friday afternoon ritual. Day eight was when the agent surfaced a pattern across two weeks of calls that I had clearly missed. Day twelve was when I deleted the standing Friday block from my calendar.
What is the smallest version of this stack that's still useful?
Three installs. Claude Code, the customer-call-notes skill (or Superpowers brainstorm), and the github-mcp-server (for tying observations to issues). That's enough to handle interview synthesis and a basic theme tracker. Add the rest only when you feel the friction the rest is meant to solve.