
SERIES · THE PM AGENT STACK · PART 2 OF 5
One connected system, not a set of independent tools. Read in order. The recipes here only land if the framing is in place.
- 1. Overview: the destination, the gap, the bridge
- 2. Discovery agent stack ← you are here
- 3. Build agent stack
- 4. Measure agent stack
- 5. The PM Agent Stack handbook chapter
The short version
This post is the concrete how-to for the Discover stage of the PM operating system. It is part 2 of 5 of the PM Agent Stack series. If you have not read the overview yet, start there. It sets up the destination (an enterprise-wide AI brain), the gap (why most companies are 6 to 18 months away), and why the bridge (this stack) is what to build today.
Discovery is the stage where most PMs feel the most leverage from an agent stack, and the stage where the smallest install gets you the most done. Seven repos, roughly an hour to install, and the daily rhythm changes within the first week. The full recipe is below, including three specific workflows: real-time interview synthesis, weekly theme tracking, and signal scanning across sources you don't have time to read.
This is not a tour of features. It is a working list of what to install, in what order, and exactly how to use each piece on real PM work. Skim the install order, pick one of the three recipes, run it tomorrow.
The five PM workflows this stack actually handles
I'm naming them up front so you can decide whether the stack is worth building. If none of these are problems you have, skip the post.
Real-time interview synthesis. You drop a transcript in (or stream it during the call), the agent returns a structured note with themes, quotable evidence, and disconfirming signals. The Friday afternoon synthesis ritual goes away.
Cross-interview pattern detection. The agent reads the last N interviews and surfaces patterns the tags in your research tool didn't anticipate. The "I noticed X across the last three calls" insight that you would miss without it.
Signal scanning across sources. Support tickets, G2 reviews, App Store reviews, Reddit, your support inbox. The agent reads them on a schedule and flags new themes. You stop pretending you have time to read your support inbox.
Hypothesis-driven interview prep. Before each interview, the agent reads the prior research and proposes three hypotheses to probe. You walk in with sharper questions. The interview produces sharper answers.
Opportunity-solution tree maintenance. The agent reads your living OST and proposes additions, merges, and deletions based on the latest evidence. The tree stops decaying between formal review sessions.
If three of those five would meaningfully change your week, the stack is worth building.
The seven repos to install for Discovery
I'll list the repos in install order. Don't install all seven at once. Install the first three. Use them for a week. Then add the next four if the friction is still there.
1. anthropics/claude-code. The base. npm install -g @anthropic-ai/claude-code. Skip if you already have it.
2. anthropics/skills plus the customer-call-notes skill. The official skills repo gets you the docx, pdf, and pptx skills. The customer-call-notes skill (ships with the falkster.ai cowork mode setup, also lives on GitHub) is the one that does the real synthesis work.
3. github/github-mcp-server. The MCP that lets the agent reference your issues. Why is this a discovery tool? Because half the value of an interview is connecting the customer's pain to the issue you already had open about it. The agent does that connecting for you.
4. obra/superpowers. The brainstorm-spec-plan-TDD-review workflow. For discovery, the brainstorm slash command is the one that earns its install. Use it before designing an assumption test.
5. thedotmack/claude-mem. Long-term memory via compression. Discovery without memory is groundhog day. The agent forgetting last week's interview themes when you start this week's session defeats the entire compounding effect. Install before you feel the pain, not after.
6. microsoft/playwright-mcp. Browser automation. For discovery, Playwright lets the agent read public reviews, scrape forum threads, and pull from sources that don't have APIs. Use sparingly and respect site terms of service.
7. One subagent collection (I use wshobson/agents). Install three subagents specifically: a research synthesizer, a devil's advocate, and a customer-empathy critic. Run them against your synthesized themes. The devil's advocate alone catches half my motivated reasoning.
That's the seven. Total install time on a clean machine: 60 to 90 minutes. Time it gives back each week: substantial.
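The install order can be sketched as one script. Only the npm command comes from this post; the clone destinations and the `claude mcp add` arguments are assumptions, so check each repo's README for the real steps. The script defaults to a dry run that prints the plan without executing anything:

```shell
#!/usr/bin/env sh
# Install-order sketch. DRY_RUN=1 (the default here) only prints the plan;
# set DRY_RUN=0 to actually execute. Clone paths and MCP arguments below
# are assumptions, not verified commands.
DRY_RUN="${DRY_RUN:-1}"
PLAN=""
run() {
  PLAN="${PLAN}+ $*
"
  printf '+ %s\n' "$*"
  [ "$DRY_RUN" = "1" ] || "$@"
}

run npm install -g @anthropic-ai/claude-code                                  # 1. the base
run git clone https://github.com/anthropics/skills ~/.pm-stack/skills         # 2. skills repo
run claude mcp add github -- docker run -i --rm ghcr.io/github/github-mcp-server  # 3. (args are an assumption)
run git clone https://github.com/obra/superpowers ~/.pm-stack/superpowers     # 4. workflow
run git clone https://github.com/thedotmack/claude-mem ~/.pm-stack/claude-mem # 5. memory
run claude mcp add playwright -- npx -y @playwright/mcp@latest                # 6. browser (args are an assumption)
run git clone https://github.com/wshobson/agents ~/.pm-stack/agents           # 7. subagents
```

Run it once with the default dry run, read the plan, then rerun pieces by hand as each README directs.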
Recipe one: real-time interview synthesis
This is the recipe I run most often. It's also the one that produces the largest visible change in week one.
Setup, before the call. Make sure Otter (or your transcription tool of choice) is recording. Have Claude Code open in your terminal with the customer-call-notes skill installed.
During the call. After the customer has been talking for ten or fifteen minutes, paste the running transcript into Claude Code with one prompt: "Summarize the patterns you see so far and propose three follow-up questions." Read the output during a natural pause. The output is for you, not for the customer. Use it to ask sharper questions in the second half of the call.
After the call. Paste the full transcript with one prompt: "Run the customer-call-notes skill on this. Compare to the last five customer-call-notes outputs in my notes folder. What is new and what is repeating?" The agent returns the structured note plus a delta against prior notes.
Save the output to your research repository. That repository becomes input for the next session. The compounding starts on day three or four when "compared to last week" surfaces patterns you would have missed.
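The after-call step can be scripted so the note lands in the research folder automatically. A minimal sketch, assuming Claude Code's non-interactive print mode (`claude -p`); the transcript filename and the "acme" customer slug are placeholders:

```shell
#!/usr/bin/env sh
# After-call sketch: run the synthesis prompt over a transcript and save
# the note where the weekly job can find it. Filenames are placeholders.
mkdir -p research/calls
TRANSCRIPT="transcript-acme.txt"               # placeholder: today's Otter export
NOTE="research/calls/$(date +%F)-acme.md"      # placeholder customer slug

PROMPT="Run the customer-call-notes skill on this. Compare to the last five \
customer-call-notes outputs in my notes folder. What is new and what is repeating?"

# Real use: claude -p "$PROMPT" < "$TRANSCRIPT" > "$NOTE"
# Shown as a printout here so the sketch runs without the CLI installed.
printf 'claude -p "%s" < %s > %s\n' "$PROMPT" "$TRANSCRIPT" "$NOTE"
```

The date-prefixed filename is what makes the notes diff-able in order later.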
What the customer-call-notes skill is doing under the hood: parsing the transcript into speakers, identifying problem statements (vs. solution statements), pulling quotable evidence, flagging assumptions, and structuring everything in a format that's diff-able against prior notes. The skill is twenty pages of careful prompt engineering. The output looks like a two-page note.
Recipe two: weekly theme tracking on autopilot
Once a week, the agent reads everything new since last week's review and reports themes.
Setup. Create a folder structure that the agent can read: research/calls/{date}-{customer}.md, research/tickets/{date}.md, research/reviews/{date}.md. Create a MEMORY.md index pointing to the folder. Configure claude-mem to compress and persist between sessions.
The weekly job. Set a recurring slash command (hooks/scheduled tasks, or just a Friday morning calendar reminder to run it manually) with this prompt: "Read all files in research/ added since {last_run_date}. Compare to the themes in research/themes.md. Report: 1) Themes that strengthened. 2) Themes that weakened. 3) New themes not yet in the file. 4) Potential weak signals worth probing in next week's interviews. Update research/themes.md with the deltas."
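The setup and the weekly job together fit in one small script. A sketch under assumptions: the folder layout is the one from the setup step, while the `.last_run` marker file and the final `claude -p` call are my additions, not from the post:

```shell
#!/usr/bin/env sh
# Weekly-job sketch: find everything added since the last run (via a marker
# file's mtime), assemble the theme-delta prompt, and reset the marker.
mkdir -p research/calls research/tickets research/reviews
touch research/themes.md

# Everything added since the last run; themes.md itself is excluded.
if [ -f research/.last_run ]; then
  NEW_FILES=$(find research -name '*.md' ! -name themes.md -newer research/.last_run)
else
  NEW_FILES=$(find research -name '*.md' ! -name themes.md)
fi

PROMPT="Read these files: ${NEW_FILES}. Compare to the themes in \
research/themes.md. Report: 1) Themes that strengthened. 2) Themes that \
weakened. 3) New themes not yet in the file. 4) Potential weak signals worth \
probing in next week's interviews. Update research/themes.md with the deltas."

# Real use: claude -p "$PROMPT"
touch research/.last_run    # reset the marker for next week
printf '%s\n' "$PROMPT"
```

The marker file replaces `{last_run_date}` bookkeeping: `find -newer` does the "since last week" filtering for you.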
The output is two pages of structured prose. The most valuable section is consistently the "potential weak signals" one, because it surfaces things that wouldn't have crossed the threshold of any single interview but show up across three or four.
Run this for four weeks before judging. The cumulative effect is what matters. Single-week output looks neat but underwhelming. Month-three output looks like a research function.
Recipe three: signal scanning across sources you don't have time to read
This is the recipe that buys you the most calendar back.
Setup. Pick three to five public sources that talk about your product or category. G2 review page, your subreddit, an industry forum, App Store reviews, your competitors' support docs. For each source, write a one-paragraph "what to look for" prompt that names the products, the categories, and the signal types worth flagging.
The hook. Schedule a daily Playwright job that fetches new content from each source, runs it through the agent with the source-specific prompt, and writes a one-page digest to your inbox or Slack. Do not read every digest. Read the ones the agent flags as containing new patterns.
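The hook can be sketched as a loop over per-source prompt files. Everything here is an assumption for illustration: the source names, the prompt-file contents, and the commented-out `claude` call; in real use the fetch happens through the Playwright MCP inside Claude Code, and the schedule is a cron line such as `0 7 * * * /path/to/daily-scan.sh`:

```shell
#!/usr/bin/env sh
# Daily-scan sketch: one "what to look for" prompt file per source, one
# digest file per day. Source names and file contents are placeholders.
mkdir -p research/sources research/digests

cat > research/sources/g2.md <<'EOF'
Source: G2 reviews for <your product>. Flag churn-risk language, repeated
feature requests, and competitor names mentioned alongside ours.
EOF

DIGEST="research/digests/$(date +%F).md"
: > "$DIGEST"
for prompt_file in research/sources/*.md; do
  source_name=$(basename "$prompt_file" .md)
  # Real use, roughly:
  #   claude -p "Via Playwright, fetch today's new content for this source,
  #   then apply: $(cat "$prompt_file")" >> "$DIGEST"
  printf '## %s: digest placeholder\n' "$source_name" >> "$DIGEST"
done
echo "Wrote $DIGEST"
```

One digest file per day keeps the output skimmable and makes "read only the flagged ones" a five-minute habit.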
The honest truth. Two of the five sources will produce noise for the first week. Tune the prompts. By week three, two of the sources are producing one or two genuine insights per week, two are producing noise that you've learned to skim, and one has been retired. That's the right ratio. Five sources monitored continuously by an agent beats five sources you swear you'll read and never do.
A note on terms of service. Public review pages and forums have terms. Read them. Don't hammer the sources; one fetch per source per day is plenty. Don't republish content you scraped; use it as input to your synthesis only.
What stays human in discovery
The agent does not replace customer interviews. The agent does not decide what to discover. The agent does not write the opportunity-solution tree. The agent does not pick which assumption to test next.
What the agent does is take the mechanical 80% of discovery off your plate so the 20% that requires judgment, taste, and direct human contact gets your full attention. The PM still listens. The PM still notices the pause that meant something. The PM still walks the stakeholders through the synthesis and owns the call.
If you're tempted to outsource the listening to the agent, stop. The agent is an amplifier of your listening, not a replacement for it. Continuous discovery is a practice, and the practice still requires you in the room.
What to do this week
Pick one of the three recipes. Set up the matching subset of repos. Run the recipe on real PM work, not a test transcript. The point of the stack is to feel the leverage on the work you actually do.
If you're new to all of this, start with recipe one and the first three repos. If you've been using Claude Code for a while and want a step change, recipe three with all seven repos is the highest leverage move available.
The Friday afternoon synthesis ritual is the one to break first. Most PMs don't realize it's a ceiling until they go past it.
Build a Prototype-First Agent Stack
From PRD to working demo in a day, with TDD and security review baked in. Eight repos, three recipes. The post that changes how engineering reacts to your work.
Sources: anthropics/claude-code, anthropics/skills, github/github-mcp-server, obra/superpowers, thedotmack/claude-mem, microsoft/playwright-mcp, wshobson/agents. The full taxonomy and the credit to Divyanshi Sharma's Instagram carousel of the Claude ecosystem are in the handbook chapter.
Frequently asked
What does a discovery agent stack actually do?
Three things. It turns interview transcripts into structured notes within minutes of the call ending. It scans support tickets, public reviews, and product analytics for emerging themes you would otherwise miss. It surfaces patterns across weeks of evidence so you walk into the next interview with sharper hypotheses. The compound effect is that customer signal stops being a Friday ritual and becomes an always-on input.
How is this different from just using Otter, Dovetail, or EnjoyHQ?
Those tools store and tag. They do not synthesize. The discovery agent stack reads across stored evidence, finds patterns the tags didn't anticipate, and proposes the next research move. It complements the storage tools rather than replacing them. I still use Otter for transcription. The agent reads what Otter produced.
Do I need to be technical to set this up?
Less than the README pages suggest. The discovery stack is mostly skills (markdown files) and one or two MCP connectors. The hardest part is getting Claude Code working in your terminal. Once that's running, the rest is reading installation commands and pasting them. A non-technical PM can stand this stack up in an afternoon.
Where does the customer data live and is it safe?
It lives where you put it. By default Claude Code reads from your local filesystem. The MCP connectors talk to systems you authorize. Nothing leaves your environment unless you explicitly call a tool that sends it somewhere. Treat customer transcripts the same way you treat them today: redact PII before storing, follow your company's data handling policy. The agent doesn't change those rules.
How long until this earns its keep?
Two weeks for me. Day five was when interview synthesis stopped being a Friday afternoon ritual. Day eight was when the agent surfaced a pattern across two weeks of calls that I had clearly missed. Day twelve was when I deleted the standing Friday block from my calendar.
What is the smallest version of this stack that's still useful?
Three installs. Claude Code, the customer-call-notes skill (or Superpowers brainstorm), and the github-mcp-server (for tying observations to issues). That's enough to handle interview synthesis and a basic theme tracker. Add the rest only when you feel the friction the rest is meant to solve.