The 20-Minute Customer Call Triage Agent

This is the customer call triage agent that every PM I talk to wishes existed, and that most do not realize they can build today.

The job is simple: take a 60-minute customer call, return a tagged and clustered artifact, and recommend a next action. The whole loop should take about 20 minutes: roughly 90 seconds of agent runtime plus 18 minutes of human review.

The customer call triage agent is the highest-ROI agent a PM can build in 2026, and the pieces are sitting in your workflow already. The only missing piece is the recipe.

The short version

Stack: any transcript source (Granola, Fireflies, Gong) + Claude with the customer-call-notes skill + whatever discovery ledger you already use.
Inputs: a single call transcript.
Outputs: a verbatim quote bank (8-15 quotes with timestamps), a theme rollup (3-5 themes with frequency), a one-line recommended next action, and a row formatted for the discovery ledger.
Runtime: about 90 seconds of agent time, then 18 minutes of human review.
Eval: quote accuracy, theme legibility, action specificity.
Highest-ROI use: running it across your entire backlog of unprocessed calls in one batch, which takes roughly four hours of human time for a 40-call backlog.

For the broader practice this slots into, see the handbook chapters Interview Guide and Continuous Listening. For what the agent stack looks like at scale, see The PM Agent Stack.

The recipe

Inputs

One call transcript. Plain text or markdown.

If your transcript tool exports JSON or has speaker labels, leave them in. The agent uses them. If you only have a flat text dump, that works too, but speaker attribution will be weaker.

Three pieces of context the agent needs (write them into a short brief file the agent reads):

The customer's role and the account context (tier, stage, recent renewal status)
Three to five things you are currently considering shipping (so the agent can connect quotes to active bets)
One sentence about what you want out of the call (not "everything," something specific)

If you skip the brief, the agent reverts to generic synthesis. Generic synthesis is not useful.
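One way to keep the brief honest is to treat it as a small structured object and render it to markdown before each run. This is a minimal sketch, not the template from the downloadable recipe: the `CallBrief` class, its field names, and the markdown headings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallBrief:
    """Hypothetical container for the three pieces of context the agent reads."""
    customer_role: str
    account_context: str      # tier, stage, recent renewal status
    active_bets: list[str]    # three to five things you are considering shipping
    call_goal: str            # one specific sentence, not "everything"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the brief is usable."""
        problems = []
        if not 3 <= len(self.active_bets) <= 5:
            problems.append("list three to five active bets")
        goal = self.call_goal.strip().lower()
        if not goal or goal == "everything":
            problems.append("state one specific goal for the call")
        return problems

    def to_markdown(self) -> str:
        """Render the brief as a short markdown file for the agent to read."""
        bets = "\n".join(f"- {bet}" for bet in self.active_bets)
        return (
            "# Call brief\n\n"
            f"## Customer\n{self.customer_role}\n\n"
            f"## Account context\n{self.account_context}\n\n"
            f"## Active bets\n{bets}\n\n"
            f"## Goal for this call\n{self.call_goal}\n"
        )
```

Calling `validate()` before every run is a cheap guard against the generic-synthesis failure: if it returns anything, fix the brief before spending agent time.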

The agent

Claude, with the customer-call-notes skill loaded. The skill provides the structured output template. If you do not have access to the skill, the equivalent is a system prompt that names the four outputs you want.

The agent runs once per call. About 90 seconds.
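If you are going the system-prompt route, the prompt can be assembled from the four output descriptions plus the brief. The sketch below only builds the prompt text; it does not call any model API. The wording of each spec and the `build_system_prompt` helper are my assumptions, not the contents of the customer-call-notes skill.

```python
# Hypothetical one-line specs for the four outputs named in this recipe.
OUTPUT_SPECS = {
    "Verbatim quote bank": (
        "8 to 15 quotes with timestamps, organized by topic. "
        "Copy the customer's words exactly; never paraphrase."
    ),
    "Theme rollup": (
        "3 to 5 named themes, each with a frequency (number of quotes) "
        "and a confidence."
    ),
    "Recommended next action": (
        "One sentence, concrete enough to put on a calendar this week."
    ),
    "Discovery ledger row": (
        "Account, date, theme tags, opportunity link, owner."
    ),
}


def build_system_prompt(brief_text: str) -> str:
    """Assemble a system prompt that names the four outputs, then the brief."""
    lines = [
        "You are triaging one customer call transcript.",
        "Return exactly four sections, in this order:",
    ]
    for number, (name, spec) in enumerate(OUTPUT_SPECS.items(), start=1):
        lines.append(f"{number}. {name}: {spec}")
    lines.append("")
    lines.append("Context from the PM's brief:")
    lines.append(brief_text.strip())
    return "\n".join(lines)
```

Keeping the specs in one dictionary means a stricter instruction (for example, after a failed quote spot-check) is a one-line change rather than a prompt rewrite.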

Outputs

Four artifacts, in a single response.

1. Verbatim quote bank. Eight to fifteen quotes, with timestamps, organized by topic. Verbatim, not paraphrased. If the agent paraphrases, the artifact decays fast and you lose the ability to point at evidence in a planning meeting.

2. Theme rollup. Three to five themes the call surfaced. Named, not just listed. Each theme has a frequency (how many quotes belong to it) and a confidence (how clear the customer was).

3. Recommended next action. One sentence. Concrete enough to put on the PM's calendar this week. Bad: "Follow up with the customer." Good: "Ship a 90-minute prototype of the bulk-export flow and send it to this customer's champion by Thursday."

4. Discovery ledger row. A pre-formatted entry that maps to whatever ledger format you use. Account, date, theme tags, opportunity link, owner. Drop into the ledger without re-typing.
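The four artifacts map naturally onto a small schema. The sketch below is one possible shape, assuming a CSV-style ledger and a three-level confidence scale; the class names, field names, and the semicolon-joined tag format are illustrative choices, not the schema from the downloadable recipe.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Quote:
    timestamp: str   # as reported by the transcript tool, e.g. "00:14:32"
    topic: str
    text: str        # verbatim customer words


@dataclass
class Theme:
    name: str
    frequency: int   # number of quotes that belong to the theme
    confidence: str  # assumed scale: "low", "medium", "high"


@dataclass
class LedgerRow:
    account: str
    date: str
    theme_tags: list[str]
    opportunity_link: str
    owner: str

    def to_csv_line(self) -> str:
        """Serialize as one CSV line; theme tags are joined with semicolons."""
        buffer = io.StringIO()
        csv.writer(buffer).writerow([
            self.account, self.date, ";".join(self.theme_tags),
            self.opportunity_link, self.owner,
        ])
        return buffer.getvalue().rstrip("\r\n")


@dataclass
class TriageResult:
    quotes: list[Quote]
    themes: list[Theme]
    next_action: str
    ledger_row: LedgerRow

    def shape_problems(self) -> list[str]:
        """Check the counts the recipe asks for; an empty list means they hold."""
        problems = []
        if not 8 <= len(self.quotes) <= 15:
            problems.append(f"expected 8-15 quotes, got {len(self.quotes)}")
        if not 3 <= len(self.themes) <= 5:
            problems.append(f"expected 3-5 themes, got {len(self.themes)}")
        if not self.next_action.strip():
            problems.append("missing recommended next action")
        return problems
```

`shape_problems()` only checks counts. Whether the quotes are verbatim and the action is specific still takes a human.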

Human review

Eighteen minutes.

Read the quote bank first. Spot-check three quotes against the transcript at the timestamps the agent gave. Each one must match word for word. If even one is paraphrased, the whole quote bank is suspect. Re-run the agent with a stricter instruction.
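The word-for-word part of the spot-check is easy to automate, which leaves you free to judge whether the quote was attributed and timestamped sensibly. A minimal sketch, assuming the transcript and quotes are plain strings; it ignores timestamps entirely and only forgives whitespace and curly-quote differences. The function names are mine.

```python
import random
import re
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace and straighten curly quotes; wording is untouched."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()


def is_verbatim(quote: str, transcript: str) -> bool:
    """True if the quote appears word for word somewhere in the transcript."""
    return normalize(quote) in normalize(transcript)


def spot_check(quotes: list[str], transcript: str,
               sample_size: int = 3, seed: Optional[int] = None) -> list[str]:
    """Sample a few quotes and return the ones that are NOT verbatim."""
    rng = random.Random(seed)
    sample = rng.sample(quotes, min(sample_size, len(quotes)))
    return [quote for quote in sample if not is_verbatim(quote, transcript)]
```

An empty list from `spot_check` means the sample passed. Anything else is a paraphrase, and the quote bank goes back for a re-run.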

Read the themes. Ask yourself: would a senior PM who did not sit on the call understand what this customer cares about from the theme names alone? If not, rewrite the theme names in your own voice. The agent gets the clustering right and the naming wrong about 30% of the time.

Read the recommended action. Is it something you can do this week? If it says "investigate further," "circle back," or "schedule a follow-up," the agent did not produce an action. Rewrite it.
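The non-action phrases above can be caught with a simple string check before you even read the sentence closely. This is a rough heuristic sketch: the phrase list comes from the examples in this recipe, and the helper name is an assumption. Passing the check does not make an action good; failing it means it is definitely not one.

```python
# Phrases this recipe treats as "not an action"; extend with your own.
VAGUE_PHRASES = (
    "investigate further",
    "circle back",
    "schedule a follow-up",
    "follow up with",
)


def vague_phrases_in(action: str) -> list[str]:
    """Return every vague phrase found in the recommended action."""
    lowered = action.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]
```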

Drop the ledger row in.

The eighteen minutes are the work. The agent is a force multiplier, not a replacement.

The eval

Three checks. Run them on the first ten calls you process, then weekly.

Quote accuracy. Open the transcript. Pick three quotes from the bank. Find them at the timestamp the agent claimed. They must be verbatim. Three quotes across ten calls is thirty checks, so more than one miss puts you under 95%. Below 95% accuracy across ten calls, the agent is not load-bearing.
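Tallying the spot-checks is simple enough to script. The sketch below assumes you record each check as True (verbatim) or False (paraphrased or missing), grouped by call; the function names and the data layout are illustrative.

```python
def quote_accuracy(checks_per_call: list[list[bool]]) -> float:
    """Share of spot-checked quotes that were verbatim, pooled across calls."""
    flat = [passed for call in checks_per_call for passed in call]
    if not flat:
        raise ValueError("no spot-checks recorded")
    return sum(flat) / len(flat)


def is_load_bearing(checks_per_call: list[list[bool]],
                    threshold: float = 0.95) -> bool:
    """Apply the 95% bar from the quote-accuracy check."""
    return quote_accuracy(checks_per_call) >= threshold
```

With three checks on each of ten calls, one miss gives 29 of 30 (about 96.7%) and passes; two misses give 28 of 30 (about 93.3%) and fail.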

Theme legibility. Print the theme rollup. Show it to a senior PM who did not sit on the call. They should be able to describe what the customer cares about in 30 seconds. If they can't, the themes are too generic or too granular.

Action specificity. Read the recommended action out loud. If it could apply to any customer call, the agent did not produce a useful action. Rewrite it or re-run with a tighter instruction.

The eval is the difference between an agent and a feature you turned on once.

The batch run

The highest-ROI use of this agent is not the next call. It is the backlog of unprocessed calls you already have.

Most PMs I talk to have six to twelve weeks of recorded calls they have never synthesized. Sales reviews. Customer success check-ins. Win-loss interviews that got recorded and forgotten.

Run the agent across the whole backlog in a batch.

Read only the recommended actions and the theme rollups. Skip the quote banks for now (you can pull them later when you need evidence).

Pull the three to five themes that recur across calls. These are your real opportunities, surfaced from data you already had.
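Finding the recurring themes is a counting problem once the rollups are collected. A minimal sketch, assuming each call's theme names are available as a list of strings keyed by a call identifier. It only merges names that differ in case or surrounding whitespace, so themes the agent named differently on different calls still need to be merged by hand.

```python
from collections import Counter


def recurring_themes(themes_per_call: dict[str, list[str]],
                     top_n: int = 5,
                     min_calls: int = 2) -> list[tuple[str, int]]:
    """Rank themes by how many distinct calls they appear in."""
    counts: Counter = Counter()
    for themes in themes_per_call.values():
        # A set ensures a theme counts at most once per call.
        counts.update({theme.strip().lower() for theme in themes})
    recurring = [(theme, n) for theme, n in counts.items() if n >= min_calls]
    # Most frequent first; ties broken alphabetically for stable output.
    recurring.sort(key=lambda item: (-item[1], item[0]))
    return recurring[:top_n]
```

Counting calls rather than quotes keeps one talkative customer from dominating the ranking.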

Ship a prototype against the most common theme within a week.

Total human time on a 40-call backlog: roughly four hours, almost all of which is reading recommended actions and clustering themes across calls.
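The arithmetic behind that number is worth seeing once, using the figures from this recipe. The only assumption added here is that the agent processes calls one after another rather than in parallel.

```python
CALLS = 40
AGENT_SECONDS_PER_CALL = 90
FULL_REVIEW_MINUTES_PER_CALL = 18
BATCH_HUMAN_HOURS = 4

# Agent runtime if calls are processed sequentially (an assumption).
agent_minutes = CALLS * AGENT_SECONDS_PER_CALL / 60              # 60.0

# What the backlog would cost with the full 18-minute review per call.
full_review_hours = CALLS * FULL_REVIEW_MINUTES_PER_CALL / 60    # 12.0

# What the batch run actually budgets per call.
batch_minutes_per_call = BATCH_HUMAN_HOURS * 60 / CALLS          # 6.0
```

Six minutes per call instead of eighteen is only possible because the batch run skips the quote banks and reads just the themes and actions.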

This is the use case that pays for the whole stack in one afternoon.

Where it breaks

Three failure modes worth naming.

Cross-talk and multiple speakers. If the customer side of the call has more than one speaker, especially if they are talking over each other, speaker attribution gets confused and the agent will assign quotes to the wrong person. Fix: use a transcript tool with strong speaker diarization (Otter, Fireflies, and Gong are all decent) and skim the speaker labels before running the agent.
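Skimming the speaker labels can start with a quick count of who appears in the transcript. The sketch below assumes lines shaped like `[00:01:02] Name: text` or `Name: text`. That format is a guess, and real exports from each tool differ, so adjust the pattern to match yours.

```python
import re
from collections import Counter

# Assumed line shape: optional "[hh:mm:ss]" prefix, then "Speaker: words".
SPEAKER_LINE = re.compile(r"^(?:\[[\d:]+\]\s*)?([^:\[\]]{1,40}):\s+\S")


def speaker_counts(transcript: str) -> Counter:
    """Count how many lines each labeled speaker has."""
    counts: Counter = Counter()
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def customer_speakers(counts: Counter, internal: set[str]) -> list[str]:
    """Speakers who are not on your side of the call, sorted by name."""
    return sorted(name for name in counts if name not in internal)
```

If `customer_speakers` returns more than one name, read the labels carefully before trusting any attribution in the quote bank.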

Customer diplomacy. A customer being polite is not a customer giving you signal. The agent takes statements at face value. If the customer says "your product is great, we'd love to use it more," the agent will tag it as a positive signal. The PM has to know that "we'd love to use it more" usually means "we are not currently using it much." This is judgment the agent does not have.

Brand-new use cases. If the call covers a use case you have never briefed the agent on, the theme clustering reverts to generic categories ("product feedback," "feature request," "user experience"). Fix: when you encounter a new use case, add it to the brief file. The agent improves over time as your brief gets richer.

What to do this week

Pick the call you most recently had with a customer that mattered. Run the agent on it.

Read the four outputs. Spot-check the quotes. Rewrite the recommended action in your own voice.

Drop the ledger row in.

Then ask the harder question: how many calls just like this one are sitting on your hard drive unprocessed?

That is the batch run. That is the four-hour afternoon that pays for itself.

The full system prompt, the brief template, the four-output schema, and the eval rubric are in the downloadable recipe. For the full agent fleet this triage agent belongs to, see Your AI Agent Fleet.

Sources: Teresa Torres on the interview snapshot template, Carl Vellotti on Claude Code workflows, Marily Nika on AI-augmented research, Gong, Granola, Fireflies, Claude customer-call-notes skill.
