AI Agents · Falk Gottlob · 21 min read

The Seven-Agent Reset: A Lean PM AI Fleet, One Agent Per Stage

Starting over: the seven-agent PM AI fleet I'd build today. One per PM OS stage, each with named signals, a single surface, and a quarterly kill-switch test.

Tags: PM AI agents, agent fleet design, minimum-viable fleet, PM operating system, agent architecture, lean fleet, Falkster

I just published an honest accounting of the 39 PM AI agents I deployed across four product orgs in 80 days. The piece ends with one paragraph saying "if I started over today, I'd build seven agents and stop." That paragraph is the most-asked-about part of the piece, including by me. So I'm writing the design document.

This is the entire agent fleet I would build from scratch on April 28, 2026. Seven agents. One per stage of the PM operating system. Each agent has a name, a cadence, a kill-switch, and a single surface where its output lands. No agent in this fleet exists to be impressive. Every agent in this fleet exists to do one job well enough that a real human, on a real Tuesday, would notice within a week if you turned it off.

The short version

The seven-agent fleet replaces the typical 30-to-50-agent sprawl with one minimum-viable agent per stage of the PM operating system. The Sentinel watches overnight signals and produces a morning red-flag brief. The Listener runs weekly on Monday and turns the past week of customer interviews into a Jobs-to-Be-Done update. The Steward runs daily at 4pm and tells you tomorrow's three decisions. The Forge converts a one-page spec into a working prototype on demand. The Ship Brief fires on every production deploy and produces both the user-facing release note and the engineering rollback brief in one artifact. The Compass runs daily and reports the three outcome metrics that matter against last week's baseline. The Reflector runs Friday and produces next week's bets, this week's wins, and a monthly compounding-lessons doc.

The whole fleet is governed by one rule: every agent has a named human DRI who runs a kill-switch test once a quarter. If turning the agent off for a week produces no complaint, the agent gets retired. Reading the field research first will explain why this rule is non-negotiable.

Every agent has a named owner, a single signal stack, a single output surface, and a kill-switch test. Anything past that is decoration.
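To make that rule concrete, the whole fleet fits in a handful of records: one per stage, each carrying the governance fields named above. This is an illustrative Python sketch, not the fleet's actual configuration format; the DRI strings are role placeholders taken from this post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    name: str      # agent name
    stage: str     # the one PM OS stage it owns
    cadence: str   # when it fires
    surface: str   # the single place its output lands
    dri: str       # named human owner who runs the quarterly kill-switch test

# One agent per stage; cadences and surfaces as described in this post.
FLEET = [
    Agent("The Sentinel",   "Sense",    "daily 8am",         "Slack brief",            "head of product"),
    Agent("The Listener",   "Discover", "weekly, Monday",    "Notion JTBD doc",        "lead PM (discovery)"),
    Agent("The Steward",    "Decide",   "daily 4pm",         "Slack",                  "the PM"),
    Agent("The Forge",      "Build",    "on demand",         "Slack thread (URL)",     "lead PM"),
    Agent("The Ship Brief", "Ship",     "every prod deploy", "Slack",                  "release manager"),
    Agent("The Compass",    "Measure",  "daily 9am",         "pinned Slack message",   "analyst or PM lead"),
    Agent("The Reflector",  "Amplify",  "weekly, Friday",    "Slack + monthly Notion", "head of product"),
]

# The governance invariant: every agent has exactly one owner and one surface.
assert all(a.dri and a.surface for a in FLEET)
```

The point of writing the fleet as data is that the invariant is checkable: an agent with no DRI or no surface simply can't be added.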

Before I tell you about the seven, here's what I'm leaving out

These are the thirty-something agents I am explicitly not building in this fleet. The rule for inclusion was simple: a named PM, on a named Tuesday, would feel the agent's absence within a week. Anything I couldn't honestly answer yes to went in the cut pile.

The cut list, briefly: anything that automated something a PM should be doing themselves once a week (calibrating priorities, choosing experiments, deciding what to kill); anything whose value collapses if the upstream artifact is fragile (OKR trackers when OKRs aren't actually load-bearing); anything that produced output for an audience that already had a perfectly good way to get the same answer (executives who already get a CFO update don't need a separate executive-report agent); and anything that depended on infrastructure I couldn't keep running with two hours of maintenance per month.

The result is a tighter design with fewer surface areas, fewer cross-cutting failure modes, and dramatically fewer "wait, what does this one do again?" moments.

A morning, a Tuesday, with the seven-agent fleet running

The fastest way to understand the fleet is to walk through what a typical Tuesday looks like with all seven agents online. The agents have to feel like a system, not a list, or this whole exercise has failed.

8am: The Sentinel's red-flag brief lands in Slack. 9am: The Compass updates its pinned metrics message. During the day, The Forge turns a spec into a prototype whenever a PM asks, and The Ship Brief fires if a deploy lands. 4pm: The Steward posts tomorrow's three decisions. That's five of the seven agents in play on a Tuesday; The Listener is Monday-only, and The Reflector is Friday-only. Across the full week, the fleet produces about thirty-five distinct artifacts. None of them require you to remember to ask. Each one lands at the moment when its consumer is most likely to act on it. The shape of your day stops being "what should I do next" and becomes "the next thing already arrived in Slack and I just need to read it."

That's the compounding the 39-agent fleet didn't quite achieve. Volume of agents was never the bottleneck. Choreography was.


Now the seven agents in detail. Each gets a name, a stage, a cadence, a signal stack, an action, a surface, and a kill-switch.

1. The Sentinel (Sense)

The Sentinel is the first agent I'd build because it produces visible value in week one. The signal stack is intentionally narrow (five sources, no social media) because the failure mode is noise, not missed coverage. The brief fires at 8am sharp regardless of whether the PM is online; if they're traveling, the brief still ran and they read it on the plane.

The DRI is the head of product. The kill-switch test is a quarterly silent week. If no one in product, sales, or CS notices the brief is missing, retire it. In my experience this never happens: by week three the brief is the most-read thing in the PM's morning. The closest existing agent in my current fleet is agent-red-flag-detection; The Sentinel is its more focused successor.
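The core of the brief is a narrow filter plus a priority ranking. This is a sketch under assumed input shapes (signals as dicts with `source`, `ts`, and `summary` keys); the five sources come straight from the signal stack described in the FAQ below, but none of this is the production implementation.

```python
from datetime import datetime, timedelta

# Priority-ordered signal sources, highest first; narrow on purpose.
SOURCES = [
    "slack_dms",        # DMs from CSMs, sales reps, execs (last 24h)
    "support_p0_p1",    # P0/P1 support tickets
    "call_red_flags",   # yesterday's call transcripts tagged as red flags
    "incident_alerts",  # incident-management alerts (last 24h)
    "named_accounts",   # mentions of watched accounts, any channel
]

def morning_brief(signals: list[dict], now: datetime) -> list[str]:
    """Keep only the last 24h of signals, rank by source priority
    (newest first within a source), and return the lines of the 8am
    'what to look at first today' brief."""
    cutoff = now - timedelta(hours=24)
    fresh = [s for s in signals if s["ts"] >= cutoff]
    fresh.sort(key=lambda s: (SOURCES.index(s["source"]), -s["ts"].timestamp()))
    return [f"[{s['source']}] {s['summary']}" for s in fresh]
```

The hard-coded 24-hour window is the design choice: the Sentinel never re-litigates old news, because stale flags are exactly the noise the narrow stack exists to avoid.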

2. The Listener (Discover)

The Listener is the only agent in the fleet that does not deliver to Slack. Discovery output deserves a living document, not a stream of briefs. A JTBD update gains value the more it accumulates against itself; a Slack post forgets last week.

The DRI is the lead PM running discovery. The cadence is deliberately slow because discovery itself is deliberately slow: the worst version of this agent fires daily and produces "insights" before any single interview has been digested. Resist that. The closest existing agent in my current fleet is agent-interview-synthesis, which runs per-interview rather than per-week; the weekly cadence is the change.

3. The Steward (Decide)

The Steward is the agent that surprised me most when I designed this fleet. Decision-staging is one of the highest-leverage automations in PM work: most PMs are not bad at making decisions; they're bad at noticing that a decision is due. The Steward handles the noticing.

It runs at 16:00 deliberately. Earlier in the day, the signal residue is incomplete; later, and the PM is past the point of being able to prepare. 4pm gives one work block to do prep work, plus the option of overnight contemplation for any decision worth sleeping on. The DRI is the PM themselves. Its closest current cousin is agent-daily-focus, but the Steward inverts the time of day: daily-focus runs at 7am for "what to do today"; the Steward runs at 4pm for "what to prepare for tomorrow."
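The noticing can be sketched as a tiny ranking step. The `days_overdue` and `impact` fields here are assumptions about how a team might track pending decisions, not the Steward's real schema; the cap of three is the point of the design.

```python
def stage_tomorrow(decisions_due: list[dict], limit: int = 3) -> list[dict]:
    """At 4pm, pick the decisions most worth preparing tonight.

    Ranked first by how overdue the decision is, then by its stated
    impact. Capped at three so the Steward notices and the PM decides.
    """
    ranked = sorted(decisions_due, key=lambda d: (-d["days_overdue"], -d["impact"]))
    return ranked[:limit]
```

Everything below the cut line simply waits for another day; a Steward that staged ten decisions would just be another backlog.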

4. The Forge (Build)

The Forge is the only agent in the seven-agent fleet that is on-demand rather than scheduled. The cadence is "whenever a PM has a spec to discuss." Daily-firing prototype agents accumulate noise; on-demand prototype agents produce signal exactly when needed.

The DRI is the lead PM. The kill-switch test is harder here because the agent doesn't run on its own schedule; instead, the test is whether anyone has invoked it in the past two weeks. If the answer is no, the team stopped designing through prototypes, which is a worse problem than the agent dying. Investigate that, then retire or rebuild. This agent already exists in production form as agent-instant-prototype; see also the Ten-Day Dev Loop for the workflow it slots into.

5. The Ship Brief (Ship)

The Ship Brief is event-triggered, not scheduled. It fires on every production deploy and only on production deploys. The two-artifact design solves a real coordination problem: PMs are constantly asked "what shipped?" by both customers (via support, sales, marketing) and engineering (during incidents). Producing both views from the same source eliminates the discrepancy that creates panic during a rollback.

The DRI is the release manager or, in smaller orgs, the PM. Closest existing version: agent-release-readiness, which fires before deploy. The Ship Brief moves the timing to immediately after deploy, which is when the artifact actually lands in someone's workflow.
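The one-source, two-artifact idea reduces to a single function of the deploy record. Every field name in this sketch is an assumption about what a deploy event might carry; the design point is only that both views are derived from the same record and therefore cannot drift apart.

```python
def ship_brief(deploy: dict) -> dict:
    """One deploy event in, two views of the same facts out.

    The release note a customer sees and the rollback brief engineering
    uses during an incident come from the same source record, so they
    can never disagree mid-rollback.
    """
    return {
        "release_note": f"{deploy['feature']}: {deploy['user_benefit']}",
        "rollback_brief": (
            f"deploy {deploy['sha'][:7]} | revert: {deploy['revert_cmd']} "
            f"| flags: {', '.join(deploy['flags'])}"
        ),
    }
```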

6. The Compass (Measure)

The Compass is the agent I most regret not having in production form earlier. It enforces the discipline of three metrics rather than thirty. The act of choosing which three is itself the highest-leverage product decision a team makes each quarter; the agent makes that choice visible and load-bearing every morning.

The DRI is the team's analyst or PM lead. Critically, the agent doesn't push notifications when status changes; it updates the pinned message and lets the team's morning rhythm drive consumption. Closest current cousins are agent-signal-to-ship and agent-kpi-watchdog; the Compass is more focused than either, by design.
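The pinned-message discipline reduces to a small formatting step. This sketch assumes a metric name, a current value, and last week's baseline, and stops short of the actual Slack call; it only shows the "three lines against baseline" shape.

```python
def compass_line(metric: str, current: float, baseline: float) -> str:
    """Render one of the three metrics against last week's baseline.

    No push notification: the rendered text replaces the body of a
    pinned message, so the team's morning rhythm drives consumption.
    """
    delta = current - baseline
    arrow = "up" if delta > 0 else "down" if delta < 0 else "flat"
    return f"{metric}: {current:g} ({arrow}, {delta:+g} vs last week)"
```

Three calls to this, one per chosen metric, is the entire daily output; resisting a fourth line is the discipline the agent enforces.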

7. The Reflector (Amplify)

The Reflector is the meta-agent. It is the only agent in the fleet whose primary input is the output of other agents. This is what makes the seven-agent fleet a system rather than a stack: each agent's output is reusable as another agent's input, and The Reflector closes the loop weekly.

The DRI is the head of product or the PM lead. The kill-switch test is simple: if a quarter goes by and the monthly lessons doc has no entries that surprised anyone, retire the agent because it's not learning. Closest current cousin is agent-retrospective-synthesis.

The seven agents at a glance

If you remember nothing else from this post, remember this table. It's the fleet.

Agent           Stage     Cadence                  Surface
The Sentinel    Sense     Daily, 8am               Slack brief
The Listener    Discover  Weekly, Monday           Notion JTBD doc
The Steward     Decide    Daily, 4pm               Slack
The Forge       Build     On demand                Slack thread (artifact URL)
The Ship Brief  Ship      Every production deploy  Slack
The Compass     Measure   Daily, 9am               Pinned Slack message
The Reflector   Amplify   Weekly, Friday           Friday Slack post + monthly Notion doc

Notice the pattern. Each agent reads from a defined and finite signal stack. Each agent acts in one verb that fits in a single sentence. Each agent surfaces to one place where its consumer is already paying attention. The complexity of the fleet sits in the choreography, not in any single agent's logic.

How the fleet compounds (and how the 39-fleet didn't)

The seven-agent fleet is a system because the agents read each other. The 39-agent fleet was a list because they didn't.

The compounding loop is the reason for the design. Walk it forward.

The Sentinel reads overnight signals at 8am. The Compass reads metrics at 9am. The Steward reads the day's signal residue at 4pm. The Ship Brief fires whenever a deploy lands. The Forge fires whenever a PM has a spec to discuss. The Listener runs every Monday on a week's worth of customer voice. The Reflector reads everything else every Friday.

Each agent's output becomes another agent's input, sometimes immediately, sometimes a week later. The Steward's "today's three decisions" gets retroactively scored by the Reflector on Friday: were the three actually made? Did they move what they were supposed to move? The Listener's weekly JTBD update is consulted by the Steward when the day's calendar includes a discovery review. The Sentinel's red-flag briefs over a quarter are aggregated by the Reflector into a "what kept breaking" trend.

The 39-agent fleet did almost none of this. Each agent ran in isolation, produced output, and forgot. The compounding happened in my own head, when I happened to read three briefs in the same morning and notice a pattern. With seven agents and one agent reading the other six, the compounding gets externalized into the Reflector's monthly doc, which then becomes a real organizational asset.
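Here's what the Reflector's Friday pass might look like as code, with invented input shapes for the Steward's decision log and the Sentinel's flags; the real agents would consume richer artifacts, but the closing-the-loop structure is the same.

```python
from collections import Counter

def reflector_weekly(steward_logs: list[dict], sentinel_flags: list[str]) -> dict:
    """Friday close-the-loop pass over the other agents' outputs.

    Scores the Steward's staged decisions (were they actually made?)
    and aggregates the accumulated red flags into a 'what kept
    breaking' trend for the monthly lessons doc.
    """
    staged = len(steward_logs)
    made = sum(1 for d in steward_logs if d["made"])
    return {
        "decision_follow_through": made / staged if staged else None,
        "what_kept_breaking": Counter(sentinel_flags).most_common(3),
    }
```

The externalized memory is the return value: it lands in a doc rather than in one person's head, which is the whole difference between seven agents and thirty-nine.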

This is the non-obvious payoff of going from 39 to 7. The 39 fleet had more raw output per week. The 7 fleet has more memory.

Building it: the order of operations

If you're tempted to build all seven at once, don't. The order matters because each agent borrows infrastructure from the previous one, and because each agent has to earn the next one's existence by passing a kill-switch test.

The target: six weeks to a working seven-agent fleet, with quarterly kill-switch testing as the governance ritual that keeps the fleet from drifting back toward thirty-nine over the following year.

What I'm not promising

This fleet is not a substitute for being a good product manager. It is a substitute for the parts of being a product manager that are repetitive enough to deserve automation. The judgment work (deciding what to ship, which customers to listen to most, when to kill a feature, how to tell the story to the board) does not move. It just gets clearer because the noise around it gets quieter.

I am not promising that every team should build exactly these seven agents. The right number is probably between five and ten depending on your operating model. I am promising that the design principles port: one agent per stage of your operating model, one signal stack per agent, one surface per agent, a named DRI per agent, and a quarterly kill-switch test on every agent. Those five principles produce a fleet that compounds. Other principles produce something else.

I am also not promising that this fleet, built fresh today, will look the same in eighteen months. Models change, integrations consolidate, my own taste for which decisions to automate keeps drifting. The seven agents above are my April 2026 answer to "what would you build from scratch." A future post will be the November 2026 update: what survived, what didn't, and which agent number eight finally earned its way in.

Pick one thing to try this week: build The Sentinel. Don't worry about the other six until next week. The Sentinel is by far the cheapest agent to build, the easiest to integrate, and the one that produces the most visible lift in week one. Once it's running and the team has noticed it's running, the conversation about agent number two becomes much easier.

Sources and further reading: 39 PM AI Agents Deployed: What Stuck, What Died, and Why, Your AI Agent Fleet, The AI Product Operating Model, Agents vs. Workflows vs. Automations, The Impact Loop.


Frequently asked

Why only seven AI agents instead of more?

Because the leverage of an agent comes from how deeply it integrates into a single team ritual, not from how many agents are in the fleet. After I ran a 39-agent fleet across four product orgs, the data showed that 13 of the 39 agents were orphaned: never referenced from any other workflow. A seven-agent fleet, one per stage of the PM Operating System, captures the same compounding effect with a third of the maintenance burden and zero orphans by design.

What are the seven agents in this fleet?

The Sentinel (Sense, daily 8am), The Listener (Discover, Monday weekly), The Steward (Decide, daily 4pm), The Forge (Build, on-demand), The Ship Brief (Ship, on every production deploy), The Compass (Measure, daily 9am), and The Reflector (Amplify, Friday weekly). Each agent maps to one stage of the seven-stage PM operating model, has a named DRI, consumes specific signals, takes one action, and lands its output on a single surface.

Which agent should a PM build first if they're starting from zero?

The Sentinel. It runs overnight, scans Slack DMs, support tickets, and call transcripts for what changed, and produces a single 8am brief titled 'what to look at first today.' Most PMs reclaim 30-45 minutes of morning triage on day one and feel the lift before the agent is finished being polished. Build The Sentinel before you build anything else, even before you build a discovery agent.

What's the difference between the seven-agent fleet and a typical 'AI for PMs' setup?

Typical setups stack tools: a transcription bot, an interview-synthesis tool, a roadmap exporter. Each tool sits in its own silo. The seven-agent fleet is architectural: each agent owns one stage of the PM operating system, runs autonomously on a defined cadence, and delivers into a workflow that humans already run. The tools are inputs, not the architecture. The fleet is the architecture.

How long does it take to build the seven-agent fleet?

Three to six weeks if you're building it solo as a working PM. The first agent (The Sentinel) takes about a week to wire end-to-end including Slack delivery and a kill-switch. Each subsequent agent takes two to four days because the integration work compounds: once Slack is wired and the prompt scaffolding exists, you're mostly designing the signal set and the output shape. Budget the same number of hours for the first month of tuning as you spent building, because every agent gets at least one revision once a real user has read the output.

What signals should the Sense agent monitor?

Five sources, in priority order: Slack DMs from CSMs, sales reps, and execs over the last 24 hours; Zendesk or equivalent support tickets at P0/P1 severity; Gong or call-recording transcripts from yesterday's customer-facing calls tagged as red flags; PagerDuty or incident-management alerts from the past 24 hours; and a configurable list of named accounts whose mentions trigger an alert regardless of channel. Skip social media monitoring at this stage; it's too noisy and adds little signal.

Where does each agent's output land: Slack, Notion, or somewhere else?

Five of the seven agents deliver to Slack because that's where attention already is. The Listener delivers to a Notion doc (its output is a discovery artifact that lives across weeks, not a daily brief). The Forge delivers an artifact URL into a Slack thread tied to the spec. The Reflector delivers a Friday Slack post plus a monthly Notion doc that captures compounding lessons. The principle: pick the surface where the human consumer of the output already does the next action.

What is a 'kill-switch test' and why is it on every agent in this fleet?

A kill-switch test is the practice of turning an agent off for a week and watching to see if anyone notices or complains. If no one notices, the agent isn't load-bearing and shouldn't be in the fleet. Every agent in the seven-agent reset has a named human DRI who is on the hook to run the kill-switch test once a quarter. This is the single highest-leverage governance practice for an agent fleet; without it, agents accumulate into dead weight. The test costs nothing and tells you the truth in seven days.

How is this related to the 39-agent fleet research piece?

Directly. After running and writing about my 39-agent fleet (39 PM AI Agents Deployed: What Stuck, What Died, and Why), the seven-agent reset is the architecture I'd build next time. The 39-agent piece is the field research; this post is the design document that follows from it. If you read the research first, the seven-agent fleet's tradeoffs will land harder. If you read this first, the research piece will explain why each tradeoff was earned.
