PM-as-Editor: Managing a Fleet of Agents
The skill you need once the fleet is running
The 39-agent chapter told you what agents to deploy. This chapter is about the skill you actually need once they're running: being a great editor.
Most PMs, faced with a pile of agent outputs, do one of two things. They trust the output blindly and ship it, which turns every downstream inbox into a broken-telephone game. Or they rewrite every output from scratch, which is worse than doing the work themselves, because now they're both writing and paying the LLM bill. Neither scales.
The skill that scales is editing.
This is a different muscle from writing. It's the muscle senior PMs have always used on their own team's output (reviewing a PRD, redlining a memo, shaping a deck), now applied to non-human output at 10x the volume. If you can't develop this muscle, the agent fleet won't make you more effective. It will just make you tired in a new way.
The delegation-versus-verification ladder
Not every agent output deserves the same scrutiny. A mature fleet has a tiered trust model.
Tier 1: Ships without review. You trust the agent completely. It runs on a schedule, produces an artifact, delivers it. No human in the loop. Examples: weekly customer signal digest, daily red-flag scan, release notes from a commit log.
Tier 2: Ships after a one-minute sanity check. The agent produces, you skim, you accept or reject. You're not editing the content; you're verifying the output looks reasonable before it leaves your hands. Examples: stakeholder update, win/loss summary, roadmap one-pager.
Tier 3: Ships after a real edit. You take the agent's draft and edit it meaningfully: shaping the framing, sharpening the language, cutting, restructuring. Examples: board memo, crisis comms, anything read by a person whose opinion of you specifically matters.
Tier 4: Agent assists, you author. The agent produces a starter, but the output is so high-stakes or nuanced that you write the final version yourself. Examples: a performance conversation, a strategy-shift announcement, a customer apology for a real screwup.
Every agent in your fleet has a tier. Tiering is a product decision. Get it wrong in one direction (too much trust) and you ship garbage. Get it wrong the other way (too much review) and the fleet stops saving time.
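If you want the tiers to be explicit rather than tribal knowledge, the registry can be as small as this. A minimal sketch: the agent names, the tier labels, and the review_action helper are hypothetical illustrations, not part of any particular tooling.

```python
from enum import IntEnum

class Tier(IntEnum):
    SHIPS_UNREVIEWED = 1   # Tier 1: no human in the loop
    SANITY_CHECK = 2       # Tier 2: one-minute skim, accept or reject
    REAL_EDIT = 3          # Tier 3: meaningful edit before it ships
    AGENT_ASSISTS = 4      # Tier 4: agent drafts a starter, you author

# Hypothetical fleet: every agent gets an explicit tier, on purpose.
FLEET = {
    "weekly-customer-digest": Tier.SHIPS_UNREVIEWED,
    "stakeholder-update": Tier.SANITY_CHECK,
    "board-memo-drafter": Tier.REAL_EDIT,
    "perf-conversation-prep": Tier.AGENT_ASSISTS,
}

def review_action(agent: str) -> str:
    """What has to happen before this agent's output leaves your hands."""
    return {
        Tier.SHIPS_UNREVIEWED: "ship as-is",
        Tier.SANITY_CHECK: "skim for one minute, then ship or reject",
        Tier.REAL_EDIT: "edit meaningfully before shipping",
        Tier.AGENT_ASSISTS: "treat as a starter; write the final yourself",
    }[FLEET[agent]]
```

Writing the tier down is the point: an agent with no entry in the registry is an agent whose trust level is whatever you felt like that morning.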
How to move an agent up the ladder
New agents start at Tier 3: meaningful human edit every time. Over a few weeks I track: when I edit this agent's output, what am I consistently changing?
Two outcomes:
Prompt problem. I'm always changing tone, format, length, level of detail. Fix the prompt. Re-run. Re-evaluate. Agent moves up a tier.
Capability gap. Agent doesn't know about recent customer conversations, internal metrics, the context it would need. Give it access to the source. Re-evaluate. Agent moves up a tier.
Target: move every agent from Tier 3 to Tier 2 within a month, and from Tier 2 to Tier 1 within a quarter. If an agent has been at the same tier for six months, I've stopped iterating. I'm using it as a crutch, not developing it as a system.
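The tracking doesn't need tooling beyond a list and a tally. A sketch, assuming you tag each editing pass with a change category like "tone" or "missing-context"; the category names, the example log, and the two-occurrence threshold are all illustrative.

```python
from collections import Counter

# Hypothetical log: one (agent, category-of-change) row per editing pass.
EDIT_LOG = [
    ("stakeholder-update", "tone"),
    ("stakeholder-update", "tone"),
    ("stakeholder-update", "length"),
    ("win-loss-summary", "missing-context"),
    ("win-loss-summary", "missing-context"),
]

PROMPT_CATEGORIES = {"tone", "format", "length", "detail"}

def diagnose(agent: str, min_count: int = 2) -> str:
    """Recurring tone/format/length edits point at the prompt;
    recurring missing-context edits point at a capability gap."""
    counts = Counter(cat for a, cat in EDIT_LOG if a == agent)
    if not counts:
        return "no edits logged yet"
    top, n = counts.most_common(1)[0]
    if n < min_count:
        return "no consistent pattern yet; keep logging"
    if top in PROMPT_CATEGORIES:
        return f"prompt problem ({top}): fix the prompt, re-run, re-evaluate"
    return f"capability gap ({top}): give the agent access to the source"

print(diagnose("stakeholder-update"))  # prompt problem (tone): ...
```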
The metric I actually watch
The best operational metric for an agent fleet is: how much time am I spending editing agent outputs, total, per week?
- Climbing: the fleet is bloated and I'm reviewing outputs I shouldn't be. Audit the tiers. Move things up or kill them.
- Shrinking: I'm trusting agents more. Check the eval scores. Trust built on vibes is a risk; trust built on eval data is progress.
- Stable for weeks: I've stopped improving the fleet. Schedule an agent review with myself. Find the three agents whose outputs need the most editing. Fix them.
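The metric is one number a week, so the trend check fits in a few lines. A sketch with made-up numbers; the tolerance is an assumption, and the three readings map to the list above.

```python
# Hypothetical numbers: total minutes spent editing agent outputs, per week.
WEEKLY_EDIT_MINUTES = [95, 80, 82, 60]  # oldest first

def fleet_signal(weeks: list[int], tolerance: int = 10) -> str:
    """One number, three readings: climbing, shrinking, or stable."""
    if len(weeks) < 2:
        return "not enough data yet"
    delta = weeks[-1] - weeks[0]
    if delta > tolerance:
        return "climbing: audit tiers, move agents up or kill them"
    if delta < -tolerance:
        return "shrinking: check eval scores before trusting the trend"
    return "stable: find the three worst agents and fix their prompts"

print(fleet_signal(WEEKLY_EDIT_MINUTES))  # shrinking: ...
```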
The editing skill itself
Great editors do a few things well. None are mystical.
They know what "good" looks like for this artifact. Before they start, they have a clear mental model of the ideal output. Without it, you can't edit; you can only drift the output around. For PMs this means a shelf of exemplars: memos you admire, comms you'd copy, decks that worked. When editing an agent's draft, you're measuring its distance from an exemplar, not reacting to the draft in a vacuum.
They cut more than they add. Non-editors add more context, caveats, detail. Editors cut. Agent outputs are often over-explained: redundant, over-hedged, too many transitions. Cut until the structure is visible. Cut again.
They ask what the artifact is for. Every artifact has a purpose: a decision to unblock, a customer to inform, a stakeholder to align. Editors keep asking "is this line serving the purpose?" If not, cut. If yes, sharpen.
They ship imperfect. Great editors know when the output is good enough and send it. Perfectionism is how agent fleets get re-absorbed into the PM's workload. If the agent's output is 80 percent as good as what I'd produce myself, and it took three minutes instead of three hours, ship. The customer doesn't experience the 20 percent gap. My calendar does.
What not to edit
There's a failure mode where PMs edit everything, including things that don't need editing, because editing feels like "doing the work." That's a performance, not a practice. Questions that kill this pattern:
- Am I changing meaning, or rearranging words?
- Would a reader notice the difference between my edit and the original?
- Am I editing because the draft needs it, or because I need to feel useful?
If the answer to the third question is yes more than occasionally, I'm not editing. I'm resisting the loss of the old role. I notice it. I let it pass. I ship the draft.
The feedback loop that matters
Every edit I make on an agent's output is training data for the prompt.
After editing, I spend 60 seconds writing down: what did I change, why, and what prompt change would prevent me from having to make this edit next time?
After a week of this, I have a list of prompt improvements. I make them in a single session. Run the new prompt against the eval. Ship. Edit-time for that agent drops. Compound this weekly and the fleet gets sharper faster than competitors' teams can replicate.
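Since the 60-second note feeds the weekly session, it helps to give it a fixed shape. One possible record format, sketched with hypothetical field names and example contents; the three fields are just the three questions above.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class EditNote:
    """One 60-second note per editing pass: the raw material for the
    weekly prompt-improvement session."""
    agent: str
    what_changed: str
    why: str
    prompt_fix: str  # the prompt change that would make this edit unnecessary
    logged: date = field(default_factory=date.today)

# Example contents are hypothetical.
note = EditNote(
    agent="stakeholder-update",
    what_changed="cut three hedging paragraphs",
    why="over-explained; the reader needs the decision, not the caveats",
    prompt_fix="add: 'state the decision in the first sentence; one caveat max'",
)
```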
The team version
When the whole team uses agent fleets, edit-time becomes an organizational signal. The PM with the best-tuned fleet has the lowest edit-time and the highest output. That's the person who gets promoted, not the PM performing "thoroughness" by editing everything.
Make team edit-time visible. Not as surveillance. As a shared learning pool. When one PM figures out how to improve an agent, the whole team inherits the improvement.
Pick one thing this week
Pick the one agent output you edit most often. Do the 10-minute version of this practice.
- Open the last three outputs from that agent. Copy them into a doc.
- Redline them the way you normally would. Note every edit.
- For each edit, write one sentence: "if the prompt said X, I wouldn't have needed to make this edit."
- Rewrite the prompt with those changes.
- Re-run the agent. Edit the new output. Measure whether your edit-time dropped.
Ten minutes of this replaces about an hour of editing per week. Do it once a month for every agent in your fleet and you're compounding. The PM job in 2026 is not to write more. It's to approve faster, with better calibration. If you're still writing everything yourself, you haven't promoted yourself into the new role.
Frequently asked
What is the delegation-versus-verification ladder for agent outputs?
Tier 1: Ships without review (you trust completely). Tier 2: Ships after one-minute sanity check. Tier 3: Ships after real editing (you meaningfully shape it). Tier 4: Agent assists but you author (highest stakes). Every agent has a tier. New agents start at Tier 3. Good editing moves them up.
How do you know when an agent is ready to move up a tier?
Track what you consistently change in the output. If it's always tone, fix the prompt. If the agent is missing context, give it access to better data. If the output is fundamentally sound with only small tweaks, it's ready to move up. Target: Tier 3 to Tier 2 in a month, Tier 2 to Tier 1 in a quarter.
What is the most important metric for agent fleet health?
How much time you're spending editing outputs per week. Climbing means the fleet is bloated or agents are drifting. Shrinking means you're trusting them more. Stable for weeks means you stopped improving. The metric should trend down as your fleet matures.
What does great editing look like?
Great editors know what good looks like (they have exemplars). They cut more than they add. They ask what the artifact is for and cut anything not serving that purpose. They ship imperfect. If the agent's output is 80 percent as good as what you'd produce yourself and took three minutes instead of three hours, ship it.
Why is feedback from edits training data for the prompt?
After you edit, spend 60 seconds writing what you changed and why. After a week of this, you have prompt improvements. Make them in a single session. Re-run the eval. Edit-time drops. This compound improvement is how the fleet gets sharper faster than competitors can replicate.