
SERIES · THE PM AGENT STACK · PART 3 OF 5
One connected system, not a set of independent tools. Read in order. The recipes here only land if the framing is in place.
- 1. Overview: the destination, the gap, the bridge
- 2. Discovery agent stack
- 3. Build agent stack ← you are here
- 4. Measure agent stack
- 5. The PM Agent Stack handbook chapter
The short version
This post is part 3 of 5 of the PM Agent Stack series. It is the concrete how-to for the Build stage of the PM operating system. If you have not read the overview yet, start there. It sets up the destination (an enterprise-wide AI brain), the gap, and why this stack is what to build today.
The Build stack is the one most PMs are quietly afraid of. It involves Claude Code writing actual code, which feels like crossing a line if you've spent your career on the strategy side. The line is mostly imagined. PMs have always made artifacts: PRDs, mocks, decks. Prototypes are just artifacts with more truth in them. The stack below puts working prototypes within reach of any PM who can read code, not just write it.
Eight repos. A specific workflow that takes a problem statement to a working artifact in a day. TDD enforcement, design review, and a security pass built in so the prototype is credible to engineering, not just impressive on your laptop. Three concrete recipes, one of which you can run tomorrow.
The four PM problems this stack solves
The "I can't get on the engineering roadmap" problem. Strategy decks get debated. Working prototypes get reacted to. A prototype produces a sharper conversation in 20 minutes than three weeks of meetings.
The "engineering says my spec is ambiguous" problem. A working artifact is unambiguous. The prototype is the spec, in the eval-is-the-spec sense. Behavior demonstrated beats behavior described.
The "I want to test an assumption fast" problem. Most assumption tests don't need a backend or a real database. They need a believable front end with realistic flows. A PM with this stack ships an assumption test on Tuesday that would have taken engineering until next sprint.
The "I want to give the team something to push back on" problem. Senior engineers improve a draft faster than they generate one from scratch. A prototype gives them something to improve. The team's first contribution is not "let me think about it" but "what if we changed this." That's a different cycle.
If even two of those four are problems you have, the stack is worth building.
The eight repos to install for Build
In install order. Don't install all eight at once.
1. anthropics/claude-code. The base. The terminal CLI. npm install -g @anthropic-ai/claude-code.
2. obra/superpowers. The brainstorm-spec-plan-TDD-review workflow. The discipline of the workflow is the actual product. The skills are the supporting cast. Install with /plugin marketplace add obra/superpowers-marketplace and follow the README.
3. yamadashy/repomix. Packs your codebase into one AI-readable file. When you start a prototype that touches an existing codebase, repomix is how you give the agent the right context without scrolling through ten files. The XML output works particularly well with Claude.
4. nizos/tdd-guard. Automated TDD enforcement. The agent cannot skip writing tests. When it tries, the guard explains why it blocked and what to do instead. PMs who haven't done TDD before will resist this for a week. After two weeks, you'll never go back.
5. RonitSachdev/ccundo. Granular undo. Day-one install. The first time the agent does something destructive, you'll be grateful for action-level rollback.
6. patrick-ellis/design-review-workflow. Automated UI/UX design review. Responsive checks, accessibility checks, basic visual hierarchy. Catches a class of issues human review misses. Run before you put the prototype in front of anyone.
7. anthropics/claude-code-security-review. Official security review GitHub Action. Adds a security pass to your prototype. Catches injection-style issues, exposed credentials, and naive auth patterns. Prototypes that survive a security pass are credible to engineering. Prototypes that don't, aren't.
8. zilliztech/claude-context. Semantic code search MCP for big codebases. Skip if your prototype is greenfield. Install when you start prototyping inside existing codebases over 100k lines. Pulls relevant files without you scrolling through directory trees.
Total install time: two to three hours. Use weekly. The stack pays for itself the first time you ship a prototype that gets engineering excited rather than skeptical.
Recipe one: PRD to working prototype in a day
This is the recipe that has changed my work the most. I run it weekly.
Hour zero. Open Claude Code in a fresh directory. Use the Superpowers brainstorm slash command with one prompt: "I want to test the following assumption: {one paragraph}. The simplest prototype that would produce a useful signal is one that {two sentences}. Brainstorm three approaches with tradeoffs." The agent returns three approaches. Pick one.
Hour one. Use the Superpowers spec slash command with the chosen approach. The agent writes a spec: a one-page document with user flows, the data shape, the test cases, and the explicit out-of-scope list. Read the spec. Push back on anything wrong. The cost of fixing the spec at hour one is twenty minutes. The cost of fixing it at hour six is the whole afternoon.
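To make "read the spec, push back" concrete, here is a minimal sketch of the shape a one-page spec might take as data. The field names and example values are illustrative stand-ins, not Superpowers' actual output format:

```python
# Illustrative sketch of what an hour-one spec captures. Every field
# name and value here is hypothetical -- your spec will differ.
spec = {
    "assumption": "Users will self-serve export if the flow is under 3 clicks",
    "user_flows": [
        "open dashboard -> select report -> click Export -> receive CSV",
    ],
    "data_shape": {"report_id": "str", "format": "csv|xlsx", "rows": "list[dict]"},
    "test_cases": [
        {"given": "a report with 0 rows", "expect": "export disabled with tooltip"},
        {"given": "a 10k-row report", "expect": "export completes under 5s"},
    ],
    "out_of_scope": ["scheduled exports", "PDF format", "real auth"],
}

# Pushing back at hour one is a cheap edit to this structure;
# at hour six it is a rewrite of everything built on top of it.
assert spec["out_of_scope"], "an empty out-of-scope list is a red flag"
```

The out-of-scope list is the part most worth arguing about: everything not listed there is implicitly in scope for the agent.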
Hour two. Use the Superpowers plan slash command. The agent breaks the spec into a TDD-ordered task list. tdd-guard enforcement is now live. Approve the plan.
Hours three through six. The agent works the plan. Red, green, refactor, repeat. You watch the output, redirect when it goes off the rails (it will, twice), and read each diff before approval. This is where the PMs who refuse to look at code get in trouble. Look at the code. You don't have to write it. You do have to read it.
Hour seven. Run the design review workflow. Run the security review GitHub Action. Fix what the reviews flag. The prototype is now credible.
Hour eight. Demo to the team. Get reactions. Decide what to keep and what to throw away.
End-of-day artifact: a working prototype, a spec it conformed to, a test suite that passes, a design review report, and a security review report. Engineering's first reaction is "okay, this is real" instead of "let me think about it." That changes the cycle.
A note on what the prototype is for. It is not for shipping to production. It is for producing a sharper conversation. The decision after the demo is one of three: throw it away because the assumption was wrong, hand it to engineering as the spec for the real version, or extend it for a deeper assumption test. All three outcomes are wins relative to the world where you didn't have a prototype.
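As a miniature of the red-green-refactor loop the agent runs in hours three through six (the loop tdd-guard enforces: test first, then the minimal implementation), assuming a Python project; the slugify function is a stand-in, not part of any real prototype:

```python
import re

# RED: the test exists before the implementation. Under tdd-guard,
# the agent cannot write slugify() until a test like this fails.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# GREEN: the minimal implementation that passes -- lowercase, then
# collapse runs of non-alphanumerics into single hyphens.
def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_slugify()  # stays green; refactor freely while it does
```

Reading diffs at this granularity is the "you don't have to write it, you do have to read it" discipline in practice.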
Recipe two: code review pass on someone else's prototype
This is the recipe for PMs who already prototype but want to raise the quality bar on their own work before asking for engineering's time.
Setup. Have your prototype in a git repo. Have repomix installed. Have the agent collection installed (specifically a code-reviewer subagent and a UX-reviewer subagent).
The pass. Run repomix on the prototype directory: repomix --output context.xml. Open Claude Code. Prompt: "Read context.xml. Run a code review pass through the code-reviewer subagent. Run a UX critique through the UX-reviewer subagent. Run the design review workflow. Run the security review action. Report all findings, ranked by severity."
The output is a five-page review. Most of it will be true. Some of it will be the agent being overly cautious. Read all of it. Fix the high-severity items, acknowledge the medium-severity items in the prototype's README, ignore the rest.
What you've done is given engineering a prototype that has already been peer-reviewed by an agent that does code review for a living. Engineering now spends their review time on the parts that require judgment, not on catching missing tests. The relationship gets better.
Recipe three: the spec-as-eval workflow
This is the advanced recipe. Skip it if you're new to the stack.
The premise of "the eval is the spec" is that the spec for an AI feature is its eval suite. You don't write a spec that says "the agent should be helpful." You write twenty test cases that demonstrate what helpful looks like. The agent's job is to pass the test cases.
The build stack lets a PM write the spec-as-eval directly. Open Claude Code. Use Superpowers brainstorm to generate test cases. Use the spec command to formalize them. Use plan and TDD to write a reference implementation that passes the cases. Hand the eval suite to engineering.
What engineering gets is not a spec doc. It's an executable spec: a directory of test cases that pass on a working reference implementation. They can rewrite the implementation without rewriting the spec. The PM owns the cases. The engineer owns the implementation. The boundary becomes much cleaner.
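A minimal sketch of what an executable spec can look like, assuming Python; the case wording, the checker lambdas, and reference_impl are all illustrative stand-ins, not a real feature's spec:

```python
# The cases are the contract the PM owns; the implementation below is
# a throwaway reference that engineering can rewrite at will, as long
# as the cases keep passing. All names and cases here are hypothetical.
CASES = [
    # (user message, predicate the assistant's reply must satisfy)
    ("refund order 1234", lambda r: "1234" in r and "refund" in r.lower()),
    ("cancel my plan",    lambda r: "cancel" in r.lower()),
    ("asdfghjkl",         lambda r: "rephrase" in r.lower()),
]

def reference_impl(message: str) -> str:
    # Deliberately naive reference implementation; it only exists to
    # prove the cases are satisfiable.
    if "refund" in message:
        digits = "".join(c for c in message if c.isdigit())
        return f"Starting a refund for order {digits}."
    if "cancel" in message:
        return "I can help you cancel your plan."
    return "Could you rephrase that?"

failures = [msg for msg, ok in CASES if not ok(reference_impl(msg))]
assert not failures, failures
```

The clean boundary is visible in the code: swapping reference_impl for the production implementation changes nothing in CASES.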
This recipe is the one that changes the PM operating model most. Once you've shipped a feature this way, the old write-a-doc-and-wait pattern feels archaic.
What stays human
The agent doesn't decide what to prototype. The agent doesn't pick which assumption is worth testing. The agent doesn't read the room when the prototype demo lands flat and someone needs to decide whether to keep going. The agent definitely doesn't ship to production on its own.
What the agent does is collapse the time between "I have an idea" and "I have something the team can react to" from weeks to a day. That collapse is the prize. The PM is still in the loop on every meaningful decision. The agent has just made the loop tight enough that more decisions get made and more assumptions get tested.
Read when not to use AI before you go too deep on this stack. There are real limits, and the limits matter more in Build than in any other stage of the PM OS.
What to do this week
Pick recipe one. Pick a real assumption you've been carrying. Block half a day. Walk through the recipe. End of day, demo to one teammate.
If you've never used Claude Code, run the Discovery how-to first. The Build stack assumes some comfort with the base tools. The Discovery stack is the gentler entry point.
The first prototype is the hardest. The second one is half the time. The fifth one is what changes how your team sees you.
Next in the series, part 4: Build a Measurement Agent Stack. End the dashboard hamster wheel: the morning brief that arrives at 8 a.m., the post-launch synthesis in 20 minutes instead of 3 days. Seven repos, three recipes.
Sources: anthropics/claude-code, obra/superpowers, yamadashy/repomix, nizos/tdd-guard, RonitSachdev/ccundo, patrick-ellis/design-review-workflow, anthropics/claude-code-security-review, zilliztech/claude-context. The full taxonomy and the credit to Divyanshi Sharma's Instagram carousel of the Claude ecosystem are in the handbook chapter.
Frequently asked
What does a build agent stack actually do?
It takes a PM from a one-paragraph problem statement to a working prototype, with the discipline of TDD, design review, and a security pass baked into the workflow. The point is not to ship to production. The point is to put a real artifact in front of the team in a day, so the team can react to something concrete instead of speccing in the dark.
Do I need to be a developer to use this?
No, but you need to be willing to read code. Claude Code writes the code. You direct it, review the diffs, and decide when the prototype is good enough to share. PMs who can read but not write code get most of the value. PMs who refuse to look at the diff get into trouble.
What about the PMs at companies that don't allow PMs to code?
Most policies cover production code, not prototypes on a personal branch in a sandbox repo. Read your company's policy. The right framing is 'I'm using Claude Code to produce a prototype that engineering will rewrite,' not 'I'm shipping unreviewed code.' The work product is a working artifact, not a merged PR.
Why insist on TDD enforcement and security review for a prototype?
Because prototypes are how PMs build credibility with engineering. A prototype that passes a TDD discipline and a security pass survives the first conversation. A prototype that doesn't gets rewritten from scratch, and the PM gets the reputation of someone whose work has to be redone. Discipline scales the influence of the work, not just the work itself.
How long does this stack take to set up?
Two to three hours if your dev env is clean. Most of the time goes into wiring up TDD enforcement and the design review workflow. The base (Claude Code plus Superpowers) takes 20 minutes. Add the rest in waves over a week, not all at once.
What is the smallest version that's still useful?
Three installs. Claude Code, Superpowers (for the brainstorm-spec-plan-TDD-review loop), and repomix (so the agent has whole-codebase context). That's enough for one PM to take a problem to a working prototype. The full eight repos add the discipline that makes the prototype credible to engineering.