Product Builder OS · Handbook §04 · AI Craft · Chapter 8 / 9

The PM Agent Stack: Open-Source Tools Mapped to PM Work

Build a personal PM agent stack from open-source Claude repos. 18 tool categories mapped to the 7-stage PM operating system, with install order and how-tos.

Falk Gottlob · 2026-05-04 · Updated 2026-05-18 · 14 min read

SERIES · THE PM AGENT STACK · PART 5 OF 5 (CANONICAL)

This chapter is the canonical reference for the PM Agent Stack series. The four blog posts are the way in; this is the index they all point back to.

1. Overview: the destination, the gap, the bridge
2. Discovery agent stack
3. Build agent stack
4. Measure agent stack
5. The PM Agent Stack handbook chapter ← you are here

The short version

The destination for every product organization is one AI brain with read access to every system the company runs. Slack, email, calendar, meetings, documents, source code, dashboards, CRM, design files. All of it. Not parts. All. Full stop.

Most companies are 6 to 18 months from that destination because procurement, security review, and data governance move slowly and PMs are not the buyers for the platform layer. This chapter is the bridge: it maps 18 categories of open-source Claude repos to the 7 stages of the PM operating system (Sense, Discover, Decide, Build, Ship, Measure, Amplify), gives the install order, and points to three concrete how-tos: Discovery agent stack, Build agent stack, and Measure agent stack. The stack works on the inputs a single PM has access to. It does not replace the enterprise brain. It compounds while you wait for the enterprise brain.

This is one connected system, not a set of independent tools. Skim the chapter once. Bookmark the install order. Walk through one trailing how-to. Install five tools this week, not fifty.

The destination and the gap

The destination is an enterprise-wide AI brain. One agent with access to every Slack thread, every meeting transcript, every doc, every PR, every dashboard, every CRM record, every design file. When the agent has access to everything, it stops feeling like a tool and starts feeling like a colleague who has been at the company for years. The companies that have built this layer first are already pulling ahead in every category. R&D figured it out earliest. Go-to-market is catching up. G&A is right behind.

Most PMs are not there yet. Three structural reasons. Procurement and security cycles take 6 to 18 months for a system that touches this many sources. The decision happens in the office of the CTO, the CIO, or the COO; PMs are not the buyers. And many companies are scarred from earlier knowledge-management projects that consumed budget and shipped underwhelming results. The scar tissue is residual, but it still slows the rollout.

The bridge is what to build while the enterprise version is being procured.

The bridge is single-PM scope by design. The trade-off is intentional. Standing it up takes a week of evenings, no permission, no procurement.

Credit where due: most of the catalog below is adapted from Divyanshi Sharma's 20-part Instagram carousel mapping the Claude ecosystem. The PM lens, the install order, and the mapping to the 7-stage PM OS are the contributions of this chapter.

The 18 categories of open-source repos

The Claude open-source ecosystem has matured fast. Thousands of community repos extend Claude Code. They cluster into 18 categories. You don't need to know every repo, but you need to know which categories exist so you can find the right one when a problem hits.

The categories, with a one-sentence PM summary for each:

Awesome lists. Master indexes. Start here when you don't know what you're looking for. Best entry: hesreallyhim/awesome-claude-code.
Meta indexes. Lists of lists. Use when an awesome list isn't specific enough for your need.
Anthropic repos. First-party. Trust them. anthropics/claude-code is the base of the entire stack.
Official adjacent. First-party-maintained workflows like security-review and the github-mcp-server. Treat them as features.
Skill collections. Bundles of capabilities you bolt onto the agent. obra/superpowers is the foundation most PM workflows assume.
Markdown skill libraries. Skills shipped as standalone markdown files. Easy to fork and edit.
Agents. Roles with goals. A "frontend reviewer" or "user-research synthesizer." Drop-in expertise.
Subagents. Agents inside agents. Used for delegation, specialization, and parallel work.
MCP servers. Connectors that give the agent access to specific systems (GitHub, Postgres, browsers, your codebase).
Connect Claude / more MCPs. Specific integrations like Playwright, SQLite, browser automation.
Orchestration. Coordinates multiple agents working in parallel. Skip until single-agent friction is real.
Workflows. Codified ways of doing work. Brainstorm, spec, plan, TDD, review.
Memory. Cross-session persistence. The single biggest unlock once you outgrow per-session work.
Context engineering. Tools like repomix and context-priming that get the right info in front of the agent at the right time.
Slash commands. Muscle-memory invocations. /fix-issue 456, /security-review, /design-review.
Hooks. Inject behavior at points in the agent loop: pre-commit, post-edit, on-error, scheduled.
Guides. Prompt engineering, system-prompt patterns, agentic coding patterns.
Learning. Other practitioners' journeys. The most underrated category for PMs because workflow design beats tool count.

This chapter does not list every repo. The framing walk-through is in the overview post, and the three trailing how-tos go deep on the subset of repos that matter most at each PM OS stage.
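Several of these categories are, in practice, just files in a project directory. Here is a minimal sketch of two of them, assuming Claude Code's conventions at the time of writing: custom slash commands live as markdown files under `.claude/commands/`, with `$ARGUMENTS` standing in for whatever you type after the command, and project-scoped MCP servers are declared in a `.mcp.json` file. The `scaffold` function and the prompt wording are hypothetical, and the Playwright launch command should be checked against that repo's README before you rely on it.

```python
import json
from pathlib import Path

# Hypothetical prompt text. "$ARGUMENTS" is replaced by whatever follows
# the command, so "/fix-issue 456" fills in "456".
FIX_ISSUE_PROMPT = """\
Find and fix issue #$ARGUMENTS.
1. Read the issue and restate the problem in one sentence.
2. Locate the relevant code and propose the smallest fix.
3. Run the tests and summarize what changed.
"""

# Assumed launch command for the Playwright MCP server; confirm it in the
# microsoft/playwright-mcp README.
MCP_CONFIG = {
    "mcpServers": {
        "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
    }
}


def scaffold(project_root: Path) -> list[Path]:
    """Write one slash command and one MCP server entry; return both paths."""
    command_file = project_root / ".claude" / "commands" / "fix-issue.md"
    command_file.parent.mkdir(parents=True, exist_ok=True)
    command_file.write_text(FIX_ISSUE_PROMPT, encoding="utf-8")

    # Note: this overwrites any existing .mcp.json in the project root.
    mcp_file = project_root / ".mcp.json"
    mcp_file.write_text(json.dumps(MCP_CONFIG, indent=2) + "\n", encoding="utf-8")
    return [command_file, mcp_file]
```

Run `scaffold(Path("."))` in a scratch project, not a real one, since it overwrites an existing `.mcp.json`. The point is less the script than the shape: a slash command is a prompt you saved, and an MCP server is a launch command you wrote down.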

Mapping the 18 categories to the 7-stage PM OS

This is the mapping that earns its keep. It tells you which categories of tools to reach for at which PM OS stage.

Sense. What's changing in the world, the market, the codebase, the user base. Tooling that matters: MCP servers (postgres-mcp, github-mcp-server), context engineering (repomix, claude-context), hooks for scheduled digests. The agent's job here is to flag changes worth your attention before you go looking.

Discover. What problem are we solving and for whom. Tooling that matters: memory (claude-mem, semantic memory), skill collections for synthesis (customer-call-notes, the continuous-listening chapter's recipes), Playwright MCP for scraping public reviews and forums, subagents that play the role of researcher, synthesizer, and devil's advocate. Full how-to in Build a Discovery agent stack.

Decide. What to do next, given everything we know. Tooling that matters: subagent collections (wshobson/agents, davepoon's collection) for tradeoff analysis, multi-agent orchestration (claude-flow) for consensus, slash commands for prioritization rituals. The agent runs the analysis; you make the call.

Build. From decision to working software. Tooling that matters: workflows (Superpowers brainstorm-spec-plan-TDD-review), context engineering (repomix, graphify), TDD enforcement (tdd-guard), security review (claude-code-security-review), design review (design-review-workflow), claude-code itself. Full how-to in Build a prototype-first agent stack.

Ship. From working software to in front of users. Tooling that matters: official adjacent (security-review GitHub Action, claude-code-action for PR reviews), hooks (pre-commit, pre-push), the living changelog practice, observability hooks.

Measure. Did it work. Tooling that matters: MCPs (postgres-mcp for direct DB access, playwright-mcp for dashboard scraping), scheduled hooks for recurring digests, slash commands for variance detection, memory for tracking metric drift over time. Full how-to in Build a measurement agent stack.

Amplify. Telling the story so the org learns. Tooling that matters: writing skills, the "eval is the spec" practice, the living changelog, agents that draft stakeholder updates from raw evidence. This stage hands off to the "PM as editor" workflow once you have a fleet doing the synthesis.

The mapping is rough on purpose. The same tool earns its keep at multiple stages. Memory matters everywhere. MCP servers feed Sense and Measure but also keep Discover honest. The point of the mapping is to tell you which categories to install first when you start building for a stage, not which to use exclusively.
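The mapping above can be written down as a small lookup table. This is a sketch, not a canonical encoding: the groupings are transcribed loosely from the stage paragraphs, Playwright MCP is filed under "MCP servers" as an assumption, and tools the chapter does not assign to a category (tdd-guard, design-review-workflow) are left out. The names `STAGE_CATEGORIES` and `stages_for` are hypothetical.

```python
# Stage -> categories named in that stage's paragraph, in PM OS order.
STAGE_CATEGORIES = {
    "Sense": ["MCP servers", "Context engineering", "Hooks"],
    "Discover": ["Memory", "Skill collections", "MCP servers", "Subagents"],
    "Decide": ["Subagents", "Orchestration", "Slash commands"],
    "Build": ["Workflows", "Context engineering", "Official adjacent", "Anthropic repos"],
    "Ship": ["Official adjacent", "Hooks"],
    "Measure": ["MCP servers", "Hooks", "Slash commands", "Memory"],
    "Amplify": ["Skill collections", "Agents"],
}


def stages_for(category: str) -> list[str]:
    """Return the stages, in PM OS order, whose paragraph names this category."""
    return [stage for stage, cats in STAGE_CATEGORIES.items() if category in cats]


def install_first(stage: str) -> list[str]:
    """Return the categories to reach for first when building for a stage."""
    return list(STAGE_CATEGORIES[stage])
```

`stages_for("MCP servers")` returns `['Sense', 'Discover', 'Measure']`, which is the overlap the paragraph above describes. `stages_for("Memory")` only returns the two stages that name it explicitly, so treat the table as a starting point for installs, not a boundary on use.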

The install order

Most PMs who try to build this stack make the same mistake. They install fifteen things over a weekend and then never use any of them because the cognitive overhead is too high to remember what each does. Here is the install order that actually works.

Week one. The base. Install Claude Code. Install the official anthropics/skills repo. Install the github-mcp-server so the agent can read your issues and pull requests. That's it. Use it for a week on real work. Do not install anything else until you have a feel for the base.

Week two. The first capability layer. Install obra/superpowers and try the brainstorm-spec-plan-TDD-review workflow on one real feature. Install one MCP server that connects to a system you actually use (postgres-mcp if you have a queryable DB, playwright-mcp if you want web automation). Install ccundo on the first day of this week so granular undo exists before you need it. The first time the agent does something destructive you will be grateful.

Week three. Memory and context. Install one memory tool (claude-mem is a good default). Install repomix so you can pack a codebase into a single context block when you need to. Pick one subagent collection (wshobson/agents is the default) and install three to five subagents that match your daily roles, not the whole catalog.

Week four. The PM-specific layer. This is where you start applying the trailing how-tos. Pick one of the three (Discover, Build, Measure) based on which stage of your work most needs the leverage right now. Walk through that post end to end. By the end of week four you have a working personal stack you actually use.

Months two and beyond. Add hooks for repeating workflows. Add slash commands for muscle-memory tasks. Read the guides. Subscribe to the awesome lists as feeds and check them monthly for new repos. Build your own skills when the existing ones don't fit your work.

This order is not arbitrary. The friction at each step compounds, and adding tools out of order is how PMs end up with stacks they don't trust.
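One way to hold yourself to the order is to write it down as data and only ever ask for the next unfinished week. A minimal sketch: the week contents are copied from the paragraphs above, and the `next_step` helper is hypothetical.

```python
from typing import Optional

# Week number -> what that week installs, per the install order above.
INSTALL_ORDER = {
    1: ["Claude Code", "anthropics/skills", "github-mcp-server"],
    2: ["obra/superpowers", "one MCP server (postgres-mcp or playwright-mcp)", "ccundo"],
    3: ["one memory tool (claude-mem)", "repomix", "three to five subagents (wshobson/agents)"],
    4: ["one trailing how-to (Discover, Build, or Measure)"],
}


def next_step(completed_weeks: set[int]) -> Optional[tuple[int, list[str]]]:
    """Return the earliest unfinished week and its installs, or None when done.

    A skipped week is always returned before any later one, so marking
    week three done does not let you jump past week two.
    """
    for week in sorted(INSTALL_ORDER):
        if week not in completed_weeks:
            return week, INSTALL_ORDER[week]
    return None
```

`next_step({1, 3})` returns week 2, not week 4. That is the whole design: the helper refuses to reward installing out of order.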

Critical limitations of the bridge

The bridge has real limits. Naming them honestly is part of the deal. A chapter on a personal agent stack that does not list the limitations is selling, not teaching.

Single-PM scope. The personal stack only sees what one PM can give it access to. It cannot read Slack channels the PM is not in. It cannot see deals in the CRM the PM is not on. It cannot read meeting transcripts from meetings the PM did not attend. Cross-team patterns that span outside the PM's visible work are invisible to it. The enterprise brain solves this. The bridge does not.

No team-wide memory. Cross-session memory works for one PM's filesystem. It does not propagate across teammates. Each PM on the same team rebuilds the memory layer separately. Without an enterprise platform, there is no shared record of "what did we decide as a team."

Limited write access. The personal stack reads. The agent can be configured to write, but write actions outside the PM's own filesystem (sending email, posting in Slack, updating Notion) are fragile, require explicit per-tool setup, and have no audit trail by default. The enterprise version normalizes write actions across systems with auditing and approval flows. The bridge does not.

Security boundary is your laptop. Enterprise agents have audit logs, role-based access, retention policies, and data loss prevention. The personal stack inherits the security posture of the laptop and the source systems being connected. PII handling stays the PM's responsibility. Treat customer transcripts the way they should be treated today, not differently because an agent is reading them. Follow your company's data handling policy.

Tool drift. Open-source repos move fast. A repo that is hot today could be unmaintained in 18 months. The stack requires ongoing curation. Plan for it.

Cognitive overhead is real. Twelve installed tools is a lot to track. The first time the agent surprises a PM (in either direction), the trust drop costs a session of productivity. Add tools only when the friction the new tool solves is something you can name in a sentence.

Won't replace the data team. Or the security team. Or the legal team. The personal stack augments individual PM work. It does not refactor company governance, replace specialist functions, or solve organizational problems. Use it as a tool for a person, not as a substitute for an org.

If any of these limitations is a deal-breaker for your situation, this chapter is the wrong starting point. The right starting point is then to push your CTO or CIO toward the enterprise version directly. Read "When not to use AI" for the longer argument about real limits.
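The tool-drift limitation above lends itself to a small recurring check. This sketch assumes that the `pushed_at` field of GitHub's public "get a repository" endpoint is a fair proxy for maintenance, which it is only roughly: a repo can be stable and quiet. The 180-day threshold and the function names are hypothetical, and unauthenticated requests are rate-limited.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional


def days_since_push(pushed_at: str, now: datetime) -> int:
    """Whole days between a GitHub 'pushed_at' timestamp and 'now' (UTC)."""
    pushed = datetime.strptime(pushed_at, "%Y-%m-%dT%H:%M:%SZ")
    return (now - pushed.replace(tzinfo=timezone.utc)).days


def fetch_pushed_at(repo: str) -> str:
    """Read 'pushed_at' for an 'owner/name' repo from the GitHub REST API."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["pushed_at"]


def stale_repos(
    repos: list[str],
    max_age_days: int = 180,  # hypothetical threshold; pick your own
    now: Optional[datetime] = None,
    fetch: Callable[[str], str] = fetch_pushed_at,
) -> list[str]:
    """Return the repos whose last push is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    return [r for r in repos if days_since_push(fetch(r), now) > max_age_days]
```

Call `stale_repos([...])` with the repos you actually installed, once a month, alongside the awesome-list check. The `fetch` parameter exists so you can swap in a stub and test the logic without touching the network.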

What stays human

Building this stack does not change what stays human. You still own the strategic call. You still decide which problems are worth solving and which are not. You still take responsibility for the outcomes the stack helps you ship. The agent is a force multiplier on the parts of PM work that are mechanical, not on the parts that require judgment, taste, and accountability. The "When not to use AI" chapter has the longer version of this argument.

What changes is the ratio. A stack like this lets a single PM operate at the throughput of a small team without losing the coherence of one person's vision. That's the prize.

What to do this week

Three concrete actions:

First, read the overview post, which sets up the destination, the gap, and the bridge with concrete outcomes and limitations.

Second, pick one of the three trailing how-tos: Discovery, Build, or Measure. Pick the one whose stage of PM work most needs help right now. Walk through it end to end.

Third, install Claude Code if you haven't, and run one real PM task through it tonight. Not a synthetic example. A real interview transcript, a real PR review, a real weekly digest. The point of the stack is to feel the leverage, and the only way to feel it is on real work.
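For the weekly-digest case, one real task can be as small as piping a folder of notes into Claude Code's non-interactive print mode. A minimal sketch, assuming the `claude` CLI is installed and on your PATH and that `claude -p` reads piped input alongside the prompt; the instruction text, the folder layout, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical instruction; adjust to the digest you actually send.
DIGEST_INSTRUCTION = (
    "Summarize these notes as a weekly digest: three themes, "
    "open questions, and one quote per theme with its source file."
)


def bundle_notes(notes_dir: Path) -> str:
    """Concatenate every .md file in notes_dir under a heading with its name."""
    parts = []
    for path in sorted(notes_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        parts.append(f"## {path.name}\n{text}\n")
    return "\n".join(parts)


def run_digest(notes_dir: Path) -> str:
    """Pipe the bundled notes into 'claude -p' and return the reply text."""
    result = subprocess.run(
        ["claude", "-p", DIGEST_INSTRUCTION],
        input=bundle_notes(notes_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return result.stdout
```

Point `run_digest` at a folder of your own notes from this week. If those notes contain customer data, the limitation named earlier applies unchanged: your company's data handling policy decides what goes in the folder.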

The shared context layer is coming whether your company is ready or not. Build the personal version now. You'll be the person who already knows what to put on the enterprise version when it arrives. That is not a small thing. That is the difference between adopting AI-native ways of working and being adopted by them.

Series complete · Loop back to Part 1

Re-read: The PM Agent Stack overview

When you take this to a teammate, start them at Part 1. The overview is the framing piece (destination, gap, outcomes, limitations) that makes the recipes in Parts 2 to 4 land. Re-read or share.

Sources: hesreallyhim/awesome-claude-code, anthropics/claude-code, anthropics/skills, obra/superpowers, github/github-mcp-server, wshobson/agents, yamadashy/repomix, thedotmack/claude-mem, crystaldba/postgres-mcp, microsoft/playwright-mcp. The 18-category map is adapted from Divyanshi Sharma's Instagram carousel on the Claude ecosystem.

Frequently asked

What is a PM agent stack?

A curated set of open-source Claude tools (skills, subagents, MCP servers, hooks, slash commands) layered on top of Claude Code so the agent can act on your real PM work: customer transcripts, opportunity-solution trees, prototypes, dashboards, weekly digests, and the dozens of small synthesis tasks that fill a PM week.

Why build a personal stack instead of waiting for the enterprise version?

Because you have work to ship this quarter. The enterprise version (one agent with read access to every system) is worth waiting for, but procurement, security review, and data governance push real rollout 6 to 18 months out. A personal stack closes the gap. Every skill you write and workflow you codify is institutional knowledge in the making for the day the enterprise platform shows up.

How long does it take to get a useful stack running?

About a week of evenings to install the core five (Anthropic skills, one agent collection, one MCP server, one slash-command pack, a memory tool), then ongoing tuning. The install order in this chapter exists so you don't blow a week setting up tools and zero time using them.

Do I need to be technical?

Less than you think. Claude Code is a terminal app, but you mostly type natural language at it. The repos in this chapter ship as install commands. The friction is shell setup and API keys, not coding. If you can run `npm install` and follow a README, you can run this stack.

What's the minimum viable PM agent stack?

Claude Code plus five installs: the official skills repo, the github-mcp-server, Superpowers, claude-mem for memory, and the customer-call-notes skill. About 90 minutes if your dev env is clean. That's enough to handle interview synthesis, weekly digests, code review of prototypes, and cross-session memory.

How does this stack map to the 7-stage PM operating system?

Each stage gets a different mix of tools. Sense and Measure lean on MCPs and scheduled-task hooks. Discover leans on memory, context engineering, and synthesis skills. Decide leans on subagents that run consensus and tradeoff analysis. Build leans on the Superpowers brainstorm-spec-plan-TDD-review loop. Ship leans on security and design-review hooks. Amplify leans on writing skills and the living changelog. The mapping section above has the specifics.

What happens when my company rolls out an enterprise version?

You become the person who already knows what to put on it. Every skill, subagent, and memory pattern you've built transfers. The shared context layer is plumbing; the institutional knowledge of which workflows actually save PM time is the part that compounds, and it lives in the people who built personal versions first.
