Execution · Falk Gottlob · 16 min read

How Your PMs Ship Their First Pull Request (and Why You Should Want Them To)

A step-by-step guide for product leaders on how PMs can ship code with AI tools like Claude Code and Cursor. Includes a four-level PR ladder, PLANNING.md templates, a PM PR review skill file, pitch docs for CTO buy-in, and a customer signal monitoring playbook. Built from real coaching engagements across B2B SaaS orgs.

product management · ai-native · operating-model · execution · toolkit · leadership · how-to · practice · templates · AI

The short version

PMs are starting to ship their own pull requests. Not as engineers. As PMs taking back the surfaces they always had the most context on (copy, configuration, AI prompts, telemetry, small front-end changes), because AI finally made it possible to act on that context without a translation layer. The four-level PR ladder: copy and configuration first, then AI prompts on features you own, then small front-end changes, then telemetry and instrumentation. PLANNING.md files live in git next to the code. CLAUDE.md is your standing brief to the agent. Engineering throughput goes up, not down, because engineers reclaim deep-work hours on architecture. Tobi Lütke mandated this at Shopify in April 2025. If your engineering leader pushes back, show them the volume shift.

A CPO I advise at a mid-market enterprise SaaS company sent me a screenshot last week. One of her PMs had shipped a feature. Not spec'd it. Not tracked it through three sprints of engineering queue. Built it, tested it, shipped it to production, and wired up the analytics dashboard herself. Total elapsed time from hypothesis to live feature flag: a day and a half.

The PM is not an engineer. She is a senior product manager with a classic B2B SaaS background. What changed is not her. It is the surface area she is now expected to own.

If you lead product and you have not yet thought hard about what this shift means for your roles, your planning system, and your definition of "shippable," you are already behind. The pattern is not a forecast. It is already running inside orgs you compete with.

Some of the most visible signals:

  1. Tobi Lütke mandated at Shopify in April 2025 that reflexive AI usage is a baseline expectation, and that managers cannot request new headcount without first proving AI cannot do the job. That is not a productivity memo. That is an operating-model memo.
  2. Several Fortune 500 engineering orgs have quietly reported code output per engineer roughly doubling in the last year, with the bottleneck shifting from writing code to reviewing it.
  3. GitHub Copilot, Claude Code, Cursor, and v0 are now standard issue in most product-led companies I advise. The pattern of PMs pushing their own copy and configuration changes straight to production has moved from "interesting experiment" to "table stakes" in about eighteen months.
  4. The PMs who commit to writing their own small PRs are outperforming their peers on every measurable outcome I can track: time-to-first-insight, cycle time to shipped experiment, and, most interestingly, promotion velocity.

These are not engineers wearing PM hats. They are PMs, designers, and executives taking back the parts of the product they always had the most context on, because AI finally made it possible to act on that context without a translation layer.

I wrote about this shift in The PM Role Is Being Rewritten and The AI Product Engineer. This post is the operating manual: how to actually run the transition inside your org.

The new map: PM vs. Design vs. Engineering

Every PM asks me the same question in month one. Do I need to become an engineer?

No. The walls shifted. The disciplines did not merge.

Here is the part most people miss. Shipping code makes you a measurably better PM.

When you can test a copy change on the onboarding flow in an afternoon instead of writing a spec in October and hoping it gets picked up in Q1, your strategy gets sharper. Your customer interviews get faster feedback. Your product judgment compounds because you see consequences in hours, not quarters.

Nothing here replaces strategy, user research, or stakeholder alignment. Those skills matter more than ever, because the cost of building the wrong thing just dropped to near zero and the only remaining edge is knowing what to build. Shipping is additive. It amplifies every other PM skill you already have.

The old map. PM writes specs and tickets, runs prioritization and stakeholder meetings. Design creates mockups, prototypes, handoffs. Engineering owns all code, all deploys, all production monitoring. Thick walls between each role. PM could not touch code. Engineer waited for the spec. Designer handed off a Figma file and hoped the build matched.

The new map. PM zone expanded into: copy, configuration, AI prompts, planning docs in git, small front-end changes, feature flags, agent behavior contracts, and production monitoring. The surfaces where the PM always had the most context and the engineer was translating intent. Design zone expanded into: coded prototypes in v0 or Cursor, design system changes shipped directly, visual QA against real session replays. Engineering core concentrated on: architecture, infrastructure, security, complex logic, performance, agent orchestration, and code review. The hard problems. The ones that actually need senior engineering judgment.

This is not a threat to your engineers. It is a gift. The single biggest unlock I see in orgs that adopt this is engineering throughput. When PMs own their own copy tests, flag configurations, and prompt tuning, engineers get to spend deep-work hours on architecture, integrations, agent orchestration, and the actual hard systems.

The sharpest observation I have heard on this came from a CTO at a Series C AI company. When his PMs started building their own ideas, their specifications got sharper, because they now understood what the agent needed to execute. Sharper specs produced better agent output. The engineer's life improved because the PM moved closer to the work.

If your engineering leader tells you this will create chaos, show them the volume shift. When code output per engineer doubles, the bottleneck stops being authorship and starts being review. PMs absorbing the low-complexity, high-context work is part of how you survive that volume. I covered this bottleneck shift in detail in The New Org Chart for AI.

Planning in git: why the PRD had to die

This is the shift that generates the most internal pushback before it generates the most internal lift.

The old way. A PM writes a Google Doc or a Notion page. Shares the link. Comments scatter across three tools. A review meeting happens two weeks later. Engineers cannot find the spec during implementation because it is buried in someone's Drive folder. The doc drifts out of sync with the code. Three weeks post-launch, nobody can reconstruct why a given design decision was made, because the conversation lived in Slack threads that aged out.

The new way. The PM writes a markdown file. Pushes it to the same repository as the code. Engineers and coding agents both reference it directly. When you want to know why something shipped, git log shows you. Version control gives you diffs, history, and accountability.

The content is the same as a good PRD. The location changed. And the location matters, because it puts the spec where the builder, human or agent, can actually use it.

I argued for this exact shift in The PRD Is Dead. Here is what the replacement looks like in practice.

What a PLANNING.md looks like

Here is an abbreviated example for a feature I coached a PM through last month at a B2B workflow company. Full worked examples are in the downloadable toolkit.

```markdown
# PLANNING: Smart Support Ticket Triage v1

## Problem
Customer support tickets at enterprise accounts take an average of 6.4 hours
to get assigned to the correct specialist. "Wrong team picked up my ticket"
is the #2 CSAT detractor theme for the last two quarters. Enterprise renewal
risk accounts cite support responsiveness as a top-three concern.

## Hypothesis
A triage agent that classifies inbound tickets by product area, severity,
and required expertise will reduce time-to-correct-assignment by 70%+ and
lift first-response CSAT by 10+ points on the enterprise cohort.

## Success Metrics
- Primary: time-to-correct-assignment drops 70%+ vs. control queue
- Primary: first-response CSAT on enterprise tickets lifts by 10+ points
- Guardrail: agent mis-classification rate stays under 5% on audited sample
- Guardrail: no increase in escalation-to-engineering rate

## Rollout
- 10% of inbound tickets, 3 weeks, queue-level flag
- Kill condition: mis-classification above 8% sustained over 200 audited tickets
- Expansion gate: enterprise first-response CSAT improvement of 7+ points at week 3
```
That markdown file replaces a 12-page Notion doc and two planning meetings. It is versioned, diffable, readable by the agent that implements against it, and absorbable in under fifteen minutes by any new engineer rotating onto the team.
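A side benefit of gates this concrete is that they are directly executable. As a minimal sketch (the function name, inputs, and wiring are illustrative, not part of any real harness), the kill condition and expansion gate from the rollout section reduce to a few lines:

```python
# Hypothetical sketch: evaluating the rollout gates from a PLANNING.md.
# Thresholds mirror the example plan; names are invented for illustration.

def evaluate_rollout(audited_total: int, misclassified: int,
                     csat_lift_points: float) -> str:
    """Return 'kill', 'expand', or 'hold' per the gates in the plan."""
    # Kill condition: mis-classification above 8% sustained over 200 audited tickets
    if audited_total >= 200 and misclassified / audited_total > 0.08:
        return "kill"
    # Expansion gate: enterprise CSAT improvement of 7+ points
    if csat_lift_points >= 7:
        return "expand"
    return "hold"

print(evaluate_rollout(250, 25, 4.0))  # 25/250 = 10% mis-classification, above 8%
```

Whether you wire this into a dashboard or run it by hand, writing the gate as a function forces the plan to state numbers instead of vibes.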

CLAUDE.md: your standing brief to the agent

CLAUDE.md is a persistent instruction file in the project root. Claude Code and most agent harnesses read it at the start of every session. It encodes product context, coding standards for PM-scoped changes, review expectations, and exactly what the PM should and should not touch. You write it once and update as the product evolves.

This is where you encode judgment. Every principle you hold on naming conventions, error handling, user-facing copy standards, or accessibility baselines belongs in CLAUDE.md rather than in your head. Once it is there, every agent in your system inherits it.
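As a rough sketch, assuming a B2B workflow product like the triage example above, a minimal CLAUDE.md might look like this. Every section name, rule, and boundary here is illustrative; yours will reflect your own product and standards:

```markdown
# CLAUDE.md — standing brief (illustrative sketch)

## Product context
- B2B workflow product; primary persona is the enterprise support specialist.

## PM-zone boundaries
- PMs may change: user-facing copy, feature flag config, agent prompts,
  analytics events, planning docs.
- PMs may not touch: auth, billing, migrations, infrastructure.

## Standards
- User-facing copy: sentence case, plain language, no internal jargon.
- Analytics events: object_action naming (e.g. ticket_triaged).
- Every PR links its PLANNING.md and states a test plan.
```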

You can start today without a terminal

Everything in this section works from GitHub's web interface. Create a repo. Click "Add file." Write your PLANNING.md in the browser. Commit. You just shipped a spec to git. No terminal required.

If you are interviewing for a senior PM role in the next six months, fork the toolkit repo and write a PLANNING.md for your case study. It is a stronger portfolio artifact than any PDF deck, and it signals you understand the direction of travel.

Toolkit artifact: the open-source PM planning system

A GitHub-ready folder structure you can fork and adapt. Built for CPOs and product directors deploying across an org, and for individual PMs piloting solo.

What is inside:

  1. PLANNING-TEMPLATE.md, a fill-in template with every section a good planning doc needs
  2. Two worked examples: a support triage agent with an explicit agent behavior contract, and a non-AI onboarding configuration change
  3. CLAUDE.md scaffold with product context, review checklist, and explicit PM-zone boundaries
  4. A team rollout playbook for product directors: how to deploy, measure, and iterate
  5. A pilot measurement template for your first sprint
  6. A weekly review cadence doc
  7. A planning review skill file that checks your doc and flags gaps before engineering sees it

Download the Planning Template


Want this rewired across your whole org? The Eval Infrastructure Workshop is two days onsite where your team leaves with a production-ready eval harness for your three highest-stakes AI features, built in Claude Code. See the Eval Workshop →


Your PM's first pull request, and how to earn engineering trust

This is where the pushback lives. It is also where the unlock lives.

Here is the ladder I coach PMs up. I built it after watching two pilot groups try to go too deep too fast and generate exactly the kind of PR-review thrash that gives engineering leaders an excuse to shut the whole thing down.

Level 1: Copy and configuration changes. Button text, error messages, onboarding tooltips, feature flag config, pricing page text. If it has a PM voice and an engineer is translating your words into the final string, you can own it. First PR: change two words on a CTA. Run the test. Report the lift.

Level 2: AI prompt changes on features you own. Agent system prompts, retrieval query templates, classification thresholds. These are already PM-authored in spirit. Put them in the repo, versioned, with a clear test plan.

Level 3: Small front-end changes. A new field on a form, a new surface for an existing component, a conditional render based on a feature flag. You are not architecting anything. You are composing with existing primitives and testing the composition.

Level 4: Telemetry and instrumentation. Adding an analytics event, wiring a metric to a dashboard, creating an alert threshold. This is where most PMs end up doing the most high-leverage work, because the gap between "we should measure this" and "we are measuring this" is where most feature bets go to die.
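To make Level 4 concrete, here is a hedged Python sketch of what "adding an analytics event" amounts to. The `track` helper is a stand-in for whatever analytics SDK your team actually uses (Segment, Amplitude, an in-house pipeline), and the `object_action` event naming is an assumption, not a standard:

```python
import json
import time

def track(event: str, properties: dict) -> str:
    """Stand-in for an analytics SDK call; returns the serialized payload."""
    payload = {"event": event, "ts": int(time.time()), "properties": properties}
    return json.dumps(payload, sort_keys=True)

# The event a PM might ship to close the gap between "we should measure
# this" and "we are measuring this" on the triage feature:
print(track("ticket_triaged", {"product_area": "billing",
                               "severity": "p2",
                               "assigned_team": "payments"}))
```

The point is not the three lines of code. It is that the PM who committed to the metric now owns the distance between the OKR and the dashboard.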

A PM who masters Levels 1 through 4 is more valuable than the same PM was a year ago by a factor of three to five. The engineers around them reclaim that leverage. I covered the full skill breakdown in The PM Role Is Being Rewritten.

Earning engineering trust

Two non-negotiables before your first PR.

  1. Read the engineering standards. If your team has a style guide, linting rules, or a review checklist, read it before you write anything. This is non-negotiable respect for the craft.
  2. Run the toolkit-level review before you open the PR. The PR review skill file below catches roughly 80% of the feedback an engineer would otherwise have to write in a review comment.

If you do those two things, your first PR lands as a signal of seriousness rather than a tax on the reviewer's time.

Toolkit artifact: the PM PR review skill file

A skill file, runnable in Claude Code or any agent harness, that reviews your PR before engineering sees it. It flags gaps in your test plan, missing telemetry, untouched documentation, and obvious violations of your team's standards.

What it checks: linked PLANNING.md and PR scope consistency, documentation and CHANGELOG updates, user-facing copy review, analytics event naming conventions, test coverage or written justification, and feature flag hygiene with kill conditions.
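As a rough illustration of the mechanical checks such a skill can run, here is a Python sketch. The rules, the plain-text PR description input, and the file-list convention are all assumptions for the example, not the skill file's actual implementation:

```python
import re

def pre_review(pr_description: str, changed_files: list[str]) -> list[str]:
    """Return a list of gaps an engineer would otherwise flag in review."""
    gaps = []
    # A PM PR should always link back to the planning doc it implements.
    if "PLANNING" not in pr_description:
        gaps.append("no linked PLANNING.md")
    # A written test plan is non-negotiable before engineering review.
    if not re.search(r"test plan", pr_description, re.IGNORECASE):
        gaps.append("no written test plan")
    # Code changes without a CHANGELOG touch are a common review comment.
    if any(f.endswith((".ts", ".tsx", ".py")) for f in changed_files) \
            and "CHANGELOG.md" not in changed_files:
        gaps.append("code changed but CHANGELOG.md untouched")
    return gaps

print(pre_review("Copy tweak, see PLANNING-triage.md. Test plan: holdout.",
                 ["src/onboarding.tsx"]))
```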

Download the PM PR Review Skill

What PMs should actually ship, and how to test it

Most PM training programs get this wrong. They teach the tools and skip the judgment.

The highest-leverage changes a PM can ship, ranked by ROI:

  1. Copy on high-traffic surfaces. A PM I worked with rewrote two sentences on the integration setup flow of a B2B product last month and cut time-to-first-value for a key enterprise persona by over 30%. The change took 45 minutes end to end, including the test setup. An engineer would have taken a week to pick it up in a normal sprint.
  2. Agent prompts for features you own. Tuning the system prompt on a support classification agent based on 40 hours of observing actual specialist behavior should be done by the PM who ran those sessions, not by the engineer two hands removed from the customer.
  3. Telemetry for your own OKRs. If you cannot see the metric you committed to, you are not in control of the outcome. Ship the instrumentation yourself.
  4. Onboarding and empty states. These are the surfaces where user intent is most fragile and where PM judgment is most valuable.

How to test it

The copy-change example above is not made up, and the rigor is the point. It ran as a holdout test, not an anecdotal A/B. Ten percent of newly provisioned workspaces saw the old flow. Ninety percent saw the new flow. Because the target segment was enterprise, the team ran it for four weeks to hit statistical power. The PM designed the test, pushed the feature flag, wrote the dashboard, monitored the cohort, and declared the result.
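For readers who want to see the arithmetic behind "ran it for four weeks to hit statistical power," a two-proportion z-test is the standard check for a holdout like this, and it needs nothing beyond the Python standard library. The counts below are illustrative, not the real experiment's data:

```python
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int,
                     conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for conversion rates a vs. b."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 10% holdout (old flow) vs. 90% new flow, activation in first session:
z, p = two_proportion_z(conv_a=62, n_a=400, conv_b=820, n_b=3600)
print(f"z={z:.2f}, p={p:.4f}")
```

A PM who can run this check does not need to wait for an analyst to declare the result; they only need an analyst to sanity-check the design.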

That is the loop you are optimizing for. PM context in, production signal out, no translation layer in between. I described a similar loop in Mob Prototyping, where the PM, designer, and engineer build together in one room.

Real-time customer signal monitoring

One of the highest-leverage habits I coach is building a live customer signal feed that the PM owns directly. Not a weekly email digest. A live feed, routed to Slack, with escalation rules the PM wrote and owns.

The old model is a weekly support meeting where a Customer Success lead reads out the top three tickets from the prior week. By the time a pattern gets named in that meeting, it has cost three to six weeks. In an enterprise motion where contracts are six and seven figures, that lag is expensive.

The new model: live routing, PM-authored escalation criteria, and a standing customer signal review every Monday where the PM walks through what they saw, what they acted on, and what they deferred. If they deferred, they say why.
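A PM-authored escalation rule can be as simple as a predicate plus a destination channel. A minimal sketch, with thresholds, field names, and Slack channels all invented for illustration (the real wiring would sit behind whatever routing tool your org uses):

```python
ESCALATION_RULES = [
    # (predicate over a signal dict, Slack channel to route it to)
    (lambda s: s["account_tier"] == "enterprise" and s["severity"] == "p1",
     "#pm-escalations"),
    (lambda s: s["theme"] == "wrong_team_assignment",
     "#triage-quality"),
]

def route(signal: dict) -> list[str]:
    """Return the channels a signal should be routed to (may be empty)."""
    return [channel for predicate, channel in ESCALATION_RULES
            if predicate(signal)]

print(route({"account_tier": "enterprise", "severity": "p1",
             "theme": "wrong_team_assignment"}))
```

The value is less in the code than in the authorship: the PM wrote the predicates, so the PM owns what counts as an escalation.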

If you already run agent-powered monitoring, the Red Flag Detection Agent and Product Health Agent are good starting points for the signal infrastructure.

Toolkit artifact: the "make the case" pitch doc

If you need to pitch this shift internally before you can run it, this is the doc. Written for a CTO or VP of Engineering audience, because that is the approver you need. Two variants: standard for unregulated environments, and enterprise-regulated with explicit SOC 2, HIPAA, PCI, FINRA, and GDPR controls.

Download the Standard Pitch Doc

Download the Enterprise Pitch Doc

Toolkit artifact: the customer signal monitoring playbook

The operational playbook for turning scattered customer signal into a live feed your PMs own.

What is inside: signal source inventory, routing rules, PM-authored escalation criteria template, the Monday customer signal review agenda, and the escalation-to-action loop.

Download the Customer Signal Playbook

A worked teardown: PM #1 vs. PM #2

A product org I advised last year ran two PMs on adjacent features of the same product. Same tooling, same customer segment, same engineering support. One PM ran the old model: wrote a spec in Notion, shared it around, waited for engineering cycles, reviewed PRs as an observer, and owned the go-to-market.

The other ran the new model: wrote PLANNING.md in the repo, paired daily with engineering, shipped her own telemetry, wrote her own agent prompts, ran her own cohort on a feature flag, and pushed the onboarding copy changes herself.

The second PM shipped a complete vertical cut of the feature roughly nine weeks faster than the first. Not because she was smarter. She was operating with less translation loss between context and code.

The first PM, to her credit, watched this closely, asked for a one-week immersion after launch, and is now running the same model on her next feature. That is the pattern I want every product leader to plan for. Not a mandate. A proof. Then a pull, not a push.

If you lead product and you have not yet decided what your version of this looks like, the single most valuable thing you can do this quarter is run the pilot. One PM, one surface, four weeks. Measure time-to-ship and cycle-time-to-insight. Report back to your CEO with the numbers.

The org chart is not going back.

One more frame before I stop. I spent two decades as an endurance athlete and I still race Ironman distances when my schedule permits. Every triathlete will tell you the same thing about what separates the podium from the pack. It is not the swim, the bike, or the run. It is the transition. T1 and T2. The place where you move between disciplines without losing time. That is exactly what this post is about. The transition zones between PM, design, and engineering used to cost weeks. AI-assisted tooling collapses them to hours. The teams that win from here will be the ones that treat those transition zones as a first-class skill to train.

Start training.

Sources: Claude Code, Cursor, GitHub Copilot, v0, Shopify.
