I am about to disagree, respectfully, with Marty Cagan, Dragonboat, Reforge, and most of modern product management.
The disagreement is narrow. The core idea of outcome accountability is correct: measure what customers experience, not what teams output. That's not in dispute. The dispute is about cadence.
Outcome accountability assumes you can measure customer outcomes within a meaningful decision cycle. For products that ship features every six to twelve weeks and observe customer behavior over months, this works. For products that ship agent iterations every day and observe customer outcomes over weeks, it doesn't. The cycle times don't match.
This essay argues for what I run instead, and why.
The short version
Outcome accountability is a luxury good. It works when you can complete an outcome cycle inside a single decision cycle. AI products iterate ten to twenty times a week. Outcome cycles for AI features still run four to twelve weeks. By the time an outcome attributes back, you've shipped forty to eighty more changes. Outcome accountability becomes a lagging measurement that can't drive day-to-day decisions.
At velocity, you measure direction with seven leading indicators on a fast cadence (eval pass rate, agent quality, iteration count, design coherence, escalation rate, dispute rate, latency). Outcomes still matter, but they live on a slower cadence (monthly, quarterly). The right model is two measurement layers, deliberately. Most teams use only one.
This is a respectful argument with the canon. Cagan was right for the products he was writing about. AI-native products are different.
Why outcome cycles are slow even when iteration cycles are fast
The mismatch is structural, not cultural. Three causes.
Cause 1: Customer behavior is a lagging signal. A customer who used your agent yesterday and had a bad experience doesn't churn today. They get frustrated. They use the product less. They mention it on a CS call three weeks later. They start an evaluation of a competitor four weeks later. They churn at renewal in three months. Every step in this chain is real and slow. You can't compress the chain by shipping faster.
Cause 2: Statistical significance takes time. If you ship a new prompt today and want to know whether it's better than the old one, you need enough customer interactions on each variant to call the difference real. For most B2B products, that's two to four weeks of data. You can't compress that by shipping ten variants per week; you just end up with ten variants none of which have enough data to evaluate.
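To make the arithmetic concrete, here is a back-of-the-envelope sketch. The baseline rate, the lift, and the daily volume are illustrative numbers I'm assuming, not figures from any particular product:

```python
import math

# Approximate per-variant sample size for a two-proportion z-test at
# alpha = 0.05 (two-sided) and 80% power. The critical values are
# hard-coded for that common case to keep the sketch dependency-free.
def interactions_needed(p_baseline: float, lift: float) -> int:
    z_alpha, z_beta = 1.96, 0.84
    p1, p2 = p_baseline, p_baseline + lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / lift ** 2
    return math.ceil(n)

# Detecting a 3-point lift on a 70% baseline resolution rate:
print(interactions_needed(0.70, 0.03))  # ~3,550 interactions per variant
# At ~150 qualifying interactions a day, that's over three weeks of data --
# for ONE variant pair, before any multiple-comparison correction.
```

Split that traffic across ten simultaneous variants and the clock stretches by roughly a factor of ten, which is the point: shipping faster doesn't buy you significance faster.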
Cause 3: Outcomes are diffuse. The customer's outcome (retention, expansion, NPS) is shaped by dozens of agent interactions plus their broader workflow plus the rest of their toolchain. Attributing a specific change to an outcome is hard. The cleaner the attribution, the slower it is.
These three causes are why outcome accountability has the cadence it does. Cagan didn't pick six to twelve weeks arbitrarily; that's the minimum cycle time for an outcome signal to be readable.
You can ship faster than the outcome cycle. You can't measure faster than the outcome cycle.
What "measuring direction" looks like in practice
If outcomes can't drive day-to-day decisions at velocity, something else has to. The frame I use is direction measurement: leading indicators that predict outcomes on a four-to-eight-week lag, measured on the cadence of the work itself.
For an agent product, the seven leading indicators that work are:
- Eval pass rate over the last seven days. Are the prompts we shipped this week passing the evals we wrote last month? Trends matter more than absolute levels.
- Agent quality score from sampled outputs. Each day, sample fifty outputs across customer cohorts. Have a senior engineer or product specialist rate them on a one-to-five scale. The score predicts customer satisfaction four to six weeks out.
- Iteration count and shipped changes. How many distinct improvements did the team ship this week? Velocity is a leading indicator of long-term outcomes, with the caveat that velocity divorced from quality is noise.
- Design coherence. Do the agent's outputs match the brief the team agreed on? This is fuzzy and human-judged. It correlates strongly with customer trust, which correlates with retention.
- Customer escalation rate. When a customer hits "this is wrong, escalate to a human," they're voting on agent quality. The rate moves daily; outcomes (churn, NPS) move slowly. Escalation rate is the bridge.
- Dispute rate on outcome billing. For outcome-priced agents, customers can dispute a charge ("the resolution wasn't actually a resolution"). Dispute rate is a real-time measure of trust in the agent's quality.
- Latency at p95 and p99. Slow agents lose customers. Latency moves with infrastructure changes; it's the most stable of the leading indicators and the one that most teams underweight.
All seven move with each iteration. None of them is the outcome. Together they form a leading-indicator layer that drives daily decisions while outcomes catch up on the slower cadence.
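As a concrete sketch, here is what one day's snapshot might look like. Everything here is illustrative: the `Interaction` record, the field names, and the input shapes are assumptions of mine, not a prescribed schema; swap in your own telemetry.

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Interaction:
    latency_ms: float
    escalated: bool   # customer hit "this is wrong, escalate to a human"
    disputed: bool    # outcome-billed charge was disputed

def direction_snapshot(eval_results: list[bool],      # pass/fail per eval case, last 7 days
                       quality_ratings: list[int],    # 1-5 ratings on ~50 sampled outputs
                       coherence_ratings: list[int],  # 1-5 ratings against the agreed brief
                       changes_shipped: int,          # distinct improvements shipped this week
                       interactions: list[Interaction]) -> dict:
    """One row of the direction dashboard: the seven leading indicators."""
    latencies = sorted(i.latency_ms for i in interactions)
    pct = quantiles(latencies, n=100)  # 99 cut points; index 94 ~ p95, index 98 ~ p99
    n = len(interactions)
    return {
        "eval_pass_rate":   sum(eval_results) / len(eval_results),
        "quality_score":    sum(quality_ratings) / len(quality_ratings),
        "changes_shipped":  changes_shipped,
        "design_coherence": sum(coherence_ratings) / len(coherence_ratings),
        "escalation_rate":  sum(i.escalated for i in interactions) / n,
        "dispute_rate":     sum(i.disputed for i in interactions) / n,
        "latency_p95_ms":   pct[94],
        "latency_p99_ms":   pct[98],
    }
```

The point is the shape: every number is computable from one day's telemetry plus two small batches of human ratings. Nothing here waits on a customer outcome.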
The two-layer measurement system
Layer 1 (daily and weekly): the seven leading indicators. The team reviews these in the daily standup and the weekly outcome cohort review (see the Dual Transformation Operating Model for the full cadence). Decisions about what to ship next, what to roll back, what to evaluate further get made off this layer.
Layer 2 (monthly and quarterly): customer outcomes. NRR, CSAT, NPS, expansion revenue, churn rate, customer satisfaction by cohort. The team reviews these monthly. Strategic decisions (do we keep investing in this product line, are we pricing right, is the buyer changing) get made off this layer.
Both layers are deliberate. Each is reviewed in its own meeting with its own audience. The leading-indicator layer is the operational layer for the team. The outcome layer is the strategic layer for leadership.
The mistake most product teams make is collapsing both into one layer. Either they only review outcomes (and starve the team of fast feedback), or they only review leading indicators (and lose the strategic check). The two-layer system makes both first-class.
The leading-indicator gaming problem
A skeptical reader will note: the moment you measure leading indicators, the team will optimize for them, including in ways that don't actually produce the outcomes they were chosen to predict.
This is real. The fix is not to avoid leading indicators. The fix is to constantly verify that they still correlate with outcomes.
Every quarter, run this analysis: for each leading indicator, plot it against the outcome it was chosen to predict, with a four-to-eight-week lag. Is the correlation still there? If yes, the leading indicator earns its place for another quarter. If not, replace it.
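A sketch of that quarterly check in pandas, assuming you keep weekly series of each indicator and outcome. The file name, column names, and the 0.4 threshold are illustrative assumptions, not prescribed tooling:

```python
import pandas as pd

# Weekly series, one row per week (hypothetical source and columns).
df = pd.read_csv("weekly_metrics.csv", parse_dates=["week"]).set_index("week")

# Shift the indicator forward 4-8 weeks and correlate against the outcome:
# shift(lag) aligns week t's outcome with the indicator from week t - lag,
# which is exactly the "does it lead by 4-8 weeks?" question.
for lag in range(4, 9):
    r = df["escalation_rate"].shift(lag).corr(df["churn_rate"])
    print(f"lag {lag}w: r = {r:+.2f}")

# Decision rule (the threshold is a judgment call):
# if no lag in the 4-8 week window clears |r| >= 0.4,
# the indicator is up for replacement this quarter.
```

One run per indicator-outcome pair; the whole review is a page of output and an hour of argument.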
Some leading indicators decay because the team has Goodharted them (optimized them in ways that satisfy the metric without producing the outcome). Some decay because the underlying customer or product has changed. Either way, the response is the same: replace the indicator.
Treating leading indicators as derivatives of outcomes (not as substitutes) is what keeps the system honest.
What this changes about how PMs are measured
Outcome accountability says PMs should be evaluated on the customer outcomes their work produced. The implicit time horizon is six to twelve months.
Direction measurement adds a second evaluation layer. PMs are also evaluated on the quality of the leading indicators in their area: are the metrics well-chosen, are they still predicting outcomes, is the team's iteration cadence healthy, is escalation rate trending right, is the eval suite improving.
For PMs working in fast-iteration agent products, the direction layer is where most performance is observable. The outcome layer is where strategic effectiveness is observable. Neither alone is enough.
This sounds like more measurement. It is. The compensating change is that strategic outcome accountability becomes annual or semi-annual, not monthly. PMs aren't chased on outcomes that haven't had time to attribute. They're held accountable on a cadence that matches the actual cycle time of the signal.
The respectful argument with Marty Cagan
Marty Cagan's Inspired and the broader SVPG corpus did more to elevate product management as a discipline than almost any other body of work since 2010. The shift from output to outcome focus was the right call for the era. PMs in 2018 were drowning in feature factories. Outcome focus was the rescue.
The argument here is narrower. AI-native products operate at a cadence that Cagan's frame, designed for SaaS at six-to-twelve-week feature cycles, did not anticipate. The frame still applies for the products it was designed for. For products that iterate ten times a week, an additional measurement layer is necessary.
I would expect Cagan, given his own pragmatism, to be open to the update. The field hasn't had this conversation explicitly yet. I'm offering this as a contribution.
If I'm wrong, the specific objection should be: which of the seven leading indicators doesn't predict its corresponding outcome, and what should replace it? Let's argue at that level.
What to try this week
Pick one product team. One product, one squad. Run a quick analysis.
1. Write down the customer outcome the team is supposedly accountable for (retention, expansion, NPS, conversion, etc.).
2. Write down the cadence at which that outcome is reviewed (monthly? quarterly?).
3. Write down the cadence at which the team ships changes (daily? weekly? bi-weekly?).
4. Subtract step 3 from step 2. That's the gap between when the team can see what they did and when they can decide what to do next.
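The arithmetic, made explicit with illustrative numbers (a daily-shipping team whose outcome is read quarterly):

```python
# Step 2 and step 3 from the list above, in days.
outcome_review_days = 90   # step 2: outcome reviewed quarterly
ship_cycle_days = 1        # step 3: team ships daily

gap_days = outcome_review_days - ship_cycle_days    # step 2 minus step 3
ships_before_signal = outcome_review_days // ship_cycle_days

print(gap_days)             # 89: days of decisions made before the outcome reads
print(ships_before_signal)  # 90: changes shipped before any one can be judged
```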
If the gap is negative or near zero, outcome accountability still works. The cycle times match.
If the gap is large (the team ships ten times before they can read the outcome of any one ship), you have a measurement system mismatch. The team is operating in the dark between outcome reviews. They're either making decisions blindly or, more often, slowing themselves down to wait for the outcome data they can't actually wait for.
Direction measurement closes the gap. The seven leading indicators above are a starting point. Build the dashboard. Review weekly. Verify quarterly that the indicators are still predicting the outcomes you care about.
The team's pace and clarity will both increase. That combination is rare and load-bearing for any product moving at AI-era velocity.
The Direction Dashboard agent (an automated pipeline that compiles the seven leading indicators every day) is at /blog/agent-direction-dashboard. The companion handbook chapter on direction metrics for AI-native products is at /handbook/direction-metrics.
Frequently asked
What is outcome accountability?
The product-leadership consensus that PMs should be measured on customer outcomes (retention, expansion, NPS, conversion) rather than outputs (features shipped, sprints completed). Marty Cagan, Reforge, Dragonboat, and most modern product orgs have adopted this frame since 2018. It is correct for products whose ship cadence matches their 4-12 week outcome cycles.
Why is it a luxury good?
Because it assumes you can measure outcomes within a meaningful decision cycle. AI products iterate 10-20 times per week. Outcome cycles for AI features are still 4-12 weeks (the data doesn't move faster). By the time an outcome attributes back, you've shipped 40-80 more changes. Outcome accountability becomes a lagging measurement that can't drive day-to-day decisions.
What is direction measurement?
Leading indicators measured on the cadence of the work itself. For agent products: agent quality scores, eval pass rate, iteration count per week, design coherence (how well the agent's outputs match the team's intended behavior), latency, escalation rate, dispute rate. These move with each iteration and predict outcomes 4-8 weeks ahead.
Are you saying outcomes don't matter?
No. Outcomes still matter. They just can't be the only measurement layer. The right model is two layers. Leading indicators on a daily/weekly cadence drive day-to-day decisions. Outcomes on a monthly/quarterly cadence drive strategic decisions. Both are deliberate. Most teams use only one of the two.
What are the seven leading indicators?
(1) Eval pass rate over the last 7 days. (2) Agent quality score from sampled outputs. (3) Iteration count and shipped changes. (4) Design coherence (do the agent's outputs match the brief). (5) Customer escalation rate. (6) Dispute rate on outcome billing. (7) Latency at p95 and p99. Each predicts an outcome metric on a 4-8 week lag. Together they form the direction layer.
Is this an attack on Marty Cagan?
No. Cagan's outcome focus rescued product management from feature-factory thinking. His frame holds for products at the cadence he was writing about. The argument here is that AI-native products operate at a velocity Cagan's frame did not anticipate. Cagan would likely agree. The field hasn't done the update yet.
What about leading-indicator gaming?
Real risk. The fix is to never optimize the leading indicators alone. Every quarter, check whether the leading indicators are still predicting the outcomes they were chosen to predict. If the correlation breaks, replace the indicator. The leading-indicator layer is a derivative of outcomes, not a substitute for them.