Direction Metrics for AI-Native Velocity

Outcomes lag by weeks. Direction moves with each iteration. The seven leading indicators that predict outcomes 4-8 weeks ahead, and the dual-cadence system.

Falk Gottlob · 2 min read

The companion essay Outcome Accountability Is a Luxury Good makes the case. This chapter is the practice.

The two-layer measurement system

| Layer | Cadence | Indicators | Drives |
|---|---|---|---|
| 1 (Direction) | Daily / weekly | The seven leading indicators | Day-to-day decisions: what to ship, what to roll back, what to evaluate further |
| 2 (Outcomes) | Monthly / quarterly | NRR, CSAT, NPS, expansion revenue, churn rate, customer satisfaction by cohort | Strategic decisions: do we keep investing, are we pricing right, is the buyer changing |

Both layers are deliberate. Each is reviewed in its own meeting, with its own audience.

The seven leading indicators

Each indicator should be documented with its definition, source, healthy band, the outcome it predicts, the lag, and common pitfalls. In summary:

  1. Eval pass rate over the last 7 days. Predicts: customer CSAT 4 weeks out.
  2. Agent quality score from sampled outputs. Predicts: NPS on successor 6 weeks out.
  3. Iteration count and shipped changes. Predicts: feature adoption 8 weeks out.
  4. Design coherence. Predicts: customer trust 6 weeks out.
  5. Customer escalation rate. Predicts: churn 12 weeks out.
  6. Dispute rate on outcome billing. Predicts: NRR 8 weeks out.
  7. Latency at p95 and p99. Predicts: retention 6 weeks out.
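As an illustration of how two of these indicators might be computed from raw data, here is a minimal sketch for indicators 1 and 7. The function names and the input shapes (a list of `(timestamp, passed)` tuples and a list of latencies in milliseconds) are assumptions made for the example, not something the chapter prescribes.

```python
from datetime import datetime, timedelta
from statistics import quantiles


def eval_pass_rate(runs, now, window_days=7):
    """Share of eval runs that passed within the trailing window.

    `runs` is assumed to be an iterable of (timestamp, passed) tuples.
    Returns None when there were no runs in the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [passed for ts, passed in runs if cutoff < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def latency_percentiles(latencies_ms):
    """p95 and p99 of a latency sample (needs at least two data points)."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p95": cuts[94], "p99": cuts[98]}
```

In practice the inputs would come from wherever the team logs eval runs and request timings; the sketch only shows the arithmetic.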

The Goodhart audit

This is a quarterly process. For each indicator, plot it against the outcome it predicts, shifted by the indicator's lag, and compute the correlation. Above 0.7 is healthy, 0.5 to 0.7 is weakening, and below 0.5 is dead. Replace dead indicators.
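The audit can be sketched in a few lines. The example below assumes weekly series and Pearson correlation (the chapter only says "correlation"), and it adds a hypothetical `expected_sign` parameter, because an indicator such as latency would be expected to move against the outcome it predicts rather than with it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlation(indicator, outcome, lag_weeks):
    """Correlate indicator[t] with outcome[t + lag_weeks] (weekly series)."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    # zip stops at the shorter series, so only overlapping weeks are paired.
    pairs = list(zip(indicator, outcome[lag_weeks:]))
    if len(pairs) < 3:
        raise ValueError("need at least 3 overlapping weeks")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def classify(r, expected_sign=1):
    """Apply the audit thresholds: above 0.7, 0.5 to 0.7, below 0.5.

    expected_sign=-1 is an assumption for indicators that should correlate
    negatively with their outcome (for example latency and retention).
    """
    strength = r * expected_sign
    if strength > 0.7:
        return "healthy"
    if strength >= 0.5:
        return "weakening"
    return "dead"
```

A call such as `classify(lagged_correlation(pass_rate, csat, lag_weeks=4))` would then give the audit verdict for one indicator. With only a quarter of weekly points the estimate is noisy, so a longer history gives a steadier read.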

What this changes about how PMs are measured

PMs are evaluated on two layers. The first is the quality of their leading indicators: are they well chosen, are they predicting outcomes, and is the team's iteration cadence healthy? The second is the strategic outcome layer, reviewed on an annual or semi-annual cadence.

Outcome accountability moves from monthly to annual. Direction accountability becomes the day-to-day.

What to do this week

Build the indicator registry as a YAML file. List the seven indicators (or your team's chosen seven), where the data lives, the healthy band, and what each predicts. The agent that operationalizes this is at /blog/agent-direction-dashboard.
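One possible starting point is sketched below: a small script that holds the seven indicators from this chapter and renders them as YAML text. The field names (`source`, `healthy_band`, `predicts`, `lag_weeks`) and the `Indicator` class are assumptions for illustration. The `TODO` values are placeholders, since data sources and healthy bands are specific to each team.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    predicts: str
    lag_weeks: int
    source: str = "TODO"        # placeholder: where the data lives
    healthy_band: str = "TODO"  # placeholder: team-specific range


# Names, predicted outcomes, and lags are taken from the list in this chapter.
REGISTRY = [
    Indicator("Eval pass rate over the last 7 days", "customer CSAT", 4),
    Indicator("Agent quality score from sampled outputs", "NPS on successor", 6),
    Indicator("Iteration count and shipped changes", "feature adoption", 8),
    Indicator("Design coherence", "customer trust", 6),
    Indicator("Customer escalation rate", "churn", 12),
    Indicator("Dispute rate on outcome billing", "NRR", 8),
    Indicator("Latency at p95 and p99", "retention", 6),
]


def to_yaml(indicators):
    """Render the registry as YAML text without third-party libraries.

    json.dumps produces double-quoted strings, which are valid YAML scalars.
    """
    lines = ["indicators:"]
    for ind in indicators:
        lines.append(f"  - name: {json.dumps(ind.name)}")
        lines.append(f"    source: {json.dumps(ind.source)}")
        lines.append(f"    healthy_band: {json.dumps(ind.healthy_band)}")
        lines.append(f"    predicts: {json.dumps(ind.predicts)}")
        lines.append(f"    lag_weeks: {ind.lag_weeks}")
    return "\n".join(lines) + "\n"
```

Calling `print(to_yaml(REGISTRY))` gives a file body that can be saved and then filled in by hand.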


Frequently asked

What are direction metrics?

Leading indicators measured on the cadence of the work itself. For agent products: eval pass rate, agent quality score, iteration count, design coherence, escalation rate, dispute rate, latency. They predict outcome metrics on a 4-8 week lag (12 weeks for churn) and drive day-to-day decisions in a way outcomes can't.

Why do you need direction metrics in addition to outcomes?

Because outcome cycles for AI features are 4-12 weeks (the data doesn't move faster than that). With agent products iterating 10-20 times per week, by the time even a 4-week outcome attributes back you've shipped 40-80 more changes, and a 12-week outcome can trail by as many as 240. Outcome accountability becomes a lagging measurement that can't drive day-to-day decisions. Direction metrics close the gap.

What are the seven leading indicators?

(1) Eval pass rate over the last 7 days. (2) Agent quality score from sampled outputs. (3) Iteration count and shipped changes. (4) Design coherence (do agent outputs match the brief). (5) Customer escalation rate. (6) Dispute rate on outcome billing. (7) Latency at p95 and p99.

How do you prevent gaming?

Treat leading indicators as derivatives of outcomes, not substitutes. Every quarter, audit whether each indicator is still correlating with the outcome it was chosen to predict. If correlation drops below 0.5 over 4 weeks, replace the indicator. Goodhart's law is real; the discipline is constant verification.

What is the two-layer measurement system?

Layer 1 (daily/weekly): the seven leading indicators. Drives day-to-day decisions. Reviewed in standup and weekly outcome cohort review. Layer 2 (monthly/quarterly): customer outcomes (NRR, CSAT, NPS, expansion). Drives strategic decisions. Reviewed in monthly business review and quarterly strategic review. Both deliberate. Most teams use only one.
