# Measurement Agent Stack — Starter Pack

Companion to [Build a Measurement Agent Stack](https://falkster.com/blog/build-measurement-agent-stack). Install commands, three ready-to-paste prompts (morning brief, post-launch synthesis, stakeholder update writer), and the daily/weekly cadence.

Part 4 of 5 in the [PM Agent Stack series](https://falkster.com/blog/pm-agent-stack-overview).

---

## Install (a focused half-day)

```bash
# 1. Base
npm install -g @anthropic-ai/claude-code

# 2. postgres-mcp — point at a READ-ONLY role on a metrics replica
#    NEVER point this at production read-write credentials.
#    Install from github.com/crystaldba/postgres-mcp

# 3. A scheduling primitive
#    Either:
#    - The cowork mode scheduled-tasks tool (ships with the falkster.ai setup)
#    - System cron invoking Claude Code with a fixed prompt
#    - GitHub Actions on a cron schedule
#    Pick one and use it consistently.

# 4. playwright-mcp (for dashboards without API)
#    Install from github.com/microsoft/playwright-mcp

# 5. claude-mem (memory)
#    Install from github.com/thedotmack/claude-mem

# 6. continuous-claude-v2 (context-preserving ledger)
#    Install only if measurement work spans many sessions and context degrades

# 7. Subagents — install data-analyst and variance-detector specifically
#    From wshobson/agents marketplace
```

The base three (Claude Code, postgres-mcp, a scheduling primitive) earn their keep within a week. Add the rest afterward, as they prove necessary.
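If you pick the system-cron option for scheduling, the whole setup is one crontab line plus a prompt file. A minimal sketch, assuming Claude Code's headless print mode (`claude -p`) and placeholder paths for the working directory, prompt file, and log:

```bash
# crontab -e — run the morning-brief prompt at 07:00 on weekdays
# (all paths are placeholders; claude must be on cron's PATH)
0 7 * * 1-5 cd ~/work && claude -p "$(cat ~/prompts/morning-brief.txt)" >> ~/logs/morning-brief.log 2>&1
```

Cron runs with a minimal environment, so use absolute paths (or set `PATH` at the top of the crontab) rather than relying on your shell profile.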

---

## Recipe 1 — The morning brief (runs 7 a.m., read at 8)

### Setup once

Create a `metrics/` directory in your working folder:

```
metrics/
  queries/      ← one .sql file per metric
  history.json  ← rolling 90-day window of metric values
  README.md     ← what each metric means and why
```
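A one-time scaffold for that layout (a sketch; the empty-object seed for `history.json` is an assumption — the daily prompt's update step fills it in):

```bash
# Create the metrics/ layout the daily prompt expects
mkdir -p metrics/queries

# Seed the history file so the first run has something to read
[ -f metrics/history.json ] || echo '{}' > metrics/history.json

cat > metrics/README.md <<'EOF'
# Metrics
One .sql file per metric in queries/.
history.json: rolling 90-day window of daily metric values.
EOF
```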

Write a SQL file per metric:

```sql
-- metrics/queries/activations.sql
SELECT COUNT(*) AS activations
FROM users
WHERE activated_at >= CURRENT_DATE - INTERVAL '1 day';
```

### The daily prompt (7 a.m. scheduled)

```
For each query in metrics/queries/, run it and capture the result.

For each metric:
1. Compare to the same metric one day, seven days, and 30 days ago
   (read from metrics/history.json).
2. Flag any metric that has moved more than {THRESHOLD}% week-over-week.
3. For each flagged metric, look up the most recent shipped item in the
   living changelog (path: docs/changelog/) and propose an explanation.

Update metrics/history.json with today's values.

Write a one-page Slack message:
- Today's numbers (compact table)
- Week-over-week deltas (only flagged ones)
- Brief explanation for each flagged variance
- A "nothing notable today" line if the brief would otherwise be empty

Send to channel #my-morning-brief.
```

**Threshold tuning:** start at 3% week-over-week. If the first week produces too many false positives, raise it to 5–7%. Settle on a threshold that yields one or two flags per week.
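The flagging rule itself is a line of arithmetic. A sketch of what the agent computes per metric (the `flag_wow` helper name is hypothetical; awk only, no dependencies):

```bash
# flag_wow TODAY LAST_WEEK THRESHOLD_PCT
# Prints a flag when |week-over-week change| exceeds the threshold.
flag_wow() {
  awk -v t="$1" -v w="$2" -v th="$3" 'BEGIN {
    if (w == 0) { print "no baseline"; exit }
    d = (t - w) / w * 100
    if (d > th || d < -th) printf "FLAG %+.1f%% WoW\n", d
    else print "ok"
  }'
}

flag_wow 110 100 3   # FLAG +10.0% WoW
flag_wow 102 100 3   # ok
```

The same comparison runs against the 1-day and 30-day values from `metrics/history.json`; only the week-over-week delta drives flagging.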

---

## Recipe 2 — Post-launch impact synthesis

### Run after a launch (replaces a 3-day analysis with a 20-minute one)

```
I launched feature {FEATURE_NAME} on {LAUNCH_DATE}.

Cohorts:
- Success cohort: users defined by {SQL filter for users who hit the new feature}
- Control cohort: matched users with {SQL filter for control}

Run the standard impact analysis using postgres-mcp:
- Per-cohort means on activation, retention day-1, retention day-7,
  conversion to paid
- Plus these feature-specific metrics: {METRIC_1}, {METRIC_2}, {METRIC_3}
- Compute deltas with 95% confidence intervals

Compare results against the eval-as-spec test cases at
{path/to/feature/eval-suite.md}.

Report:
1. Did we hit the bar? (case-by-case from the eval suite)
2. Magnitude of the effect (percentage and absolute)
3. Confidence intervals
4. What surprised you (segments where the effect was bigger or smaller
   than predicted)
5. Recommended next move (extend, kill, or iterate)
```

You edit the synthesis, add the narrative voice, and ship as the post-launch readout.
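For reference, the 95% confidence interval the prompt asks for is, for proportion metrics like day-7 retention, the Wald interval for a difference of two proportions under a normal approximation. A sketch with a hypothetical `ci95` helper:

```bash
# ci95 P1 N1 P2 N2 — Wald 95% CI for the difference of two proportions
# (success-cohort rate p1 over n1 users vs. control rate p2 over n2)
ci95() {
  awk -v p1="$1" -v n1="$2" -v p2="$3" -v n2="$4" 'BEGIN {
    d  = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    printf "delta=%.3f ci95=[%.3f, %.3f]\n", d, d - 1.96 * se, d + 1.96 * se
  }'
}

ci95 0.30 1000 0.25 1000   # delta=0.050 ci95=[0.011, 0.089]
```

If the interval excludes zero, the effect is significant at roughly p < .05; for small cohorts or rates near 0 or 1, have the agent (or your data team) use a better interval than Wald.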

---

## Recipe 3 — The stakeholder update writer

### Setup once

Create `templates/weekly-update.md` with your standard sections:

```markdown
# Weekly Update — {WEEK ENDING}

## Numbers
- Metric 1: {value} (Δ {delta} vs. last week, Δ {delta} vs. 4-week avg)
- Metric 2: ...

## What shipped
- ...

## What's next
- ...

## What's at risk
- ...
```
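The `Δ vs. 4-week avg` column in the template reduces to the same arithmetic as the morning brief's week-over-week flags. A sketch (the `delta_vs_avg` helper is hypothetical):

```bash
# delta_vs_avg THIS_WEEK W-1 W-2 W-3 W-4
# Percentage change of this week's value vs. the trailing 4-week average.
delta_vs_avg() {
  awk -v t="$1" -v a="$2" -v b="$3" -v c="$4" -v d="$5" 'BEGIN {
    avg = (a + b + c + d) / 4
    printf "%+.1f%%\n", (t - avg) / avg * 100
  }'
}

delta_vs_avg 110 100 104 96 100   # +10.0%
```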

### Friday 3 p.m. scheduled prompt

```
Read templates/weekly-update.md.

Pull this week's numbers using metrics/queries/, computing deltas vs.
last week and 4-week average.

Pull this week's shipped items from the living changelog (docs/changelog/).

Pull at-risk items from the anti-backlog (docs/anti-backlog.md).

Draft a stakeholder update following the template.

Voice constraints:
- Practitioner, not PR
- Short sentences
- No hedging, no "we believe", no "fundamentally"
- No "leverage" as a verb
- Lead with the numbers, never bury them
- Acknowledge surprises directly

Output as docs/updates/{date}-weekly.md.
```

You get a 70%-finished draft at 3:15. Spend 30 minutes adding the narrative voice. Update goes out at 4 p.m. instead of 6 p.m.

---

## Daily/weekly rhythm

| When | What |
|---|---|
| Daily 7 a.m. | Morning brief auto-runs; check Slack at 8 |
| After each launch | Recipe 2 within 24 hours of the launch |
| Friday 3 p.m. | Stakeholder update auto-drafts |
| Friday 3:30 p.m. | You edit + add narrative voice; ship by 4 p.m. |
| Monthly | Tune thresholds based on false-positive rate |

---

## Critical limitations

- The agent only sees data you give it access to. Cross-team metrics outside your scope are invisible.
- Read-only credentials only. Never point postgres-mcp at production read-write.
- Variance detection is not causal analysis. The agent flags WHAT changed; you decide WHY.
- Memory works for one PM's filesystem only. Each PM rebuilds.
- Browser scraping must respect rate limits and your BI tool's terms of service. Prefer APIs when they exist.
- The agent does NOT replace the data team. It augments it. When the analysis goes beyond surface level, loop in a data analyst.

**Sources:** [Measurement agent stack post](https://falkster.com/blog/build-measurement-agent-stack) · [The impact loop](https://falkster.com/os/impact-loop) · [The living changelog](https://falkster.com/os/the-living-changelog) · [The eval is the spec](https://falkster.com/os/the-eval-is-the-spec)
