# KPI Watchdog Agent

**Agent Name:** KPI Watchdog Agent
**Role:** Continuous KPI monitor, investigator, and prototype generator
**Frequency:** Hourly (configurable per KPI)
**Output Channel:** Slack (#product-kpi-watchdog) + escalation via @-mention / PagerDuty
**Run Time:** ~2 minutes per KPI check, ~8 minutes for a full investigation + prototype

---

## PURPOSE

Product KPIs move every day. Most of the time the movement is noise. Sometimes it's signal that would change what you ship this week. The gap between "KPI dropped Wednesday" and "we notice in Monday's review" is the most expensive week in product management.

The KPI Watchdog closes that gap. It watches your product KPIs every hour. When one moves outside its noise band, the agent investigates (pulls change logs, support tickets, sales call signals, product-analytics events), correlates likely causes, and produces a working prototype of a candidate fix. By the time the PM reads the Slack alert, the investigation is done and there's something to evaluate.

This is not a dashboard. This is an agent that does the Monday-morning-metrics-review work continuously, plus the prototyping work that would normally take a week.

---

## KPI REGISTRY (SOURCE OF TRUTH)

Define your KPIs in a single file: `kpis.yaml` in your repo.

```yaml
kpis:
  - id: onboarding_completion
    name: "Onboarding completion rate"
    description: "Percent of new trial users who complete all 5 onboarding steps"
    source:
      type: sql
      connection: analytics_warehouse
      query: |
        SELECT AVG(CASE WHEN completed_at IS NOT NULL THEN 1.0 ELSE 0.0 END) AS value
        FROM onboarding_sessions
        WHERE started_at >= current_date - interval '1 day'
    healthy_band: [0.62, 0.72]
    investigation_threshold: 0.05  # absolute delta vs 7-day avg (5 points)
    escalation_threshold: 0.10     # 10-point delta triggers the incident tier
    check_frequency: "hourly"
    owners: ["@sarah", "@alex"]

  - id: cost_per_successful_translation
    name: "Cost per successful translation"
    source:
      type: sql
      connection: cost_telemetry
      query: |
        SELECT AVG(cost_usd) AS value
        FROM translation_runs
        WHERE status = 'succeeded'
        AND completed_at >= now() - interval '1 hour'
    healthy_band: [0.08, 0.14]
    investigation_threshold: 0.02
    escalation_threshold: 0.05
    check_frequency: "hourly"
    owners: ["@falk"]

  - id: activated_signups
    name: "Activated signups (reached first success)"
    source:
      type: sql
      connection: analytics_warehouse
      query: |
        SELECT COUNT(DISTINCT user_id) AS value
        FROM events
        WHERE event = 'first_success'
        AND timestamp >= current_date - interval '1 day'
    healthy_band: [80, 130]
    investigation_threshold: 15  # absolute count
    escalation_threshold: 30
    check_frequency: "daily"
    owners: ["@sarah"]
```
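
Loading the registry is a few lines. A minimal sketch, assuming PyYAML is installed and `kpis.yaml` sits at the repo root; the `Kpi` dataclass is illustrative, not from any library:

```python
# kpi_registry.py -- sketch; assumes PyYAML (pip install pyyaml) and kpis.yaml
# at the repo root. The Kpi dataclass is our own shape, not a library type.
from dataclasses import dataclass

import yaml


@dataclass
class Kpi:
    id: str
    name: str
    source: dict
    healthy_band: tuple
    investigation_threshold: float
    escalation_threshold: float
    check_frequency: str
    owners: list
    description: str = ""  # optional in the YAML


def load_registry(path: str = "kpis.yaml") -> list[Kpi]:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return [
        Kpi(**{**k, "healthy_band": tuple(k["healthy_band"])})
        for k in raw["kpis"]
    ]
```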

---

## THE SEVEN COMPONENTS

### 1. Metric collector

A small scheduler that runs every hour. For each KPI in the registry:
- Executes the source query.
- Computes delta vs the 7-day rolling average.
- Stores `{kpi_id, timestamp, value, 7d_avg, delta}` in a time-series table.

Implementation: ~30 lines of Python, or a cron job plus a simple SQL loader. The stored time series is your ground truth for every alert that ever fires.
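
A sketch of the check step. `run_query` is an assumed helper that executes a KPI's source query against the named connection; `kpi_history` is a time-series table you own (SQLite here purely for illustration):

```python
# metric_collector.py -- sketch of the hourly check. run_query() and the
# kpi_history table are assumptions, not library APIs.
import sqlite3
from datetime import datetime, timedelta, timezone


def check_kpi(kpi, db: sqlite3.Connection, run_query):
    value = run_query(kpi.source["connection"], kpi.source["query"])

    # 7-day rolling average from stored history -- the ground-truth series.
    since = (datetime.now(timezone.utc) - timedelta(days=7)).isoformat()
    (avg_7d,) = db.execute(
        "SELECT COALESCE(AVG(value), ?) FROM kpi_history"
        " WHERE kpi_id = ? AND ts >= ?",
        (value, kpi.id, since),
    ).fetchone()
    delta = value - avg_7d

    db.execute(
        "INSERT INTO kpi_history (kpi_id, ts, value, avg_7d, delta)"
        " VALUES (?, ?, ?, ?, ?)",
        (kpi.id, datetime.now(timezone.utc).isoformat(), value, avg_7d, delta),
    )
    db.commit()
    return value, avg_7d, delta
```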

### 2. Change log ingester

Pulls from all the systems that record "something changed":
- Git commit log (deploys).
- Feature flag tool (flag flips).
- Prompt repository history (prompt changes).
- Pricing experiment tracker.
- Vendor status pages / changelogs.

Outputs a unified feed: `{timestamp, system, actor, change_summary}`. The agent correlates KPI movements against this feed.
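
A sketch of two pieces: a git adapter (the `git log` flags are real) and the merge step that normalizes everything into the shared record shape. Flag tools and vendor changelogs get their own fetchers with the same output contract:

```python
# changelog_ingester.py -- sketch; the git adapter is concrete, every other
# source is a zero-arg callable you write against the same record shape.
import subprocess


def git_changes(repo_path: str) -> list[dict]:
    # One line per commit: ISO author date | author name | subject.
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--since=48 hours ago",
         "--pretty=format:%aI|%an|%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        {"timestamp": ts, "system": "git", "actor": actor, "change_summary": subj}
        for ts, actor, subj in
        (line.split("|", 2) for line in out.splitlines() if line)
    ]


def unified_feed(sources: list) -> list[dict]:
    # Each source is a zero-arg callable returning records in the shared shape.
    feed = [rec for fetch in sources for rec in fetch()]
    return sorted(feed, key=lambda r: r["timestamp"])
```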

### 3. Signal ingester

When an alert fires, the agent pulls the last 48 hours of:
- Support tickets (Zendesk, Intercom, Freshdesk).
- Sales call transcripts (Gong, Chorus, Fireflies), filtered by topic.
- NPS and CSAT responses.
- Churn cancellation reasons.
- Product analytics events related to the affected surface.

Clusters by theme using an LLM (same pipeline as Continuous Listening).
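
A sketch of the pull-and-cluster step. `complete()` stands in for whatever LLM client you use; each fetcher (Zendesk, Gong, NPS, ...) is assumed to return plain-text items for the window:

```python
# signal_ingester.py -- sketch; complete() and the fetcher signature are
# stand-ins for your LLM client and source adapters.
import json

CLUSTER_PROMPT = """Cluster the following customer signals by theme.
Return JSON: [{{"theme": str, "count": int, "representative_quotes": [str]}}].

Signals:
{signals}"""


def cluster_signals(fetchers: list, complete) -> list[dict]:
    # Pull the last 48h from every source, then let the model group by theme.
    items = [item for fetch in fetchers for item in fetch(hours=48)]
    prompt = CLUSTER_PROMPT.format(signals="\n".join(f"- {i}" for i in items))
    return json.loads(complete(prompt))
```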

### 4. Correlation agent

Takes the quantitative drop, the change log, and the clustered qualitative signal. Produces a ranked list of likely causes.

**Prompt template:**

```
You are a product analyst debugging a KPI movement.

KPI: {kpi_name}
Current value: {current}
7-day average: {avg_7d}
Delta: {delta} ({percent_move})
Healthy band: [{low}, {high}]
Time of first drop: {first_anomaly_timestamp}

Changes in the system in the last 48h before the drop:
{change_log_feed}

Clustered customer signal in the same window:
{signal_clusters}

Produce a ranked list of likely causes. For each cause:

1. Confidence level (0-100%).
2. Supporting evidence (which changes and which signals point to it).
3. Counter-evidence (what would falsify this explanation).
4. Candidate fix (describe the smallest intervention that would test the hypothesis).

Be conservative with confidence. A correlation is not causation. Where
evidence is thin, say so explicitly. Your job is to narrow the search
space for the human, not to decide for them.

Output as structured JSON.
```
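
A sketch of the invocation: fill the template above, call the model, and sort the parsed causes by confidence. The `reading` dict mirrors the row the metric collector stores; `complete()` is the same LLM stand-in as in the signal ingester:

```python
# correlation_agent.py -- sketch; template is the prompt above, complete() is
# an assumed LLM call returning the structured JSON it asks for.
import json


def rank_causes(kpi, reading, change_feed, signal_clusters, complete,
                template: str) -> list[dict]:
    prompt = template.format(
        kpi_name=kpi.name,
        current=reading["value"],
        avg_7d=reading["avg_7d"],
        delta=reading["delta"],
        percent_move=f"{reading['delta'] / reading['avg_7d']:+.1%}",
        low=kpi.healthy_band[0],
        high=kpi.healthy_band[1],
        first_anomaly_timestamp=reading["ts"],
        change_log_feed=json.dumps(change_feed, indent=2),
        signal_clusters=json.dumps(signal_clusters, indent=2),
    )
    causes = json.loads(complete(prompt))
    # Highest confidence first; downstream steps only act on the top cause.
    return sorted(causes, key=lambda c: c["confidence"], reverse=True)
```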

### 5. Prototype builder

For the top-ranked likely cause (confidence >= 60%), the agent invokes a prototype builder with:
- The candidate fix description.
- A pointer to the affected source-code surface.
- Your design system (Tailwind config, component library imports).
- A screenshot or flow description of the current state.

**Prototype builder prompt (uses Claude Code or equivalent):**

```
You are generating a clickable prototype to test a product hypothesis.

Current surface: {surface_description}
Candidate fix: {fix_description}
Expected effect: {hypothesized_metric_movement}

Generate a runnable prototype of the proposed fix. Use the design
system at {design_system_path}. Mirror the current surface's data
shape. Deploy to a preview URL. Return the URL only if the prototype
runs cleanly; otherwise return the error log.

Include:
- Minimal instrumentation hooks so an A/B test can be set up in one step.
- A rollback-to-current toggle.
- A one-paragraph "what changed and why" note at the top of the preview.
```

Target: 3-8 minute wall-clock time from trigger to prototype URL.
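
A sketch of the trigger, assuming Claude Code's non-interactive print mode (`claude -p`) or an equivalent CLI agent on PATH. The confidence gate and the timeout ceiling come straight from this section:

```python
# prototype_builder.py -- sketch; assumes a `claude -p` style CLI agent that
# prints a preview URL (or an error log) to stdout.
import subprocess


def build_prototype(top_cause: dict, prompt_template: str, **ctx):
    if top_cause["confidence"] < 60:
        return None  # below the floor: alert without a prototype

    prompt = prompt_template.format(
        surface_description=ctx["surface"],
        fix_description=top_cause["candidate_fix"],
        hypothesized_metric_movement=top_cause.get("expected_effect", "unknown"),
        design_system_path=ctx["design_system_path"],
    )
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True,
        timeout=8 * 60,  # the 8-minute wall-clock ceiling
    )
    return result.stdout.strip() if result.returncode == 0 else None
```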

### 6. Operator interface

Slack message format:

```
🟠 KPI Movement · {kpi_name}

{current} (was {avg_7d} 7d-avg) · Δ {percent_move}

Likely cause ({confidence}% confidence):
{top_cause_summary}

Evidence:
• {change_1}
• {signal_cluster_1}
• {signal_cluster_2}

Candidate fix prototype: {prototype_url}

Next step:
[  ] A/B test prototype (50/50 on new sessions today)
[  ] Investigate further (pull more signal)
[  ] Dismiss (expected / noise)

Reply YES (run the A/B test), INVESTIGATE (pull more signal), or NO (dismiss) to action the choice.
```
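
Posting the message is one webhook call. A sketch, assuming an incoming-webhook URL for #product-kpi-watchdog stored in `SLACK_WEBHOOK_URL`:

```python
# slack_notifier.py -- sketch; the webhook URL env var is our own convention.
import os

import requests  # pip install requests


def post_alert(message: str, mention_owners: list = None) -> None:
    if mention_owners:
        # Prepend owner @-mentions only at alert tier and above.
        message = " ".join(mention_owners) + "\n" + message
    resp = requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": message})
    resp.raise_for_status()
```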

### 7. Escalation rules

| Severity | Trigger | Action |
|---|---|---|
| Note | Delta > investigation_threshold | Post to channel, no @-mention |
| Alert | Delta > investigation_threshold AND sustained 2h | Post + @-mention owner |
| Incident | Delta > escalation_threshold OR alert condition sustained 48h | Post + PagerDuty + war-room |

Tiers are configurable per KPI. Revenue KPIs usually escalate faster than engagement KPIs.
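
The table compiles down to a few comparisons. A sketch, where `sustained_hours` is how long the delta has stayed past the investigation threshold:

```python
# escalation.py -- sketch of the tier table above.
def tier(kpi, delta: float, sustained_hours: float):
    d = abs(delta)
    if d > kpi.escalation_threshold or sustained_hours >= 48:
        return "incident"  # page + war-room
    if d > kpi.investigation_threshold and sustained_hours >= 2:
        return "alert"     # @-mention the owners
    if d > kpi.investigation_threshold:
        return "note"      # channel post only
    return None            # inside the noise band
```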

---

## IMPLEMENTATION ORDER

Week 1: Components 1, 2, 6. Agent posts "KPI moved" alerts. No correlation, no prototype.

Week 2: Add component 3 (signal ingester) and the correlation step (component 4). Agent now posts likely causes.

Week 3: Add component 5 (prototype builder) for one specific alert pattern. Most teams start with "onboarding completion dropped → propose simpler variant of the dropping step."

Week 4+: Expand to additional KPIs and additional prototype types. Each type is a separate prompt template.

---

## WHAT THE AGENT DOES NOT DO

- **Does not ship the fix.** The prototype is a proposal. A human decides A/B test, ship, or discard. Always.
- **Does not celebrate up-moves.** Positive movements don't need the same machinery. Save the ceremony for when the product is breaking.
- **Does not override dismissals.** If you dismiss an alert, the agent records the dismissal and moves on. No re-pinging. The human is in charge.

---

## SUCCESS METRICS FOR THE AGENT ITSELF

- Time from KPI movement to human awareness: target < 1 hour (vs multi-day baseline).
- Time from KPI movement to candidate fix in hand: target < 1 day (vs multi-week baseline).
- Alert precision (percentage of alerts the operator rates as useful): target > 70%.
- Prototype acceptance rate (prototypes that become an A/B test or ship): target > 30%.

Review these quarterly. If alert precision drops below 70%, tighten thresholds. If prototype acceptance drops below 30%, tune the correlation prompt or expand the signal ingester.
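
If each alert record stores the operator's decision, the quarterly review is a few ratios. A sketch, assuming decisions are recorded as `ab_test`, `investigate`, or `dismiss`, and that anything other than a dismissal counts as useful:

```python
# agent_metrics.py -- sketch; the alert record fields are our own convention.
def quarterly_review(alerts: list) -> dict:
    rated = [a for a in alerts if a.get("decision")]
    useful = [a for a in rated if a["decision"] != "dismiss"]
    with_proto = [a for a in rated if a.get("prototype_url")]
    accepted = [a for a in with_proto
                if a["decision"] == "ab_test" or a.get("shipped")]
    return {
        "alert_precision": len(useful) / len(rated) if rated else None,
        "prototype_acceptance": (
            len(accepted) / len(with_proto) if with_proto else None
        ),
    }
```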

---

## RELATED CHAPTERS

- [Your Product KPIs Are Your Job, Daily](/blog/product-kpis-builders-daily) - the human-side ritual this agent automates.
- [Continuous Listening](/os/continuous-listening) - the signal pipeline the agent reads from.
- [The Eval Is The Spec](/os/the-eval-is-the-spec) - how the prototype knows what "good" looks like.
- [Ship With Observability or Don't Ship](/os/ship-with-observability) - the instrumentation contract that feeds this agent's KPI registry.
- [Incident Response Is a PM Ritual](/os/incident-response) - the runbook when the watchdog escalates to incident tier.
