# Build Agent Stack — Starter Pack

Companion to [Build a Prototype-First Agent Stack](https://falkster.com/blog/build-prototype-agent-stack). Install commands, ready-to-paste prompts for the three core recipes, and a one-day timeline.

Part 3 of 5 in the [PM Agent Stack series](https://falkster.com/blog/pm-agent-stack-overview).

---

## Install (2 to 3 hours on a clean machine)

```bash
# 1. Base
npm install -g @anthropic-ai/claude-code

# 2. Superpowers (the brainstorm-spec-plan-TDD-review workflow)
#    Run this inside a Claude Code session, not your shell:
#    /plugin marketplace add obra/superpowers-marketplace

# 3. repomix (codebase packing)
npm install -g repomix

# 4. tdd-guard (TDD enforcement)
#    Install from github.com/nizos/tdd-guard following the README

# 5. ccundo (granular undo) — DAY ONE INSTALL
#    Install from github.com/RonitSachdev/ccundo

# 6. design-review-workflow
#    Install from github.com/patrick-ellis/design-review-workflow

# 7. claude-code-security-review (official Anthropic GitHub Action)
#    Add to .github/workflows/ in any repo per anthropics/claude-code-security-review

# 8. claude-context (skip if greenfield prototypes)
#    Install from github.com/zilliztech/claude-context for codebases >100k lines
```

Do not install all eight at once. The smallest useful version is Claude Code + Superpowers + repomix.
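Before starting a session, it is worth confirming the minimal trio is actually on your PATH. A sketch (the binary names `claude` and `repomix` are what the respective npm packages install by default; `sh` is included only as a control that should always report ok):

```shell
#!/bin/sh
# Sanity-check sketch: confirm each tool resolves on PATH.
# "sh" is a control entry that should always print "ok: sh".
for tool in sh claude repomix; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```

Anything reported missing is worth fixing before Hour 0, not during it.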

---

## Recipe 1 — PRD to working prototype in a day

The recipe spans roughly eight working hours.

### Hour 0 — Brainstorm

```
/brainstorm

I want to test the following assumption: {one paragraph stating the assumption}.

The simplest prototype that would produce a useful signal is one that
{two-sentence description of the minimum viable artifact}.

Brainstorm three approaches with tradeoffs (build time, fidelity required,
believability to engineering, and what each approach would NOT prove).
```

### Hour 1 — Spec

```
/spec

Approach chosen: {one of the three from brainstorm}

Write a spec for the prototype:
- One-page document
- User flows (numbered, sequential)
- Data shape (what gets persisted, what is in-memory)
- Test cases that would prove the assumption
- Explicit out-of-scope list (what we are NOT building)

Use the format from src/prototype/SPEC.md if it exists, otherwise plain markdown.
```

### Hour 2 — Plan

```
/plan

Spec attached above. Break it into TDD-ordered tasks: each task names the
test case to write first, the implementation to make it pass, and the
refactor opportunity afterward. Stop when all spec test cases are covered.
```

### Hours 3–6 — Execute the plan

The agent runs the plan. tdd-guard is now active. Watch the output, redirect the agent when it drifts off-track, and READ EACH DIFF before approving.

### Hour 7 — Review

```
Run the design review workflow on the prototype directory.
Run the security review GitHub Action against the prototype branch.
Report all findings ranked by severity (critical, high, medium, low).
Fix the critical and high findings before the demo.
```

### Hour 8 — Demo

End-of-day artifact: a working prototype, a spec it conforms to, a passing test suite, a design review report, and a security review report. Engineering's first reaction is "okay, this is real" instead of "let me think about it."

---

## Recipe 2 — Code review pass on someone else's prototype (or your own)

```bash
# In the prototype directory:
repomix --output context.xml
```
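If the prototype directory contains build artifacts or vendored dependencies, trim them before packing so the review reads signal rather than noise. A sketch using repomix's ignore flag (flag names taken from the repomix README; check `repomix --help` against your installed version, and the patterns shown are just common examples):

```shell
#!/bin/sh
# Pack the prototype while skipping generated noise. Guarded so the
# script reports cleanly on machines where repomix is not installed.
if ! command -v repomix >/dev/null 2>&1; then
  echo "repomix not found on PATH"
  exit 0
fi
# --ignore takes comma-separated glob patterns in addition to .gitignore
repomix --output context.xml --ignore "node_modules/**,dist/**,*.lock"
```

The point of the ignore list is to keep context.xml small enough that the reviewing agent spends its context window on your code, not your lockfiles.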

Then in Claude Code:

```
Read context.xml.

Run a code review through the code-reviewer subagent.
Run a UX critique through the UX-reviewer subagent.
Run the design review workflow.
Run the security review action.

Report all findings ranked by severity.

For each high-severity finding, propose a specific fix with the diff.
```

Output is a five-page review. Fix the high-severity items, acknowledge the medium ones in the README, ignore the rest. Hand it to engineering having already done the obvious passes.

---

## Recipe 3 — The spec-as-eval workflow

Premise: for AI features, the spec IS the eval suite. The agent's job is to pass the test cases.

```
For the feature {feature name}, write the eval suite:

- 20 test cases, each in the form: input → expected behavior → why this matters
- Cover happy path, common edge cases, hostile inputs, and disconfirming behaviors
- Each test case must be independently executable (no test relies on another)

Then write a reference implementation that passes all 20 cases.

Hand engineering BOTH: the eval suite (the spec) and the reference
implementation (proof the spec is achievable). Engineering rewrites the
implementation; the eval stays put.
```
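The "independently executable" requirement is the part teams most often skip. A minimal sketch of what such a runner can look like, with a stand-in feature (uppercasing) where the real implementation would go and a hypothetical `cases.txt` of input|expected pairs; everything here is illustrative, not part of the stack:

```shell
#!/bin/sh
# Hypothetical eval runner: each line of cases.txt is "input|expected".
# feature() is a stand-in for the real implementation under test.
feature() { printf '%s' "$1" | tr '[:lower:]' '[:upper:]'; }

cat > cases.txt <<'EOF'
hello|HELLO
mixed Case 7|MIXED CASE 7
EOF

pass=0; fail=0
while IFS='|' read -r input expected; do
  actual=$(feature "$input")   # each case runs in isolation, no shared state
  if [ "$actual" = "$expected" ]; then
    pass=$((pass + 1))
  else
    fail=$((fail + 1))
    echo "FAIL: '$input' -> '$actual' (wanted '$expected')"
  fi
done < cases.txt
echo "pass=$pass fail=$fail"
```

Because no case depends on another, engineering can rewrite `feature()` freely and rerun the suite; the eval stays put while the implementation changes.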

---

## What stays human

- Decide what to prototype.
- Decide which assumption is worth testing.
- Read the room when the demo lands flat.
- Take responsibility for the call to keep, kill, or extend.
- READ THE DIFFS. PMs who refuse to look at the code get into trouble.

---

## Critical limitations

- This stack does NOT ship to production. Prototypes are for producing sharper conversations, not for end-user deployment.
- Most policies cover production code, not prototypes on a personal branch in a sandbox repo. Read your company's policy first.
- The agent will get stuck or off-track at least twice in a day. Plan for 30 minutes of redirection per session.
- Without TDD enforcement, the agent will skip tests under load. Keep tdd-guard installed even when it slows you down.

**Sources:** [Build agent stack post](https://falkster.com/blog/build-prototype-agent-stack) · [Instant prototyping](https://falkster.com/os/instant-prototyping) · [The eval is the spec](https://falkster.com/os/the-eval-is-the-spec) · [Assumption testing](https://falkster.com/os/assumption-testing)
