# AI Noise vs Signal Audit

A 10-minute scorecard you can run on any product, post, or PM job description to find out how much of the AI claim is signal and how much is noise.

Companion to: [The AI Noise Tax. Three Patterns Killing Product Credibility.](https://falkster.com/blog/ai-noise-tax)

---

## How to use this audit

Run it three ways depending on what you are auditing.

1. **Audit a SaaS product** (yours or a competitor's). Open the product, the changelog, and the most recent AI-strategy post side by side. Score Section A.
2. **Audit a piece of content** (a LinkedIn post, an Instagram reel, a Substack issue). Score Section B.
3. **Audit a PM job description** (a hire your team is making, or one you are interviewing for). Score Section C.

Each section gives you a score from 0 to 10. Anything under 5 is paying the noise tax. Anything 7 or higher is doing the work. In between, the shift is partial: the claim is ahead of the work.

---

## Section A. SaaS product audit

For each row, score 0 (noise), 1 (partial), or 2 (signal). Maximum 10.

### A1. Changelog cadence

Pull the last six months of release notes.

- 0: Cadence is flat or down. Most releases are bug fixes and UI tweaks. No release shipped a behaviour change that depended on the new architecture.
- 1: Cadence is up modestly. One or two releases shipped a behaviour change that an AI-native rebuild made possible.
- 2: Cadence is measurably faster. Multiple releases per month ship behaviour changes that would have been impossible without the agent layer.

**Score: ___ / 2**
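
If the vendor publishes release notes as a single page, a rough script can do the counting before you read for substance. A minimal sketch, assuming you have saved the notes to a local `changelog.md` with one `## YYYY-MM-DD` heading per release; the file name and heading format are illustrative, not a standard.

```python
# Rough first pass at the A1 cadence check.
# Assumption: one "## YYYY-MM-DD ..." heading per release in changelog.md.
import re
from collections import Counter

def releases_per_month(path: str) -> Counter:
    """Count release headings per calendar month from a changelog file."""
    text = open(path, encoding="utf-8").read()
    dates = re.findall(r"^##\s+(\d{4})-(\d{2})-\d{2}", text, flags=re.MULTILINE)
    return Counter(f"{year}-{month}" for year, month in dates)

if __name__ == "__main__":
    by_month = releases_per_month("changelog.md")
    for month in sorted(by_month):
        print(month, by_month[month])
    # Flat or falling counts point toward a 0. A visible acceleration is the
    # precondition for a 2, not the proof: you still read the notes for
    # behaviour changes the agent layer made possible.
```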

### A2. Primary surface

Open the product. Where does the user spend their time?

- 0: Same dashboard, same forms, same nav as 2019. The agent is a chat sidebar in the corner.
- 1: Some pages have an agent-first surface, but the core workflows are still forms-and-tables.
- 2: The agent is the primary surface. Forms exist as fallback for the cases the agent cannot handle.

**Score: ___ / 2**

### A3. Outputs

Look at the most-used user output.

- 0: User configures a form, clicks "Generate," receives a static report.
- 1: User asks a question, agent responds with a summary, user asks again for the next step.
- 2: User states an outcome, agent produces the artifact and the recommended next action in one turn.

**Score: ___ / 2**

### A4. Telemetry and evals

Search the company blog, the docs, the trust page, and the changelog for any reference to evals, agent telemetry, or accuracy reporting.

- 0: No public reporting. AI claims are marketing copy only.
- 1: Some public reporting on accuracy or hallucination rates, but no per-segment or per-workflow breakdown.
- 2: Public eval reports per workflow, per segment, with a delta over time. Failures are reported alongside successes.

**Score: ___ / 2**
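
A crude keyword sweep can tell you which of those pages are worth reading closely. A sketch under loose assumptions: the URLs below are placeholders for the vendor's real blog, docs, trust page, and changelog, and a hit count is only a pointer, not a score.

```python
# Crude keyword sweep for the A4 check. URLs are hypothetical placeholders.
from urllib.request import urlopen

PAGES = [
    "https://example.com/blog",
    "https://example.com/docs",
    "https://example.com/trust",
    "https://example.com/changelog",
]
KEYWORDS = ["eval", "hallucination", "accuracy", "telemetry", "benchmark"]

for url in PAGES:
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore").lower()
    except OSError as exc:
        print(f"{url}: fetch failed ({exc})")
        continue
    hits = {kw: html.count(kw) for kw in KEYWORDS if kw in html}
    print(url, hits or "no matches")
```

Pages that mention evals only in marketing copy still score 0; you are looking for per-workflow numbers with a delta over time.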

### A5. Pricing

Look at the pricing page.

- 0: Per-seat pricing only. AI features are a tier upsell.
- 1: Per-seat with usage-based add-ons for AI features.
- 2: Pricing is decoupled from seats and tied to outcomes (per agent task, per artifact produced, per outcome delivered).

**Score: ___ / 2**

### Total Section A: ___ / 10

| Score | Reading                                                                |
| ----- | ---------------------------------------------------------------------- |
| 0-3   | Pure noise tax. The AI claim is decoration on a 2019-era product.      |
| 4-6   | Partial shift. The story is ahead of the product. Customers will notice. |
| 7-8   | Real shift in progress. The product is the proof. The post is true.    |
| 9-10  | AI-native. Pricing, surface, telemetry, and cadence all match the claim. |
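
If you want the tally automated, a minimal sketch: the five row scores are whatever you wrote in the blanks above, and the reading bands mirror the table. The example values shown are the worked CRM audit from later in this document.

```python
# Minimal tally for Section A. Each row score must be 0, 1, or 2.
SCORES = {"A1": 0, "A2": 0, "A3": 0, "A4": 1, "A5": 0}  # example: the worked CRM audit below

def reading(total: int) -> str:
    if total <= 3:
        return "Pure noise tax"
    if total <= 6:
        return "Partial shift"
    if total <= 8:
        return "Real shift in progress"
    return "AI-native"

total = sum(SCORES.values())
print(f"Section A: {total}/10 — {reading(total)}")  # -> Section A: 1/10 — Pure noise tax
```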

---

## Section B. Content audit

For any post that claims an AI insight, breakthrough, or workflow.

### B1. Working artifact

- 0: Post is a screenshot of model output with a caption. No artifact attached.
- 1: Post links to a prompt or a template, but no measurement of whether it works.
- 2: Post links to a working repo, prompt, or workflow with a measurement of what changed.

### B2. Real distribution

- 0: Single example. Cherry-picked. No edge cases shown.
- 1: A few examples, but no segmentation or failure cases.
- 2: Evaluated against a real distribution of inputs. Failure cases reported alongside successes.

### B3. Honest delta

- 0: "Game-changer" / "10x" / "will replace junior PMs" framing. No baseline.
- 1: Some baseline, but the comparison is loose ("feels faster," "saves time").
- 2: A specific delta on a specific metric. ("Cut weekly synthesis from 3 hours to 25 minutes for the customer-call corpus.")

### B4. Vintage check

How old is the underlying model behaviour?

- 0: The capability has been available for 12+ months, framed as new.
- 1: The capability is recent (3-12 months) and the framing is appropriate.
- 2: The post is about a workflow, not a capability. Vintage does not apply.

### B5. CTA quality

- 0: CTA is "subscribe to my newsletter" or "follow for more."
- 1: CTA is "DM me for the prompt" / lead-magnet flow.
- 2: CTA is the artifact itself, downloadable for free, no email gate.

### Total Section B: ___ / 10

---

## Section C. PM job description audit

For any PM JD that claims to be "AI-native," "AI-first," or "looking for an AI PM."

### C1. Day-one tools

- 0: JD lists no specific AI tools. "Familiarity with AI" is the only mention.
- 1: JD names tools (Claude, Cursor, Copilot) but does not specify what the PM will do with them.
- 2: JD names specific tools and specific workflows ("expected to ship a working prototype within the first 30 days using Claude Code").

### C2. Scope

- 0: PM owns "feature roadmap" and "stakeholder alignment." No mention of agents, evals, or prototypes.
- 1: PM owns features plus an "AI strategy" responsibility, but no concrete artifacts.
- 2: PM owns at least one of: an agent fleet, an eval system, a prototype-driven discovery loop. Concrete artifacts named.

### C3. Cadence expectation

- 0: "Quarterly OKRs" and "annual planning cycles" are the named cadences.
- 1: Mix of quarterly and rolling, no specific commitment.
- 2: Cadence is rolling, not quarterly. Outcome bets, not OKR cascade.

### C4. Engineering ratio

- 0: "Works with engineering to deliver" framing. PM does not build.
- 1: PM is "expected to be technical" but no expectation to ship code.
- 2: PM ships code or working prototypes themselves. Engineering co-builds, does not gatekeep.

### C5. Eval responsibility

- 0: No mention of evals, telemetry, or quality measurement.
- 1: PM is "data-informed" but quality is owned elsewhere.
- 2: PM owns the eval system for at least one workflow or agent. Quality is the PM's job, not just velocity.

### Total Section C: ___ / 10

---

## What to do with your scores

### If you scored low (0-4) on the product or content you audited

You have receipts. Use them. The next time an AI-strategy post circulates internally, share the scorecard. The next time a vendor pitches you "AI-native," run the audit before the second meeting. The next time your team writes a strategy post, score the post against the product before publishing.

### If you scored medium (5-7)

The shift is partial. Identify the lowest-scoring row and use it as the next investment target. If A4 (telemetry) scored 0, the next quarter's bet is the eval system. If A2 (surface) scored 1, the next quarter's bet is the agent-as-primary-surface rebuild.

### If you scored high (8-10)

You are not paying the noise tax. The work is the proof. Publish the audit publicly, with your real numbers, as a forcing function for the rest of the industry.

---

## Worked example: one real SaaS product

Mid-market CRM, public company, recent "AI-first" essay from the CEO. Audit run May 2026.

| Row                | Score | Notes                                                                                                              |
| ------------------ | :---: | ------------------------------------------------------------------------------------------------------------------ |
| A1. Changelog      |   0   | 18 of 22 releases over 6 months were UI tweaks or bug fixes. One release shipped a chat sidebar. Cadence is flat. |
| A2. Primary surface |   0   | Same forms-and-tables UI as their 2019 deck. Chat sidebar in bottom-right.                                         |
| A3. Outputs        |   0   | User configures filter, clicks Generate, receives a static export. Sidebar adds summary, not new outputs.          |
| A4. Telemetry      |   1   | One blog post on accuracy benchmarks. No per-segment breakdown. No public eval cadence.                            |
| A5. Pricing        |   0   | Per-seat. AI features are a $40/seat upsell.                                                                       |
| **Total**          | **1/10** | **Pure noise tax. The CEO post is ahead of the product by 18 months.**                                           |

Compare against a competitor in the same space, audit run the same week:

| Row                | Score | Notes                                                                                                                 |
| ------------------ | :---: | -------------------------------------------------------------------------------------------------------------------- |
| A1. Changelog      |   2   | 4 releases over 6 months that shipped agent-native workflows. Cadence accelerated.                                    |
| A2. Primary surface |   1   | Two of the four core workflows are agent-first. Other two are still forms.                                            |
| A3. Outputs        |   2   | "What's at risk this week?" returns the report and the recommended action in one turn. No filter UI required.        |
| A4. Telemetry      |   2   | Public eval dashboard on docs.[product].com. Per-workflow accuracy, refreshed monthly. Failures reported.            |
| A5. Pricing        |   2   | Pricing per agent task and per outcome delivered. Seats are free.                                                    |
| **Total**          | **9/10** | **AI-native. The product is the proof. The marketing is true.**                                                     |

Two products in the same category. One score of 1, one score of 9. Customers can see this. So can competitors. So can recruits.

---

## License

Free to use, copy, and modify. Attribution to falkster.com appreciated but not required.

If you run this audit and find it useful, send me what you found. The most interesting failures and wins make their way into future posts (anonymized if you ask). LinkedIn: [linkedin.com/in/fgottlob](https://linkedin.com/in/fgottlob/).
