Gross Margin Is Your Job Now
Cost per successful action is the new primary PM metric. If you don't own it, your CFO will kill your product before your customers do.
The metric nobody taught us to care about
For thirty years of my career, marginal cost on a software feature was basically zero. My job was demand-side: engagement, retention, conversion, NPS. The CFO worried about the supply side. We stayed in our lanes.
That deal is over. In 2026, every AI feature I ship has a cost line that moves with usage. Token costs. Retrieval costs. Model routing. Vendor markup. Latency-driven retries. Together, these variables can move a product's gross margin by 20 to 40 percentage points. That's the difference between a venture-backed business and a venture-killed one.
And if the PM doesn't own cost, nobody does. The CFO sees a number at month end. The engineer sees a prompt at the character level. Only the PM sees the full customer job end-to-end. Only the PM can say "this entire flow costs too much and needs to be rebuilt."
So gross margin is my job now. Welcome to the role.
The numbers that should ruin your afternoon
If any of these surprise you, we have work to do.
- AI-first SaaS gross margins are running 55 to 70 percent in 2026. Traditional SaaS runs 78 to 85 percent. That's 20-plus points of margin I have to earn back through product decisions.
- A B2B app processing 50M tokens per enterprise customer per month spends $500 to $2,000 in raw inference, before retrieval, before tools, before any other COGS.
- Companies still pricing per-seat in AI are running 40 points lower gross margin than companies that moved to hybrid usage pricing. Same product, different pricing, different business.
- Anthropic's Claude Sonnet 4.5 roughly doubles in input price above the 200K-token threshold. The PM who doesn't know that threshold ships a feature that silently halves margin the first time a real customer tries it with a large document.
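The threshold math is worth sketching, because it isn't linear: one large document reprices the whole request. A minimal sketch, with illustrative per-million-token rates that are my assumptions, not quoted vendor prices:

```python
# Illustrative only: rates and threshold are assumptions, not quoted vendor prices.
BASE_INPUT_RATE = 3.00   # $ per 1M input tokens under the long-context threshold
LONG_INPUT_RATE = 6.00   # $ per 1M input tokens above it (roughly 2x)
THRESHOLD = 200_000      # long-context pricing kicks in above this many input tokens

def input_cost(input_tokens: int) -> float:
    """Dollar cost of the input side of one request."""
    rate = LONG_INPUT_RATE if input_tokens > THRESHOLD else BASE_INPUT_RATE
    return input_tokens / 1_000_000 * rate

small = input_cost(150_000)  # under the threshold
large = input_cost(250_000)  # one big document moves the entire request to the higher rate
```

Note that the 250K-token request doesn't just cost 67 percent more tokens than the 150K one; it costs over 3x, because every token is billed at the higher rate.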
These are not finance problems. They are product decisions showing up on the finance ledger. The person making them should be you.
The three levers every PM can pull
You don't need to become an ML engineer. You need to make three decisions well.
Lever 1: Model routing.
Not every task needs your flagship model. Classification, structured extraction, simple rewrites, and format conversion often work on a 10x cheaper model with comparable quality. My job is to know which surfaces in my product can be down-routed without eval scores dropping.
The artifact: a routing table. For each surface: what model, what fallback, what eval threshold triggers a swap. I review this quarterly. About once a quarter I find a surface that was over-routed to the flagship for no reason other than we shipped it that way.
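A routing table doesn't need to be fancy. As a minimal sketch (the surface names, model names, and thresholds here are hypothetical placeholders):

```python
# Hypothetical routing table: surfaces, model names, and eval thresholds are illustrative.
ROUTING_TABLE = {
    "summarize_doc":   {"model": "flagship-large", "fallback": "mid-tier",       "min_eval": 0.92},
    "classify_ticket": {"model": "small-cheap",    "fallback": "mid-tier",       "min_eval": 0.88},
    "extract_fields":  {"model": "small-cheap",    "fallback": "flagship-large", "min_eval": 0.90},
}

def route(surface: str, latest_eval_score: float) -> str:
    """Use the configured model; swap to the fallback if evals drop below threshold."""
    entry = ROUTING_TABLE[surface]
    if latest_eval_score < entry["min_eval"]:
        return entry["fallback"]
    return entry["model"]
```

The point of encoding it this way is that the quarterly review becomes a diff on one file: which surfaces moved, and why.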
Lever 2: Prompt and context hygiene.
Most production prompts in 2026 are three to ten times longer than they need to be. PMs over-specify because they don't trust the model. Engineers inherit the prompt and don't prune it. Context windows get packed with "just in case" instructions that you pay for per token.
Cutting a prompt from 3,000 tokens to 600 (which is usually possible without quality loss) cuts cost on that surface by about 80 percent. I don't need an engineer for this. I open the prompt, cut ruthlessly, run the eval, keep the cuts that didn't drop the score. Most of my cuts survive.
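That cut-eval-keep loop can itself be mechanized. A sketch, assuming you can split the prompt into removable sections and that `run_eval` is a stand-in for your real eval harness:

```python
def prune_prompt(sections, run_eval, min_score):
    """Try dropping each section; keep a cut only if the eval score stays at or above min_score.
    `sections` is the prompt split into removable chunks; `run_eval` scores a candidate prompt."""
    kept = list(sections)
    for section in sections:
        candidate = [s for s in kept if s != section]
        if run_eval("\n".join(candidate)) >= min_score:
            kept = candidate  # the cut survived the eval, so it stays cut
    return kept
```

The greedy order matters less than the gate: no cut ships unless the eval says quality held.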
Lever 3: Caching and early exit.
Prompt caching on Anthropic, OpenAI, and Gemini cuts cost 50 to 90 percent on repeated prefixes (system prompts, tool definitions, long instructions). Most teams aren't using it. Early-exit logic (don't call the LLM if a rule or a cached answer will do) cuts cost 100 percent on the hit. My job is to identify the top 20 percent of inputs by volume and design for cache hits or rule shortcuts on them.
The dashboard I actually watch
Every AI product I own has a dashboard with these metrics, visible at all times:
- Cost per successful action (by surface).
- 7-day trend of that number.
- Latency p50 / p95 / p99 (by surface).
- Token volume (input vs output, by surface).
- Cache hit rate.
- Model distribution (how much traffic hits which model).
- Failed actions and their cost (yes, failures still cost money).
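The first metric deserves a precise definition, because the naive version (total cost over total actions) hides failures. Cost per successful action puts all spend, including failed attempts, in the numerator and only successes in the denominator. A sketch with an illustrative event log:

```python
# Illustrative event log: each record is one model-backed action on one surface.
events = [
    {"surface": "summarize", "cost": 0.04, "success": True},
    {"surface": "summarize", "cost": 0.06, "success": False},  # failures still cost money
    {"surface": "summarize", "cost": 0.05, "success": True},
]

def cost_per_successful_action(events, surface):
    """Total spend on a surface (successes AND failures) divided by successful actions."""
    rows = [e for e in events if e["surface"] == surface]
    total = sum(e["cost"] for e in rows)
    successes = sum(1 for e in rows if e["success"])
    return total / successes if successes else float("inf")
```

In this toy log the average cost per action is 5 cents, but the cost per successful action is 7.5 cents. That gap is your failure tax.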
If this isn't on your wall, you're guessing. I had the ugly version of this as a Notion page for six months before anyone built a real dashboard. The ugly version was still better than not having it.
The pricing interaction
Cost ownership is pointless if pricing doesn't flex with it. This is why Pricing for AI Products is the natural next chapter. Cost and pricing are two sides of the same decision: is the customer paying enough for the specific flow they just ran?
In a world where a single "answer this from my docs" query can range from 5 cents to 5 dollars depending on document size, per-seat pricing is an arbitrage your customers will win every time. Your pricing has to map to value units. You can't design that without knowing the cost units first.
What the old playbook got wrong
The 2018 playbook was: ship the best product, optimize cost later, scale compresses unit costs. That worked when unit cost was deterministic, had steep scale curves, and was fractions of a cent per action.
None of that holds now. Unit cost is stochastic (depends on prompt length, tool calls, retries). Scale curves are flat (you pay per token whether you have 10 users or 10 million). Cost per action is 1 to 20 cents.
"Optimize later" means "rewrite the product later." That's not a runway you have.
Pick one thing this week
Here's a 90-minute exercise that will embarrass you if you've never done it.
- Pick the most-used AI surface in your product.
- Find the cost per successful action over the last 30 days.
- If you don't have that number, stop and figure out why you don't have it. That's your week.
- If you do have it, find the top 10 percent of requests by cost. What do they have in common? (Long context? Specific customer? A loop?)
- Apply one of the three levers to that slice: route to a cheaper model, cut the prompt, add a cache.
- Ship the change. Measure again in 7 days.
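Finding that top-10-percent slice is a sort and a count. A sketch, assuming a request log with per-request cost and a field like customer to group by (the field names are illustrative):

```python
from collections import Counter

def top_cost_slice(requests, fraction=0.10):
    """Return the most expensive `fraction` of requests, plus the most common
    values of a grouping field among them (here: customer)."""
    ranked = sorted(requests, key=lambda r: r["cost"], reverse=True)
    cut = max(1, int(len(ranked) * fraction))
    top = ranked[:cut]
    common = Counter(r["customer"] for r in top).most_common(3)
    return top, common
```

Swap the grouping field for whatever you suspect: context length bucket, surface, retry count. The pattern is the same.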
My first time doing this on a surface at Smartcat, I found a prompt that was 4,200 tokens because three different PMs had added instructions over six months. I cut it to 800. Same eval score. The feature got 70 percent cheaper overnight.
In an AI product, every feature has a cost line. If you don't know which of yours is bleeding out, one of them is, and your board will find it before you do.
Frequently asked
Why is gross margin a PM job, not a CFO job?
Because the PM is the only person seeing the entire user job end-to-end. The CFO sees a number at month end. The engineer sees a prompt. Only you can say: this entire flow costs too much and needs rebuilding. Cost is a product decision.
What gross margin should I be hitting on AI products?
55 to 70 percent in 2026. Traditional SaaS runs 78 to 85 percent. You're 20-plus points behind, and product decisions are how you earn it back: model routing, prompt hygiene, caching, early-exit logic.
How do I route to cheaper models without harming quality?
Build a routing table for each surface: what model, what fallback, what eval threshold. Many surfaces work on 10x cheaper models with the same eval scores. Review quarterly. About once a quarter you'll find something over-routed for no reason other than it shipped that way.
What's the single biggest cost mistake most teams make?
Over-specified prompts. Most production prompts are 3-10x longer than needed. Cut a prompt from 3,000 tokens to 600 without quality loss and the feature is 80 percent cheaper. You don't need an engineer for this. You do the cuts, run the eval, keep what survives.
What should my cost dashboard show?
Cost per successful action (by surface), 7-day trend, latency (p50, p95, p99), token volume (input vs output), cache hit rate, model distribution, and failed-action costs. If you don't have this, you're guessing. I ran mine as a Notion page for six months; the ugly version was still better than not having it.
Related reading
Deeper essays and other handbook chapters on the same thread.
When Not to Use AI
The senior PM move in 2026 isn't using AI everywhere. It's knowing when a regex, a query, or a form beats a model.
Prompt Ops
Your prompts are production code. Version, review, eval, stage, and roll back, or your product is one Notion edit away from breaking.
The Living Changelog
Your model vendor changed the model on Tuesday and didn't tell you. Run a daily replay against production or your customers will catch it before you do.
Trust, Safety, and the Guardrail as a Product Decision
Every guardrail is a product decision. The PM who outsources it to legal gets a product they didn't design and a customer experience they wouldn't approve.
The Eval Is The Spec
Kill the PRD. Ship against a test set. The eval is the contract, the changelog, and the definition of done.
Pricing for AI Products
Per-seat is dead for AI. Price the work the seat is no longer doing: outcomes, usage, value units.