Gross Margin Is Your Job Now
Cost per successful action is the new primary PM metric. If you don't own it, your CFO will kill your product before your customers do.
The metric nobody taught us to care about
For thirty years of my career, marginal cost on a software feature was basically zero. My job was demand-side: engagement, retention, conversion, NPS. The CFO worried about the supply side. We stayed in our lanes.
That deal is over. In 2026, every AI feature I ship has a cost line that moves with usage. Token costs. Retrieval costs. Model routing. Vendor markup. Latency-driven retries. Together, these variables can move a product's gross margin by 20 to 40 percentage points. That's the difference between a venture-backed business and a venture-killed one.
And if the PM doesn't own cost, nobody does. The CFO sees a number at month end. The engineer sees a prompt at the character level. Only the PM sees the full customer job end-to-end. Only the PM can say "this entire flow costs too much and needs to be rebuilt."
So gross margin is my job now. Welcome to the role.
The numbers that should ruin your afternoon
If any of these surprise you, we have work to do.
- AI-first SaaS gross margins are running 55 to 70 percent in 2026. Traditional SaaS runs 78 to 85 percent. That's 20-plus points of margin I have to earn back through product decisions.
- A B2B app processing 50M tokens per enterprise customer per month spends $500 to $2,000 in raw inference, before retrieval, before tools, before any other COGS.
- Companies still pricing per-seat in AI are running 40 points lower gross margin than companies that moved to hybrid usage pricing. Same product, different pricing, different business.
- Anthropic's Claude Sonnet 4.5 roughly doubles in input price above the 200K-token threshold. The PM who doesn't know that threshold ships a feature that silently halves margin the first time a real customer tries it with a large document.
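The threshold math is worth sketching, because it isn't linear: one large document reprices the whole request. A minimal sketch, with illustrative per-million-token rates that are my assumptions, not quoted vendor prices:

```python
# Illustrative only: rates and threshold are assumptions, not quoted vendor prices.
BASE_INPUT_RATE = 3.00   # $ per 1M input tokens under the long-context threshold
LONG_INPUT_RATE = 6.00   # $ per 1M input tokens above it (roughly 2x)
THRESHOLD = 200_000      # long-context pricing kicks in above this many input tokens

def input_cost(input_tokens: int) -> float:
    """Dollar cost of the input side of one request."""
    rate = LONG_INPUT_RATE if input_tokens > THRESHOLD else BASE_INPUT_RATE
    return input_tokens / 1_000_000 * rate

small = input_cost(150_000)  # under the threshold
large = input_cost(250_000)  # one big document moves the entire request to the higher rate
```

Note that the 250K-token request doesn't just cost 67 percent more tokens than the 150K one; it costs over 3x, because every token is billed at the higher rate.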
These are not finance problems. They are product decisions showing up on the finance ledger. The person making them should be you.
The three levers every PM can pull
You don't need to become an ML engineer. You need to make three decisions well.
Lever 1: Model routing.
Not every task needs your flagship model. Classification, structured extraction, simple rewrites, and format conversion often work on a 10x cheaper model with comparable quality. My job is to know which surfaces in my product can be down-routed without eval scores dropping.
The artifact: a routing table. For each surface: what model, what fallback, what eval threshold triggers a swap. I review this quarterly. About once a quarter I find a surface that was over-routed to the flagship for no reason other than we shipped it that way.
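A routing table doesn't need to be fancy. As a minimal sketch (the surface names, model names, and thresholds here are hypothetical placeholders):

```python
# Hypothetical routing table: surfaces, model names, and eval thresholds are illustrative.
ROUTING_TABLE = {
    "summarize_doc":   {"model": "flagship-large", "fallback": "mid-tier",       "min_eval": 0.92},
    "classify_ticket": {"model": "small-cheap",    "fallback": "mid-tier",       "min_eval": 0.88},
    "extract_fields":  {"model": "small-cheap",    "fallback": "flagship-large", "min_eval": 0.90},
}

def route(surface: str, latest_eval_score: float) -> str:
    """Use the configured model; swap to the fallback if evals drop below threshold."""
    entry = ROUTING_TABLE[surface]
    if latest_eval_score < entry["min_eval"]:
        return entry["fallback"]
    return entry["model"]
```

The point of encoding it this way is that the quarterly review becomes a diff on one file: which surfaces moved, and why.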
Lever 2: Prompt and context hygiene.
Most production prompts in 2026 are three to ten times longer than they need to be. PMs over-specify because they don't trust the model. Engineers inherit the prompt and don't prune it. Context windows get packed with "just in case" instructions that you pay for per token.
Cutting a prompt from 3,000 tokens to 600 (which is usually possible without quality loss) cuts cost on that surface by about 80 percent. I don't need an engineer for this. I open the prompt, cut ruthlessly, run the eval, keep the cuts that didn't drop the score. Most of my cuts survive.
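That cut-eval-keep loop can itself be mechanized. A sketch, assuming you can split the prompt into removable sections and that `run_eval` is a stand-in for your real eval harness:

```python
def prune_prompt(sections, run_eval, min_score):
    """Try dropping each section; keep a cut only if the eval score stays at or above min_score.
    `sections` is the prompt split into removable chunks; `run_eval` scores a candidate prompt."""
    kept = list(sections)
    for section in sections:
        candidate = [s for s in kept if s != section]
        if run_eval("\n".join(candidate)) >= min_score:
            kept = candidate  # the cut survived the eval, so it stays cut
    return kept
```

The greedy order matters less than the gate: no cut ships unless the eval says quality held.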
Lever 3: Caching and early exit.
Prompt caching on Anthropic, OpenAI, and Gemini cuts cost 50 to 90 percent on repeated prefixes (system prompts, tool definitions, long instructions). Most teams aren't using it. Early-exit logic (don't call the LLM if a rule or a cached answer will do) cuts cost 100 percent on the hit. My job is to identify the top 20 percent of inputs by volume and design for cache hits or rule shortcuts on them.
The dashboard I actually watch
Every AI product I own has a dashboard with these metrics, visible at all times:
- Cost per successful action (by surface).
- 7-day trend of that number.
- Latency p50 / p95 / p99 (by surface).
- Token volume (input vs output, by surface).
- Cache hit rate.
- Model distribution (how much traffic hits which model).
- Failed actions and their cost (yes, failures still cost money).
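The first metric deserves a precise definition, because the naive version (total cost over total actions) hides failures. Cost per successful action puts all spend, including failed attempts, in the numerator and only successes in the denominator. A sketch with an illustrative event log:

```python
# Illustrative event log: each record is one model-backed action on one surface.
events = [
    {"surface": "summarize", "cost": 0.04, "success": True},
    {"surface": "summarize", "cost": 0.06, "success": False},  # failures still cost money
    {"surface": "summarize", "cost": 0.05, "success": True},
]

def cost_per_successful_action(events, surface):
    """Total spend on a surface (successes AND failures) divided by successful actions."""
    rows = [e for e in events if e["surface"] == surface]
    total = sum(e["cost"] for e in rows)
    successes = sum(1 for e in rows if e["success"])
    return total / successes if successes else float("inf")
```

In this toy log the average cost per action is 5 cents, but the cost per successful action is 7.5 cents. That gap is your failure tax.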
If this isn't on your wall, you're guessing. I had the ugly version of this as a Notion page for six months before anyone built a real dashboard. The ugly version was still better than not having it.
The pricing interaction
Cost ownership is pointless if pricing doesn't flex with it. This is why Pricing for AI Products is the natural next chapter. Cost and pricing are two sides of the same decision: is the customer paying enough for the specific flow they just ran?
In a world where a single "answer this from my docs" query can range from 5 cents to 5 dollars depending on document size, per-seat pricing is an arbitrage your customers will win every time. Your pricing has to map to value units. You can't design that without knowing the cost units first.
What the old playbook got wrong
The 2018 playbook was: ship the best product, optimize cost later, scale compresses unit costs. That worked when unit cost was deterministic, had steep scale curves, and was fractions of a cent per action.
None of that holds now. Unit cost is stochastic (depends on prompt length, tool calls, retries). Scale curves are flat (you pay per token whether you have 10 users or 10 million). Cost per action is 1 to 20 cents.
"Optimize later" means "rewrite the product later." That's not a runway you have.
Pick one thing this week
Here's a 90-minute exercise that will embarrass you if you've never done it.
- Pick the most-used AI surface in your product.
- Find the cost per successful action over the last 30 days.
- If you don't have that number, stop and figure out why you don't have it. That's your week.
- If you do have it, find the top 10 percent of requests by cost. What do they have in common? (Long context? Specific customer? A loop?)
- Apply one of the three levers to that slice: route to a cheaper model, cut the prompt, add a cache.
- Ship the change. Measure again in 7 days.
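Finding that top-10-percent slice is a sort and a count. A sketch, assuming a request log with per-request cost and a field like customer to group by (the field names are illustrative):

```python
from collections import Counter

def top_cost_slice(requests, fraction=0.10):
    """Return the most expensive `fraction` of requests, plus the most common
    values of a grouping field among them (here: customer)."""
    ranked = sorted(requests, key=lambda r: r["cost"], reverse=True)
    cut = max(1, int(len(ranked) * fraction))
    top = ranked[:cut]
    common = Counter(r["customer"] for r in top).most_common(3)
    return top, common
```

Swap the grouping field for whatever you suspect: context length bucket, surface, retry count. The pattern is the same.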
My first time doing this on a surface at Smartcat, I found a prompt that was 4,200 tokens because three different PMs had added instructions over six months. I cut it to 800. Same eval score. The feature got 70 percent cheaper overnight.
In an AI product, every feature has a cost line. If you don't know which of yours is bleeding out, one of them is, and your board will find it before you do.
Frequently asked
Why is gross margin a PM job, not a CFO job?
Because the PM is the only person seeing the entire user job end-to-end. The CFO sees a number at month end. The engineer sees a prompt. Only you can say: this entire flow costs too much and needs rebuilding. Cost is a product decision.
What gross margin should I be hitting on AI products?
55 to 70 percent in 2026. Traditional SaaS runs 78 to 85 percent. You're 20-plus points behind, and product decisions are how you earn it back: model routing, prompt hygiene, caching, early-exit logic.
How do I route to cheaper models without harming quality?
Build a routing table for each surface: what model, what fallback, what eval threshold. Many surfaces work on 10x cheaper models with the same eval scores. Review quarterly. About once a quarter you'll find something over-routed for no reason other than it shipped that way.
What's the single biggest cost mistake most teams make?
Over-specified prompts. Most production prompts are 3-10x longer than needed. Cut a prompt from 3,000 tokens to 600 without quality loss and the feature is 80 percent cheaper. You don't need an engineer for this. You do the cuts, run the eval, keep what survives.
What should my cost dashboard show?
Cost per successful action (by surface), 7-day trend, latency (p50, p95, p99), token volume (input vs output), cache hit rate, model distribution, and failed-action costs. If you don't have this, you're guessing. I ran mine as a Notion page for six months; the ugly version was still better than not having it.
Related reading
Deeper essays and other handbook chapters on the same thread.
When Not to Use AI
The senior PM move in 2026 isn't using AI everywhere. It's knowing when a regex, a query, or a form beats a model.
Prompt Ops
Your prompts are production code. Version, review, eval, stage, and roll back, or your product is one Notion edit away from breaking.
The Living Changelog
Your model vendor changed the model on Tuesday and didn't tell you. Run a daily replay against production or your customers will catch it before you do.
Trust, Safety, and the Guardrail as a Product Decision
Every guardrail is a product decision. The PM who outsources it to legal gets a product they didn't design and a customer experience they wouldn't approve.
The Eval Is The Spec
Kill the PRD. Ship against a test set. The eval is the contract, the changelog, and the definition of done.
Pricing for AI Products
Per-seat is dead for AI. Price the work the seat is no longer doing: outcomes, usage, value units.