The Assumption Testing Playbook
Stop debating what to build. Prototype it in hours, put it in front of customers, and let their reaction be the test. Assumption testing just went from weeks to days.
Every decision is still a bet
You want to build a feature. Before you write a single line of production code, you're betting on a stack of assumptions. That hasn't changed. What's changed is how fast you can test them.
Say you want to build an AI-powered job recommendation system for freelancers. Here are the assumptions you're betting on:
Desirability: Freelancers want personalized recommendations. They trust algorithmic curation. They'd rather see fewer, targeted options than browse everything.
Viability: You can build this in 8 weeks, and the lift to the activation metric is enough to justify that cost.
Feasibility: Your team has ML expertise. The recommendation engine can be made accurate enough to be useful. Your infrastructure supports it. You can integrate it without a major redesign.
Usability: Freelancers understand "recommended for you." They know why a job was recommended. They use recommendations in their workflow.
Any one of those assumptions being wrong kills the project. The question isn't whether to test them. It's how fast.
The old testing model vs. the new one
The old model: Map your assumptions. Design an experiment for each one. Run experiments over 2-6 weeks. Analyze results. Decide.
Total time: 4-8 weeks of testing before you've built anything.
That was the right approach when experiments were expensive to set up. When building a prototype took weeks. When getting customer feedback required scheduling calls and waiting for availability.
The new model: Map your assumptions. Build a prototype that tests the most critical ones simultaneously. Put it in front of customers this week. Their reaction is the experiment.
Total time: 3-5 days.
The prototype is the test. That's the fundamental shift. Instead of designing separate experiments for desirability, usability, and viability, you build a working thing that tests all three at once.
When a freelancer uses your recommendation prototype and says "I wouldn't use this, I prefer browsing everything," you just tested desirability, and it failed. When they use it and say "these recommendations don't match my skills at all," you just tested feasibility (can you build an accurate engine), and it failed. When they use it and immediately apply to three jobs, you just tested desirability and usability, and they passed.
One prototype. One week. Multiple assumptions tested.
The assumption map (still do this)
Even with faster testing, you still need to know what you're testing. Skipping the assumption map and jumping straight to prototyping is a common mistake. You build something, show it to customers, they say "cool," and you've validated nothing specific.
Spend 20 minutes with your team mapping assumptions. Here's the exercise:
Step 1: Write down every assumption (10 minutes). Don't filter. Get them all out.
For the recommendation system:
- Freelancers want personalized recommendations
- They prefer recommendations to search
- They trust algorithmic curation
- They'll check recommendations regularly
- We can build an accurate recommendation engine
- It doesn't require new infrastructure
- The UI will be intuitive
- Recommendations will increase apply rate
- Apply rate increase will improve activation
Step 2: Identify the killers (10 minutes). For each assumption, ask: if this is false, does the whole project fail?
Red (kills the project): "Freelancers want personalized recommendations" and "Recommendations increase apply rate." If freelancers don't want them or if recommendations don't lead to more applications, everything else is irrelevant.
Yellow (important but solvable): "We can build an accurate engine" and "The UI is intuitive." These might slow you down but won't kill the concept.
Green (nice to have): "They'll check recommendations regularly" and "It doesn't require new infrastructure." Details you'll figure out.
Step 3: Design the prototype to test the red assumptions.
This is the key. Your prototype isn't a demo of the solution. It's a test of the riskiest assumptions. Design it to answer: do they want this, and does it lead to the behavior we need?
The prototype as experiment
Here's how I design prototypes that test assumptions, not just showcase solutions.
For testing desirability ("do they want this?"):
Build the simplest version of the solution that a customer can react to honestly. For recommendations: show 10 recommended jobs based on the freelancer's profile. Make them clickable. Watch what happens.
If they browse the recommendations and apply to 2-3, desirability is validated. If they glance at the recommendations and go back to search, desirability failed. You didn't need a survey. You didn't need a focus group. You watched them choose between your solution and their current behavior.
For testing usability ("do they understand it?"):
Don't explain the prototype. Just share it. "Here, take a look at this. What do you think it does?" If they understand it in 10 seconds, usability passes. If they stare at it confused, it fails. The prototype reveals usability problems instantly because confused people look confused.
For testing feasibility ("can we build it well enough?"):
Use real data in the prototype. Don't use dummy jobs. Pull actual listings and run them through a simple matching algorithm. If the matches are obviously wrong ("you're a Python developer, here are 10 graphic design jobs"), the feasibility assumption is shaky. If the matches are reasonable, you have signal that the technical approach works.
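If "simple matching algorithm" feels abstract, here is a minimal sketch of the kind of naive skill-overlap ranking that's enough for a prototype. The field names and the scoring rule are assumptions for illustration, not a production engine:

```python
# Minimal sketch: rank real job listings by skill overlap with a freelancer's
# profile. Field names and the scoring rule are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Job:
    title: str
    skills: set

def recommend(profile_skills, jobs, top_n=10):
    """Return the top_n jobs whose required skills best overlap the profile."""
    def overlap(job):
        return len(profile_skills & job.skills) / len(job.skills) if job.skills else 0.0
    return sorted(jobs, key=overlap, reverse=True)[:top_n]

# Sanity check with the example from the text: a Python developer should not
# see a page full of graphic design jobs.
profile = {"python", "django", "postgresql"}
listings = [
    Job("Backend developer", {"python", "django"}),
    Job("Logo designer", {"illustrator", "branding"}),
    Job("Data pipeline engineer", {"python", "postgresql", "airflow"}),
]
for job in recommend(profile, listings, top_n=2):
    print(job.title)  # Backend developer, Data pipeline engineer
```

Even something this crude tells you quickly whether your real data produces sensible matches or "Python developer, here are 10 graphic design jobs."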
For testing viability ("does it move the metric?"):
This one's harder to test with a prototype alone. But you can get a proxy. After the customer uses the prototype, ask: "If this existed in the product, would it change how often you apply?" or "Would this have kept you from leaving?" Their answer isn't proof, but combined with the behavioral signal from how they used the prototype, it's directional.
Running the one-week test cycle
Here's the actual schedule.
Monday: Map and build.
Morning: 20-minute assumption mapping with your team. Identify the red assumptions.
Afternoon: Build the prototype. Focus on testing the riskiest assumption. Use AI coding tools to get something functional in 2-3 hours. It doesn't need to be polished. It needs to work well enough for a customer to react honestly.
Tuesday-Wednesday: Test with customers.
Show the prototype to 5 customers. Use the prototype interview format. Each conversation is 30 minutes. Watch them use it. Ask the follow-up questions. Note which assumptions their behavior confirms or challenges.
If you can't get 5 live calls, do 3 live and 2 async. Send the prototype link with a Loom walkthrough and ask for their reaction via email or Slack.
Thursday: Synthesize and iterate.
Review your notes and the AI-generated synthesis from the interview transcripts. For each red assumption, what did you learn?
If the assumption passed (customers engaged, understood it, connected it to their problem), move forward.
If the assumption failed (customers didn't engage, were confused, or went back to their current behavior), you have a decision to make.
If the results are ambiguous, iterate on the prototype and test with 2-3 more customers on Friday.
Friday: Decide.
Three possible outcomes:
Green: Build it. The critical assumptions held up. Customer reactions were strong. Move to production. Use the prototype as the spec.
Yellow: Iterate. The concept is right but the execution missed. Customers engaged but had specific feedback ("I'd use this if it had X"). Iterate on the prototype next week, test again.
Red: Kill it or pivot. The critical assumption failed. Customers don't want this, or the underlying approach doesn't work. Kill the solution and go back to the opportunity. Is there a different solution worth prototyping?
When prototypes aren't enough
Prototypes test desirability and usability well. They're weaker at testing some types of assumptions. Here's when you need a different approach.
Testing retention assumptions ("will they keep using it?"): A prototype test tells you if someone is interested in the moment. It doesn't tell you if they'll use it next week. For retention assumptions, you need the "leave it with them" test. Give 10-20 customers access to the prototype for a week. Check back. If they used it without being prompted, the retention assumption has signal. If they forgot about it, it doesn't matter how excited they were in the demo.
Testing scale assumptions ("does this work at volume?"): Your prototype might work beautifully for 10 users. But if it depends on manual curation, personalized content, or high-touch support, it won't scale. For scale assumptions, run a concierge test first (manually do the thing for 50 users), measure the impact, then ask: can we automate what the concierge did?
Testing pricing assumptions ("will they pay for this?"): Prototypes test value, not willingness to pay. For pricing, use a smoke test. Put a "premium" badge on the prototype with a price. "This feature is available on our Pro plan for $X/month." Measure how many people click through versus bounce. It's not perfect, but it's directional.
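The math behind "directional" is just a click-through rate. A rough sketch, assuming you log two hypothetical events for the badge, an impression and an upgrade click:

```python
# Rough sketch: tally smoke-test events. The event names ("saw_premium_badge",
# "clicked_upgrade") are placeholders, not a real analytics schema.
from collections import Counter

events = [
    {"user": "u1", "action": "saw_premium_badge"},
    {"user": "u1", "action": "clicked_upgrade"},
    {"user": "u2", "action": "saw_premium_badge"},
    {"user": "u3", "action": "saw_premium_badge"},
    {"user": "u3", "action": "clicked_upgrade"},
]

counts = Counter(e["action"] for e in events)
seen, clicked = counts["saw_premium_badge"], counts["clicked_upgrade"]
print(f"Premium badge click-through: {clicked}/{seen} ({clicked / seen:.0%})")
```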
Testing technical assumptions ("can we build this at production quality?"): Your prototype is a hack. Can your team actually build the production version in the estimated time? For technical assumptions, have your engineer spend a day on a spike. Not building the feature, just building the hardest part. If the spike works, the assumption holds. If it surfaces unexpected complexity, adjust your timeline or approach.
The kill decision (the hardest part)
Everything I've described makes testing fast. But the hard part was never testing. It was acting on the results.
When a test fails, most PMs rationalize. "The sample was too small." "The prototype wasn't polished enough." "We just need to iterate more." "Let's build the real version and see."
Don't do this. If 5 customers don't engage with your prototype, adding polish won't fix it. The concept didn't land. The assumption was wrong.
I killed a recommendation project at Smartcat based on prototype testing. We showed personalized job recommendations to 8 new freelancers. Two engaged. Six went back to search. When I asked why, the pattern was clear: they wanted to see all their options, not a curated set. They valued visibility over optimization.
We could have rationalized. We could have said the algorithm needed tuning. We could have built the full version hoping adoption would be different at scale.
Instead, we killed it. Pivoted in a week. Built better search filters and a "browse all projects in your category" view. Shipped in three weeks. Activation improved 28%.
The kill saved us 8 weeks of building the wrong thing. The pivot took 3 weeks to build and ship. Total time from "bad idea" to "working solution that moved the metric": 4 weeks. In the old model, we'd have spent 8 weeks building recommendations, shipped them, measured 5% adoption, spent 4 more weeks iterating, and eventually killed it anyway. That's 12 weeks wasted instead of 1.
The kill decision is the highest-leverage decision you make. Testing just makes it cheaper and faster to get there.
AI-powered assumption tracking
One thing AI does better than any human: keeping track of what you've tested and what you haven't.
Set up an assumption tracking agent. It maintains a living document of every assumption across every active project. For each assumption, it tracks: current status (untested, testing, validated, invalidated), the test method used, the result, and the date.
Every Friday, the agent generates a report: "You have 12 active assumptions across 3 projects. 7 are validated. 2 are invalidated. 3 are untested. The untested assumptions are: [list]. Recommended: test assumption X next week because it's a red-risk assumption on your highest-priority project."
This sounds simple but it solves a real problem. In practice, teams test the easy assumptions and avoid the scary ones. The ones that might kill the project. The agent doesn't have that bias. It flags the untested red assumptions every week until you test them.
It also connects to your signal data. "Assumption: Enterprise customers want bulk export. Signal update: 15 more support tickets about export this week, 60% from enterprise accounts. This assumption is getting stronger based on signal data alone, but has not been prototype-tested yet."
The assumption tracker turns testing from an ad-hoc practice into a system. Every assumption gets tracked. Nothing falls through the cracks.
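Before wiring up a full agent, a plain structure like the sketch below is enough to see the idea. The statuses and risk colors mirror this chapter; the names and report wording are illustrative, not the actual agent setup:

```python
# Sketch of the living document the tracking agent maintains and the Friday
# report it generates. Everything here is illustrative, not the real agent.
from dataclasses import dataclass

@dataclass
class Assumption:
    project: str
    statement: str
    risk: str          # "red", "yellow", "green"
    status: str        # "untested", "testing", "validated", "invalidated"
    test_method: str = ""
    result: str = ""
    last_tested: str = ""   # ISO date, empty if never tested

def friday_report(assumptions):
    projects = {a.project for a in assumptions}
    validated = sum(a.status == "validated" for a in assumptions)
    invalidated = sum(a.status == "invalidated" for a in assumptions)
    untested = [a for a in assumptions if a.status == "untested"]
    lines = [
        f"{len(assumptions)} active assumptions across {len(projects)} projects. "
        f"{validated} validated, {invalidated} invalidated, {len(untested)} untested."
    ]
    # The agent's job is to keep surfacing the scary ones until they're tested.
    for a in untested:
        if a.risk == "red":
            lines.append(f"Recommended: test '{a.statement}' ({a.project}) next week. "
                         f"Red risk, still untested.")
    return "\n".join(lines)

print(friday_report([
    Assumption("Recommendations", "Freelancers want personalized recommendations", "red", "validated"),
    Assumption("Bulk export", "Enterprise customers want bulk export", "red", "untested"),
]))
```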
For the full agent setup, see Your AI Agent Fleet.
Building a testing culture (faster now)
In the old model, building a testing culture took months. You had to convince your team that spending 4-6 weeks on experiments before building was worth the delay. That was a hard sell when leadership was pushing for shipped features.
The new model makes this sell easier because the "delay" is measured in days, not weeks.
Week 1: Pick one project. Map assumptions. Build a prototype. Test it with 5 customers. Show your team the results. Total investment: one week.
If the test validated the approach, you saved your team from building blindly. If the test killed the approach, you saved 8 weeks of wasted engineering. Either way, the ROI is obvious.
Week 2: Do it again with the next project. This time, invite your engineer and designer to the customer prototype sessions. Let them see the reactions firsthand. When an engineer watches a customer struggle with the prototype and say "I'd need it to do X instead," the engineer is now invested in building the right thing.
Week 3: Your team starts asking for it. "Did we test this assumption?" "What did customers say about the prototype?" "Can we show this to a few more people before we commit?" That's the culture shift. It happened in three weeks instead of three months because the cost of testing dropped to near zero.
Month 2: Testing is the default. No one starts building a production feature without showing a prototype to customers first. Not because you mandated it. Because they saw the difference between building confident and building blind.
Your test this week
You have a project in mind. Stop reading and do this.
Right now (10 minutes): Write down the top three assumptions your project sits on. Which one, if wrong, kills the whole thing?
Today (2-3 hours): Build a prototype that tests the riskiest assumption. Not a mockup. A working thing. Use AI coding tools. Describe what you want and iterate until it's functional enough for a customer to react to.
Tomorrow (2 hours): Show it to 3 customers. Use the prototype interview format. Watch them use it. Listen to their reaction.
End of week (30 minutes): Decide. Did the assumption hold? Build, iterate, or kill?
You just compressed a month of testing into a week. The prototype did the work of multiple experiments. The customer reactions gave you real data, not hypothetical answers.
The companies that win are the ones that get to the right decision fastest. A prototype in front of a customer tests more assumptions in 30 minutes than a month of surveys and data analysis. Build it, show it, learn, decide.
Start today.
Frequently asked
What is the difference between red, yellow, and green assumptions?
Red assumptions kill the project if they're wrong. Yellow assumptions slow it down but can be worked around. Green assumptions are nice-to-have. You identify which is which in a 20-minute mapping exercise. Then you design a prototype that tests the red ones first.
How does a prototype test multiple assumptions simultaneously?
When a customer uses your prototype, their behavior answers several assumption questions at once. If they browse recommendations and apply, you tested desirability and usability. If the matches are obviously wrong, you tested feasibility. If they say they'd use this, you got a viability signal. One prototype, multiple assumptions tested.
What is the leave it with them test and when do you use it?
A prototype session tells you whether someone wants something in the moment. Giving them access for a week tells you if they actually use it when you're not in the room. For retention assumptions, give 10 to 20 customers a week of access. If they use it without prompting, the assumption has signal. If they forget about it, it doesn't matter how excited they were.
How do you test pricing assumptions with a prototype?
Put a premium badge on the prototype with a price. Measure how many people click through versus bounce. It's not perfect, but it's directional. The people who saw premium and kept going are telling you something about willingness to pay.
What does it mean if the assumption failed after a one-week test?
It means this entire direction might be wrong. You have a choice: kill the feature (learning cost near zero), pivot to test a different solution for the same opportunity (hypothesis still valid but solution was wrong), or iterate on the prototype and test again (execution was wrong but concept has legs). The data tells you which.
Related reading
Deeper essays and other handbook chapters on the same thread.
Build Your First Opportunity Solution Tree
The OST used to take weeks of interviews and synthesis. Now AI extracts opportunities from every customer signal, prototypes solutions in hours, and tests assumptions before lunch. Here's how to build one that actually moves.
The Interview Guide That Actually Works
Customer interviews still matter more than ever. But now you show up with full signal context, a working prototype in hand, and AI that synthesizes the conversation before you close your notebook.
Why Users Don't Know What They Want Until You Show Them
Feature requests are reactive. Visionary product development means anticipating needs users can't articulate yet - and showing them what's possible.
Show, Don't Tell: Why Users Can't Describe What They Need
Feature requests tell you what's broken. Prototypes tell you what's possible. I stopped treating user feedback as a roadmap and started using it as a starting point.
How a 2-Hour Prototype Killed a 3-Month Project
We were about to commit a full squad to a feature for an entire quarter. A quick prototype and 5 customer calls later, we realized the premise was wrong. Here's the story.
Continuous Discovery
Discovery used to mean scheduling interviews and hoping for insights. Now AI ingests every call, email, and ticket your company generates, extracts the signal, and hands you a prototype before you finish your coffee.