How Many Customer Interviews Before You Trust the Pattern?

Five users. That is the number a designer or PM will quote in almost any meeting where someone asks how much research is enough. It comes from Jakob Nielsen, it has a real mathematical model behind it, and it is one of the most confidently misapplied numbers in product work. The problem is not that the number is wrong. The problem is that most teams quote it as the answer to a question Nielsen never asked.

The five-user rule was built to answer one narrow thing: how many people you need to watch use an interface before you have caught most of the usability problems in it. It says almost nothing about how many customers you need to interview before you can trust a pattern about their problems, their motivations, or whether you should build the thing at all. Those are two different research questions, and they have two different answers. Conflating them is how a team ends up canceling a roadmap bet on the strength of four conversations.

What the five-user number actually measures

In 1993, Jakob Nielsen and Tom Landauer published the model that the rule rests on. The number of usability problems found in a test follows the curve N × (1 − (1 − L)^n), where N is the total number of problems in the design, L is the proportion of problems an average single user uncovers, and n is the number of users you test. Each user misses a share 1 − L of the problems, so n users together miss (1 − L)^n of them and find the rest. Plug in the value they observed across real projects, an L of about 31%, and the math says five users will surface roughly 85% of the usability problems in an interface. Nielsen’s own summary of the curve is blunt: after the fifth user, “you are wasting your time by observing the same findings repeatedly but not learning much new.” His full explanation of the calculation is worth reading before you cite the number to anyone.
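If you want to see how fast that curve flattens, a few lines of Python will do it. This is a minimal sketch, not Nielsen's own code: the function name is mine, and it returns the share of problems found, which is one minus the share that every one of the n users misses, (1 − L)^n. The default of 0.31 is the average quoted above; your own L may be quite different.

```python
def share_of_problems_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once by n users.

    discovery_rate is L, the share of problems a single average user uncovers.
    The result is 1 - (1 - L)^n; multiply by N to get a problem count.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be between 0 and 1")
    return 1.0 - (1.0 - discovery_rate) ** n_users


if __name__ == "__main__":
    # Print the curve for a few test sizes at the default L of 0.31.
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} users: {share_of_problems_found(n):.1%}")
```

At the default L, five users come out at about 84%, which is the figure usually rounded to 85%. Drop L to 0.10, as you might for a product where problems are rarer or harder to trigger, and the same five users find only about 41%. The rule is only as good as the L you assume.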

Notice what that model assumes. It assumes you are watching people attempt tasks in a built interface, and that a usability problem is a concrete, observable event: someone clicks the wrong thing, misses a button, gets stuck on a step. Those problems are dense and they repeat fast, which is why the curve flattens so quickly. Nielsen also attaches a caveat that gets dropped every time the number gets quoted: five works for a single, uniform user group. The moment you have genuinely distinct groups, say buyers and approvers, or new users and power users, you need a few users from each, because the problems one group hits are not the problems the other group hits. NN/G has since published a separate piece on why five is fine for qualitative testing but useless for quantitative claims. Five users can tell you a problem exists. Five users cannot tell you what percentage of your base hits it.

Discovery interviews are a different curve

Problem discovery is not usability testing. You are not watching someone fail at a task you designed. You are trying to understand a person’s situation, their workarounds, the trigger that made them go looking for a solution, the language they use. That is a much richer and messier thing to map, and the research on how many interviews it takes tells a different story.

The most cited study here is Guest, Bunce and Johnson’s 2006 analysis in Field Methods. They took 60 in-depth interviews from a tightly scoped study and tracked when new codes, the distinct themes in the data, stopped appearing. Within the first six interviews they had already captured about 80% of the codes that showed up across all 60. Code saturation, the point where new interviews stopped surfacing new themes, arrived by around the twelfth. That is where the popular shorthand “you need about a dozen interviews” comes from. A readable breakdown of the study lays out the numbers and, more importantly, the conditions.
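The kind of tally behind those percentages is easy to reproduce on your own interview notes. The sketch below is an illustration of the idea, not the procedure from the paper: it assumes each interview has already been coded into a set of theme labels, and the function name and sample labels are hypothetical.

```python
def cumulative_code_coverage(interviews: list[set[str]]) -> list[float]:
    """Share of all codes in the full set that has been seen after each interview.

    interviews is an ordered list; each item is the set of codes (themes)
    assigned to one interview. Coverage is measured against every code that
    appears anywhere in the list, so it is a look back, not a forecast.
    """
    all_codes: set[str] = set().union(*interviews)
    if not all_codes:
        return [0.0 for _ in interviews]

    seen: set[str] = set()
    coverage: list[float] = []
    for codes in interviews:
        seen |= codes  # add this interview's codes to the running total
        coverage.append(len(seen) / len(all_codes))
    return coverage


if __name__ == "__main__":
    # Hypothetical coded interviews, in the order they were run.
    coded = [
        {"manual exports", "approval delays"},
        {"approval delays", "spreadsheet workarounds"},
        {"spreadsheet workarounds"},
        {"audit anxiety"},
    ]
    for number, share in enumerate(cumulative_code_coverage(coded), start=1):
        print(f"after interview {number}: {share:.0%} of codes seen")
```

Note the limitation: the denominator is the set of codes you ended up with, so the curve can only be drawn after the fact. It tells you when themes stopped arriving in this study, not how many interviews the next one will need.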

Those conditions are the whole game. Guest’s twelve held because the study had a narrow research question, a relatively homogeneous group of participants, and trained interviewers who knew the context cold. Strip any of those away and the number climbs. A vague question, a mixed audience, an interviewer still learning the domain: each one stretches how many conversations it takes before the themes stabilize. Twelve is not a law. It is what saturation looked like under near-ideal conditions.

The number you actually care about is the second one

Here is the distinction that changes how you run discovery. In 2017, Monique Hennink and colleagues re-analyzed a set of 25 interviews and split saturation into two kinds. Code saturation, hearing the full range of topics at least once, arrived early, by about the ninth interview, where they had captured 91% of their codes. But meaning saturation, understanding each topic richly enough to act on it, took far longer: somewhere between 16 and 24 interviews, and a few of the most conceptual themes never fully saturated even at 25. Their study on code versus meaning saturation draws the line precisely: code saturation tells you when you have “heard it all,” but it takes meaning saturation to “understand it all.”

That gap is exactly where product teams get burned. After nine conversations you have heard every theme, and it feels like you are done. You can list the problems. What you do not yet have is the texture: which version of the problem is the painful one, what conditions make it acute, why the obvious solution has not already solved it. The concrete themes, the ones you can almost see, saturate fast. The conceptual ones, motivation, trust, perceived risk, the things that actually decide whether someone adopts your product, take roughly twice as many interviews to understand. If your roadmap bet hinges on a conceptual claim about why customers behave a certain way, nine interviews is not enough, no matter how confident the room feels.

Why a fixed quota is the wrong tool anyway

The honest answer to “how many interviews” is that you should not be counting to a target at all. Every credible version of this research lands in the same place: analyze as you go and stop when you stop learning. A fixed quota encourages two failures. Teams that set the bar low, four interviews and a strong opinion, declare a pattern from noise. Teams that set it high run twenty interviews on autopilot, having learned nothing new after the tenth, because the number on the plan said twenty.

I watched the first failure up close on a fractional COO engagement. A capable product team had run four customer calls, heard the same complaint in two of them, and walked into a planning meeting ready to reprioritize a quarter of the roadmap around it. Two out of four is a coin flip wearing a lab coat. We did not need a bigger study. We needed about eight more conversations, run and reviewed in batches, to find out whether that complaint was the real problem or just the loudest one in a small sample. It was not. By interview ten the actual pattern was something adjacent and more tractable, and the original “insight” turned out to be two unhappy customers who shared a workflow quirk the rest did not have. Acting on four interviews would have pointed real engineering time at a problem most of the base did not have.
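To put a rough number on how little two out of four tells you, you can compute a confidence interval for the share of customers who have the complaint. The sketch below uses the Wilson score interval, which is my choice for small samples and not something the team used at the time. It also assumes the four calls were a random sample of the base, which customer calls almost never are, so read the result as a best case.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%).

    hits is how many interviewees raised the complaint, n is how many were asked.
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(2, 4)
    print(f"2 of 4 interviews: plausibly {low:.0%} to {high:.0%} of the base")
```

For two of four, the interval runs from roughly 15% to 85%. In other words, the four calls are consistent with the complaint being rare and with it being nearly universal. Interviews are the wrong tool for pinning that share down; what the extra conversations bought us was an understanding of what the complaint actually was.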

The fix is not a magic number. It is a habit: interview in small waves, code each wave before the next, and let the data tell you when it has gone quiet. The discipline of synthesizing each batch before you run the next one is what converts a pile of conversations into a decision you can defend. In the two decades I spent running IT operations, the requirements-gathering efforts that went wrong almost never failed from too few conversations. They failed because nobody sat down between conversations to ask what had actually changed in their understanding.

A working rule of thumb

If you want numbers to anchor on, use these as starting points, not finish lines. For a usability test on a built flow with one user type, five is a reasonable first round, then fix what you find and run five more. For problem discovery with a focused question and a fairly uniform audience, expect to hear the full range of themes by around nine to twelve interviews, and expect to need closer to twenty before you understand the conceptual ones well enough to bet on them. Add interviews for every distinct segment you are serving and every time your research question is still fuzzy. And remember that interviews are one input: the strongest discovery pairs what people say with what they actually do, because the gap between the two is often where the real product lives.

The number was never the point. The point is whether you have stopped learning. Five users can confirm a button is in the wrong place. Understanding why a customer would switch to your product, and whether enough of them would, takes more conversations than the meeting wants to hear, and a willingness to keep going until the data stops surprising you.

Sources:

Nielsen, J. “Why You Only Need to Test with 5 Users.” Nielsen Norman Group.
Nielsen, J. “Why 5 Participants Are Okay in a Qualitative Study, but Not in a Quantitative One.” Nielsen Norman Group.
Guest, G., Bunce, A., & Johnson, L. (2006), summarized in “How many qualitative interviews are enough?”
Hennink, M., Kaiser, B., & Marconi, V. “Code Saturation Versus Meaning Saturation: How Many Interviews Are Enough?”
