Previously in this series:
N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness?
N=1: Symptom vs. Syndrome
N=1: Latency and Half-Life
The biggest limitation of an N=1 experiment is external validity. If you run enough trials on yourself, you can show that some intervention does or doesn’t have an effect on you to basically any degree of certainty that you want. But this will never provide much evidence that the same intervention will have the same effect, or any effect, on anyone else.
People are all human and have roughly the same human biology, it’s true. In the higher animals, decapitation is more or less guaranteed to be lethal; people generally like eating sugar and hate eating asphalt. But once you move beyond the fundamentals of biology, most other bets are quickly off.
An unspoken assumption of the self-experiment discussion (including our posts on the subject) is that there are exactly two kinds of research — self-experiments and large trials. These occupy the sample-size slices of N = 1 and N ≥ 30, respectively. The self-experiment and case study are assumed to involve a single subject; and with few exceptions, most people don’t trust a survey or RCT with fewer than 30 participants.
But there are two problems with this perspective. The first is that this is a false dichotomy. There isn’t a point where N = 1 turns into N = small, and there’s no sample size where you go from having a collection of case studies to having a trial. Going from N = 29 to N = 30 does nothing in particular, and there is no other threshold that stands out as being at all distinct (except N = 0 to N = 1, of course). A bigger sample size always means more information and better external validity, with no discontinuity.
The second problem is that if N = 1 is at all good (and we think that it is), then N of small has to be better.
Anything that is good with an N of 1 will be better with an N of 2-10. With N of small, you get more data, more quickly. One person doing random daily trials over the course of a week will create 7 data points. Three people doing random daily trials over the course of a week will create 21 data points. Small-group analysis is a little more complicated, but the data can be handled by a standard linear mixed model (here’s an example that involves dragons).
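To make the mixed-model point concrete, here is a minimal sketch in Python using statsmodels, with three hypothetical participants and made-up numbers purely for illustration (this is not data from any of our trials, and the column names are ours):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 3 participants x 7 days of randomized daily trials.
# 'treatment' is 1 on days the intervention was used, 0 otherwise;
# 'outcome' is whatever was measured each day (mood, weight, reaction time, ...).
# All numbers are invented for illustration.
data = pd.DataFrame({
    "participant": ["A"] * 7 + ["B"] * 7 + ["C"] * 7,
    "treatment":   [1, 0, 1, 1, 0, 0, 1,
                    0, 1, 0, 1, 1, 0, 0,
                    1, 1, 0, 0, 1, 0, 1],
    "outcome":     [5.1, 4.2, 5.4, 5.0, 4.1, 4.3, 5.2,
                    6.0, 6.8, 5.9, 6.9, 6.7, 6.1, 6.2,
                    3.9, 4.1, 3.5, 3.4, 4.0, 3.6, 4.2],
})

# Linear mixed model: one fixed effect of treatment shared across everyone,
# plus a random intercept per participant, since people differ at baseline.
model = smf.mixedlm("outcome ~ treatment", data, groups=data["participant"])
result = model.fit()
print(result.summary())
```

The `treatment` coefficient estimates the average effect across participants, while the random intercepts absorb baseline differences between people. Passing `re_formula="~treatment"` to `mixedlm` would additionally give each participant their own treatment effect, which is one way to check whether people are responding differently, a point we turn to next.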
With N of small, you get more diversity of participants and more diversity of responses, quickly drawing the fangs from the problem of external validity. You will be able to get some sense of whether the intervention works differently for different people. If you have five participants, it will be easy to see if they are all responding the exact same way, if they are responding somewhat differently, or if some of them are having huge responses while others feel nothing at all.
The only question is one of cost. Because while the biggest limitation of N = 1 is external validity, the biggest benefit is that it’s cheap in important ways. With N = 1, you don’t need anyone’s permission to start your study — you can just go do it. You don’t pay any coordination costs, which are easy to miss up front but can be quite a drag if you’re not careful. These factors keep self-experiments cheap.
But we think scaling up is usually worth it — or at least, once you have some promising N = 1 results, scaling to N of small usually makes sense. It’s the logical next step. And since there’s no sharp dividing line between a single case study, a small collection of case studies, and a trial of 100 people, it’s also the logical next step on the path towards an RCT or other large trial.
So while this series has focused on true N = 1 self-experiments, the real wins for the future may be in N = 2-10 studies where people grab a couple of friends and run a self-experiment together. Remember kids, friendship is the most powerful force in the universe.

And it’s not at all unprecedented, since this is how we approached our community trials; we looked at a couple of case studies, and then used N of small to do the pilot testing.
For the potato diet, we started with case studies like Andrew Taylor and Penn Jillette; we recruited some friends to try nothing but potatoes for several days; and one of the SMTM authors tried the all-potato diet for a couple weeks.
For the potassium trial, two SMTM hive mind members tried the low-dose potassium protocol for a couple of weeks and lost weight without any negative side effects. Then we got a couple of friends to try it for just a couple of days to make sure that there weren’t any side effects for them either.
For the half-tato diet, we didn’t explicitly organize things this way, but we looked at three very similar case studies that, taken together, are essentially an N = 3 pilot of the half-tato diet protocol. No idea if the half-tato effect will generalize beyond Nicky Case and M, but the fact that it generalizes between them is pretty interesting. We also happened to know a couple of other friends who had tried versions of the half-tato diet with good results.
We think that in all of these cases, N of small was much more convincing than N = 1 would have been. With two people, it’s much less likely that the effect is a fluke. Even if it works for one person and not for the other, that’s still evidence that we shouldn’t expect the effect to be entirely consistent; we should expect more ambiguity. And for something where the risks are unclear, like with potassium, two people going through without any side effects is much more reassuring than one.