N=1: Bite the Bullet

Previously in this series:
N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness? 
N=1: Symptom vs. Syndrome
N=1: Latency and Half-Life
N=1: n of Small
N=1: Dr. Garcia’s Queasy Irradiated Rats

When it comes to chronic illnesses, most people try to find ways to avoid the pain. This is because pain bad, and no pain, good. 

But we worry that avoiding pain is good in the short term but bad in the long term; penny wise and pound foolish. 

If you’re worried that pizza makes you bloated, it’s good common sense to try to avoid pizza. But it’s bad science. Past a certain point, avoiding pizza tells you nothing. To learn something more, you have to bite the bullet.

Better to wait for a day when you feel great, no bloating at all, and then intentionally go and eat pizza, and see what happens. Or each afternoon you feel good, flip a coin, and eat pizza when it comes up heads. Do this a couple of times. If you do this, you should be able to see if pizza is really a reliable trigger for your bloating.
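As a minimal sketch of the coin-flip protocol described above: the function names and the six-day log are made up for illustration, and the only logic is "flip a fair coin on each good day, then compare how often bloating followed pizza versus no pizza."

```python
import random

def flip_for_pizza(n_good_days, seed=None):
    """Return one fair coin flip per good day: True means eat pizza that day."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_good_days)]

def bloating_rates(ate_pizza, bloated):
    """Return (bloating rate on pizza days, bloating rate on no-pizza days)."""
    def rate(outcomes):
        # None when there were no days of that kind, so we never divide by zero.
        return sum(outcomes) / len(outcomes) if outcomes else None

    pizza = [b for p, b in zip(ate_pizza, bloated) if p]
    no_pizza = [b for p, b in zip(ate_pizza, bloated) if not p]
    return rate(pizza), rate(no_pizza)

# Hypothetical log of six good days (not real data).
ate_pizza = [True, False, True, True, False, False]
bloated = [True, False, True, False, False, True]
print(bloating_rates(ate_pizza, bloated))
```

With only a handful of days the two rates will be noisy, which is why the advice is to repeat the flips several times before concluding anything.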

There are two reasons to do this. The first is — what is it about pizza that makes you bloated? If you can show that pizza is a trigger, you can start doing empirical splits. Buy the same pizza and flip a coin. If it’s heads, scrape off the cheese and tomato and just eat the bread. If it’s tails, toss the bread and just eat the cheese and tomato scraped off into a bowl. Which makes you bloated? If it’s the cheese and tomato, separate them and do the same thing.

The second is that a lot of the time, we suspect you’ll find that pizza (or your personal equivalent) is not a trigger at all. It’s all too easy to think you see a pattern where there’s none to be found, and we tend to see food triggers even when food doesn’t matter at all. You could have been enjoying pizza this whole time. That seems worth knowing.

More on Macros

ExFatLoss recently put out an essay called RIP Macros, where he expresses skepticism about the common three-macronutrient paradigm, saying:

I suspect more and more that the idea of “macros” is just as useless [as CICO], unless you subdivide each of the macronutrients so much further as to dilute the concept completely.

Carbs, fat and protein are definitely real, and they’re definitely a useful lens through which to view some problems. If you get too much protein, you really can kill yourself by rabbit starvation.

But this doesn’t mean that the macros are a useful lens for every problem related to nutrition. For example, we know they have basically no bearing at all on scurvy. So they may not be a good way to understand other issues, like obesity. 

ExFatLoss makes a number of good points and we encourage you to read his essay. Here is a bit of extra commentary:

Ontology

First of all, ExFatLoss is doing the right kind of work here. In the 1980s, when obesity started looking like a major problem, macros probably seemed like a promising angle. But when you’ve spent 40 years attacking a problem from the same angle with no success, maybe it’s time to find a new angle. This is the kind of ontological remodeling that you need to crack tough problems.

If you commit to a lens too quickly, it’s easier to get locked in on assumptions that might turn out to be wrong. Related to this, we like the caution ExFatLoss shows in his naming conventions:

I changed the name because I wanted it to be more descriptive of what it was, not the proposed mechanism (lack of protein) – after all, I am still not sure that’s the causal factor.

But I also chose the “ex” (for experiment) part because it conveys uncertainty, and that the diet is in flux. It’ll evolve, hypotheses will be disproven. … In a sense, it’s almost like a serial number for an experiment, and I’ve added a few new serial numbers since: ex150deli, ex150sardines, ex150choctruffle, ex225.

Maybe we’ll figure out what exactly makes ex150 tick, and then we can nail down a more descriptive name. Until then, I’m hesitant, because it would be speculation and I’d rather have a serial number than a name that’s just flat out wrong.

Macros are probably just too “big”. Dividing all food into three categories is pretty broad strokes, and it won’t be surprising if it turns out that these strokes are too broad to be helpful. We can equally say that dividing all matter up into four elements didn’t work very well, and chemistry progressed much better once people got a handle on the fact that there was more than one kind of earth, that there are various airs, etc. 

Phase of matter was the system of the world at one point, but today we don’t think so much about the solid/liquid/gas distinction — we no longer think of oxygen as a fundamentally different species of thing from copper. They’re not a type of air and a type of earth, they’re both elements, elements that happen to be in different phases at room temperature.

And without getting into it too much, we’ll note that reading about how macros were discovered did not inspire much confidence in them as categories.

History

The other reason we don’t think obesity has anything to do with macros is because of history. 

People ate all kinds of diets throughout history, including all sorts of “bad” diets. People tried every combo of macros, and never got obese.

Some cultures ate high-fat diets. Some ate low-fat diets. Some ate lots of carbs. Others ate almost no carbs. You name it, some culture probably tried it. 

On top of this, people were subjected to all kinds of voyages, expeditions, crop failures, sieges, economic shocks, and migrations. When you’re under siege, you eat whatever happens to be in the city, so people besieged in different places ended up eating different weird diets just to stay alive.

These various shocks gave them all kinds of dietary diseases. Scurvy is famously associated with the age of sail, but also struck during the Crusades. Beriberi is often found in prisons. And the ancient Romans discovered protein poisoning while besieging Intercatia around 150 B.C.:

Their soldiers were sick from watching and want of sleep, and because of the unaccustomed food which the country afforded. They had no wine, no salt, no vinegar, no oil, but lived on wheat and barley, and quantities of venison and rabbits’ flesh boiled without salt, which caused dysentery, from which many died.

The point is that throughout time and space, people have chosen or been subjected to almost every strange diet imaginable. These did give them all kinds of weird illnesses — we know that eating the wrong combination of things can make you sick in various ways. But as far as we can tell, these weird diets never made them obese.

This makes it unlikely that obesity can be caused by an imbalance in macros. If there were some ratio of fat / carbs / protein that could make you obese, someone would have noticed in the last 3000 years, because someone at some point would have been eating that ratio. History has provided a pretty thorough search of diet-space (not totally exhaustive, but covering a lot of ground) and has discovered lots of ways that a bad diet can fuck you up. But none of those ways was obesity. 

ExFatLoss makes this same point: 

There were of course a near infinite amount of diets people could’ve consumed back [in ancestral times]. All we know is they didn’t add refined flour and seed oils, because they wouldn’t have had those. But there might’ve been carnivorous ancestral peoples, fish-eaters, maybe some near-vegetarians. Some might have lived heavily off dairy. Some ate a lot of muscle meat, others more fat. The paleolithic era lasted over 3 million years and the earth is a big place.

So if obesity is a dietary disease, you’d think that some culture somewhere would have stumbled onto it at some point. As far as we can tell, that’s not the case. Though if someone can find an example of a reliably obese culture from before 1900, we would be very interested to know. 

To us, this is strong evidence against any macronutrient cause of obesity. And in general, we don’t think obesity has to do with ANY nutritional element of food. Vitamin C isn’t a macro, but the random walk of diets through history discovered the related disease (scurvy), and eventually normal science discovered the cure and the underlying compound. If obesity were caused by some micronutrient or something, we think it also would have been stumbled upon in antiquity, and that since then we would have found the missing compound at fault.

The exception might be nutritional elements that were very rare until the late 20th century. If there’s some substance that it was hard to even get 1 mg of before 1940, but most people are eating 200 mg/day of today, it would make sense why no one had gotten fat off that substance until recently. 

This is one point in favor of the seed oil theorists, who usually blame linoleic acid for the obesity epidemic. This compound has always been in foods, but it used to be much harder to get a lot of it. So if too much linoleic acid makes you obese (we don’t think it does, but just by way of example), it would make sense that no one before 1940 would have ever stumbled on this, because almost no one before 1940 was ever exposed to linoleic acid in these quantities. Hence such images:

We said, “we don’t think obesity has to do with any nutritional element of food”. But it might plausibly have something to do with non-nutritional elements of food, like pesticides or other contaminants. Again, if it’s something no one was exposed to before the 20th century, or that no one was exposed to in such modern quantities, then it isn’t ruled out by the relative absence of obesity before the 20th century.

Symmetry

It’s easy to assume the cure and the cause will be symmetric. For example, people who believe that a low-fat diet will cure obesity usually believe that this is because high-fat diets caused obesity. We think that high-fat diets can’t have caused the obesity epidemic, because people in history sometimes ate high-fat diets and didn’t get obese. Similarly, people who believe that a low-carb diet will cure obesity usually believe that this is because high-carb diets caused obesity, etc.

But it could be that something else (FACTOR X) caused obesity, and a low-fat diet happens to cure obesity for reasons totally unrelated to the cause.

This kind of thing is common. Antibiotics cure infections because they kill the bacteria that are making you sick, not because the infection was caused by a penicillin deficiency.

Empirically, it looks like macro-changing diets (e.g. low-fat, low-carbs, etc.) don’t reliably cause weight loss. But it’s possible that some nutritive diet could treat obesity — the potassium trial essentially fits this description, since potassium is a necessary mineral. We just don’t think a nutritive diet could cure obesity because of a matched deficiency. 

Half-Tato Diet Analysis

So we did this half-tato diet community trial. People signed up for a minimum of six weeks — two weeks of baseline, so we could see how their weight changed when they were eating as normal, and then four weeks where they got around 50% of their calories from potatoes every day.

This was inspired by our original Potato Diet Community Trial, which worked pretty well. In that study, people lost an average of 10.6 lbs over four weeks eating almost nothing but potatoes.

We say “almost nothing but potatoes” because most people took multiple cheat days, and it didn’t seem to make much of a difference. Combined with a couple of case studies, who reported enormous success on a half-tato diet (in particular, M with his potatoes-by-default), this made us wonder if a half-tato diet could be made to work almost as well as a full-tato diet. 

Anyways, let’s look at some results. 

Today’s analysis is based on a snapshot of the data taken on June 1, 2023 (about 10 weeks after the study was launched). This means we have up to 10 weeks of data, specifically 2 weeks of baseline and 8 weeks of half-tato. A few people are still going with the half-tato diet, but we will look at their data later.

The dataset is mostly straightforward, but here’s one note: One or two important measurements were missing for a small number of people. For example, they might have entered a weight for Day 28 and Day 30, but not Day 29 (which is important because Day 29 is the end of the first four weeks). 

When an important measurement like this was found to be missing, we filled it in by making the missing measurement the average of the two values around it. For example, if the weight measurement for Day 29 was missing, we filled it in with the average of the weights on Day 28 and Day 30.

We did all these replacements before doing the analysis, and only a few measurements were interpolated like this.
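The fill-in rule described above can be sketched in a few lines. This is an illustration, not the analysis script from the OSF: the function name and the example weights are hypothetical, and it only fills a gap when both neighboring days are present.

```python
def fill_missing_with_neighbor_mean(weights):
    """Fill each single missing value (None) with the mean of the days on either side.

    Gaps without a measurement on both sides are left as None.
    """
    filled = list(weights)
    for i in range(1, len(weights) - 1):
        before, after = weights[i - 1], weights[i + 1]
        if weights[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled

# Hypothetical weights in lbs for Days 28, 29, 30, with Day 29 missing.
print(fill_missing_with_neighbor_mean([181.0, None, 180.0]))  # [181.0, 180.5, 180.0]
```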

As usual: raw data, the analysis script, and study materials are available on the OSF.

Participants

A total of 123 people filled out the signup form. 

Of those, 8 people filled out the form incorrectly in such a way that we couldn’t sign them up (they didn’t enter an email, didn’t enter critical data such as height, etc.). We enrolled the remaining 115 people in the study.

Of the 115 people who were enrolled, 92 entered at least one day of weight data.

For people who entered any data, the most common outcome was to make it the full 2 weeks baseline + 4 weeks half-tato, though people dropped out at various points along the way, and a few people didn’t finish the baseline two weeks. 

Here you can see how many days people completed. In this figure, the vertical line at 0 divides the baseline span (Days -14 to -1) from the half-tato span of up to 8 weeks (Days 1 to 57). 

Let’s summarize that plot. As of the snapshot on June 1st:

  • 92 people entered at least one day of weight data
  • 75 people made it to Day 1, past the baseline period of two weeks
  • 38 people made it to Day 29, the end of the first 4 weeks of half-tato
  • 8 people made it to 8 weeks or further, and some are still going
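To put those counts on a common scale, here is the same list expressed as a share of the 92 people who entered any weight data. The counts are the ones listed above; the variable names are just for illustration.

```python
entered_data = 92  # people who entered at least one day of weight data
milestones = {"Day 1": 75, "Day 29": 38, "8 weeks": 8}

# Share of data-entering participants who reached each milestone.
retention = {label: n / entered_data for label, n in milestones.items()}

for label, share in retention.items():
    print(f"{label}: {milestones[label]}/{entered_data} = {share:.1%}")
```

By this count, roughly four in ten of the people who started logging made it to the end of the first four weeks of half-tato.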

For this analysis, we will mostly be focusing on weight change up to Day 29, since there’s not much data past that point. 

Weight Change over Baseline

First let’s look at the baseline. Similar to a crossover design, this baseline serves as a kind of control group.

There was very little average weight change in the baseline period, and it was not statistically distinguishable from zero. Here’s the histogram of weight change over baseline, with a black vertical line at 0 lbs (i.e. no weight change over baseline) and a red dashed vertical line at the mean weight change:

The mean weight change over this period was -0.22 lbs, with a 95% CI of -0.70 lbs to 0.27 lbs. This is not statistically distinct from zero. 

The mean suggests a loss of 0.11 lbs per week on average, or 0.35 lbs per week if we take the lower bound of the confidence interval. 

Of course, it’s also consistent with an average weight GAIN of 0.14 lbs per week if we take the upper bound of the confidence interval.
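The per-week figures are just the two-week baseline numbers divided by two. A small sketch of that arithmetic, using the mean and confidence bounds reported above (the helper name is hypothetical):

```python
BASELINE_WEEKS = 2

# Reported change over the two-week baseline, in lbs.
mean_change, ci_low, ci_high = -0.22, -0.70, 0.27

def per_week(total_change_lbs, weeks=BASELINE_WEEKS):
    """Convert a total change over the baseline into a per-week rate."""
    return total_change_lbs / weeks

for label, value in [("mean", mean_change), ("CI lower", ci_low), ("CI upper", ci_high)]:
    print(f"{label}: {per_week(value):+.3f} lbs/week")
```

The upper bound comes out at 0.135 lbs per week, which rounds to the 0.14 quoted above.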

In previous studies, people have expressed concern about the Hawthorne effect — that when we ask people to measure their weight, they might start losing weight simply because they are aware that their weight is being observed. Looking at the baseline period, we find very little support for this idea, even with a sample size of 75 people. 

Observing your weight for two weeks just doesn’t change it much, and likely doesn’t change it at all. Going forward, we will continue to not worry about the so-called Hawthorne effect. 

(Also, it’s amusing to see that Wikipedia kind of drags this whole idea: “some scholars feel the descriptions are fictitious” and “J. G. Adair warned of gross factual inaccuracy in most secondary publications on the Hawthorne effect and that many studies failed to find it.”)

Here’s a plot of weight change over baseline, including only people who finished the two-week span. As you can see, these look like a bunch of random walks around zero.  

Weight Change at Four Weeks

Our main interest is weight change on the half-tato diet, specifically people’s weight change between the morning of Day 1 and the morning of Day 29. Here’s the histogram of that variable, with a black vertical line at 0 lbs (i.e. no weight change over 29 days) and a red dashed vertical line at the mean weight change:

People lost 1.7 lbs on average over these four weeks, and that loss is significantly different from zero, t(37) = 2.70, p = .010. Another way of putting this is that 27 out of 38 people (71%) lost at least some weight.

By now we’re sure you’ve noticed the extreme outlier, the person who reported losing 17 lbs over four weeks (participant 25348806). This outlier is impressive, and we’ll look at her results in more detail later, but excluding that person doesn’t change the overall results. Without the outlier, average weight loss is 1.3 lbs over four weeks, and that loss remains significantly different from zero, t(36) = 2.66, p = .012.  
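For readers who want to see what a test like t(37) involves, here is a sketch of a one-sample t statistic against zero. The six weight changes are made up and are not the study data. The sketch stops at the statistic and degrees of freedom, since turning those into a p-value needs a t distribution that the Python standard library does not provide.

```python
import math
import statistics

def one_sample_t(changes, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean change == mu0."""
    n = len(changes)
    mean = statistics.mean(changes)
    standard_error = statistics.stdev(changes) / math.sqrt(n)  # sample SD, n - 1 denominator
    return (mean - mu0) / standard_error, n - 1

# Hypothetical weight changes in lbs (negative = weight lost).
changes = [-3.0, -1.0, 0.5, -2.5, -4.0, 1.0]
t_stat, df = one_sample_t(changes)
print(f"t({df}) = {t_stat:.2f}")
```

With 38 participants the degrees of freedom are 37, matching the notation above. Whether t is written as positive or negative only depends on whether loss is coded as a positive or negative number.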

We see that weight loss is significantly different from zero. People do seem to lose weight on the half-tato diet. 

But we should also emphasize that they don’t lose much — the effect size here is a disappointment. We had hoped that the half-tato diet might have around half the effect of the full potato diet, but that just didn’t happen. 

Overall, the effect is less than half the effect of the original potato diet. Average weight loss on the potato diet was 10.6 lbs, so half of that would be 5.3 lbs. Instead we see only around 15% of the effect of the full-tato diet. 
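The comparison is simple enough to check by hand. A short sketch using the two rounded averages quoted in this post:

```python
full_tato_loss = 10.6  # average lbs lost over four weeks, original potato diet
half_tato_loss = 1.7   # average lbs lost over four weeks, half-tato diet

hoped_for = full_tato_loss / 2           # "half the effect" benchmark
share = half_tato_loss / full_tato_loss  # observed fraction of the full-tato effect

print(f"hoped for: {hoped_for:.1f} lbs, observed share: {share:.0%}")
```

With these rounded inputs the share comes out at about 16%, in the same ballpark as the figure quoted above and well short of the hoped-for 50%.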

We should note that there are some mitigating factors here. In particular, about 30% of participants in the half-tato diet started out as “normal weight” (BMI < 25), compared to only about 15% in the original potato diet. (In the original study, people who were obese or overweight tended to lose more weight, so this means the average weight loss will look smaller when there are fewer obese or overweight participants.)

But weight loss on half-tato is still quite minor, even if you limit the analysis just to overweight (BMI > 25) participants, who lost 1.8 lbs on average, or obese (BMI > 30) participants, who lost 3.1 lbs on average. This is still much less weight loss than on the original potato diet.

Another way to put it is like so: On the original potato diet, 64 people made it 4 weeks. One of those people lost no weight. Everyone else lost more than the AVERAGE weight loss on the half-tato diet. It’s really no contest; full-tato is overwhelmingly more reliable and causes overwhelmingly more weight loss, at least among the people who can make it four weeks on mostly potatoes. 

Frankly, this just emphasizes how successful the original potato diet study was. In fact, on reflection the Potato Diet Community Trial was probably the most successful weight loss study of all time. Are there any other studies that caused weight loss in 98% of people who finished the study, and caused an average of 10.6 lbs of weight loss over just four weeks? Not that we know of. 

Trajectory

As we mentioned, there’s one extreme outlier who lost 17 lbs over four weeks. You may also have noticed a less-extreme outlier who lost 9 lbs, who happens to be someone who participated in the original Potato Diet Community Trial and saw a lot of weight loss there as well, losing 19 lbs. Both of them stand out quite clearly in a plot of people’s weight loss trajectories:

Having seen some reports like this one, we wondered if there might be a yo-yo effect on the half-tato diet, where in the beginning people lose weight no problem, but at some point the potato effect stops working and their weight heads back to baseline. That seems like a reasonable way to interpret this plot: 

But overall, this doesn’t seem to be the case. In general, half-tato weight loss over four weeks seems small but constant: 

Weight Change at Eight Weeks

We also have a tiny bit of data on people’s weight loss taking the half-tato diet out to eight weeks. Here’s the plot: 

The average weight loss at eight weeks is 3.6 lbs, though you can see that one person has lost more than 10 lbs. With only eight individuals, this is too few people to do a statistical analysis. But it does suggest that longer spans on the half-tato diet may be effective.

Note that the extreme outlier does not appear in this group — that person only sent us data up to Day 29.

Here’s the whole span from everyone who finished baseline (minus our main outlier), showing all data points from the start of baseline to the end of eight weeks: 

What Things Correlate with Weight Loss

There’s not much variation in people’s weight loss over these four weeks, but some people did lose more weight than others. This makes us wonder if there are any variables that might be correlated with weight loss.

Take the analyses below with a grain of salt. They’re very exploratory. The sample size is small. We’re not correcting for multiple comparisons. And of course, all these correlations are correlational.

As you well know, correlation does not imply causation — but as XKCD reminds us, “it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” Correlations can still be suggestive, and if any of the correlations we find are real, we should eventually be able to demonstrate the same relationships experimentally. So let’s take a look and see if anything stands out.

BMI

Our first surprise is that BMI doesn’t seem to have much to do with weight loss.

The correlation between weight loss and starting BMI is relatively small, and is not statistically significant, r(36) = -0.29, p = .078.
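As a rough guide to reading results like r(36), here is a sketch of how a Pearson correlation is computed and how r converts into a t statistic, with degrees of freedom equal to the number of pairs minus two. The function names are hypothetical, and no study data is used beyond the reported r and df.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, df):
    """Convert a correlation into its t statistic; df is number of pairs minus 2."""
    return r * math.sqrt(df / (1 - r * r))

# The reported BMI correlation, r(36) = -0.29, corresponds to:
print(round(t_from_r(-0.29, 36), 2))  # -1.82
```

A t of about -1.8 on 36 degrees of freedom falls a little short of the usual two-tailed .05 cutoff, which fits the reported p = .078.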

Protocol

We let people sign up for three different protocols for the half-tato diet, three different ways you could try to get about 50% of your calories from potatoes. People ended up about evenly split between the three approaches:

Here is a plot of weight loss by each of the protocols:

As you can see, there are no huge differences in weight loss between the three protocols, though Potatoes-By-Default includes the outlier who lost the most weight.

Percent Potato

We asked people to estimate what percent of their total calories they were getting from potatoes each day, and some people reported getting a much higher percent potato than others. Since some people were doing about 50% potato, and others were doing only about 10%, you might suspect that the diet caused more weight loss for people getting more potato. 

This is much muddier than we expected. Getting closer to 50% of your calories from potatoes does seem to maybe cause more weight loss, but if so, it’s not super clear. The correlation is quite small and not significant, r(36) = -0.28, p = .084, and weaker if you exclude the major outlier, r(35) = -0.24, p = .147.

It’s hard to imagine that percent potato doesn’t matter at all, and we do see that the three people who lost the most weight were all getting close to 50% potato. This suggests that for best results, you should try to get around 50% potato on average. But there isn’t a clear correlation overall. 

Dairy

In the original Potato Diet Community Trial, we asked people to avoid dairy entirely. This time around, we decided to just ask people to track how many servings of dairy they got each day. This lets us look for any correlation between dairy consumption and weight loss on half-tato. 

There may be a bit of a trend where more dairy is related to less weight loss, but the person who lost the most weight ate plenty of dairy, and the overall correlation is not significant, r(36) = 0.15, p = .355.

That said, the relationship is slightly stronger if we exclude the outlier, though still not significant, r(35) = 0.29, p = .078.

Tomato 

We were also concerned that tomato products might interfere with potato-based weight loss. So just like dairy, we asked people to track how many servings of tomato products they had each day. Here’s the scatterplot:  

Surprisingly, this relationship is significant, even with such a small sample. The overall correlation is r(36) = 0.37, p = .021, and it remains significant if you remove the extreme outlier, r(35) = 0.36, p = .031. 

You can see that the two outliers, people who lost the most weight, almost entirely avoided tomato products on the diet. Also interesting is that the person who gained the most on the diet happens to be the person who ate the most servings of tomato products. 

This is correlational, not corrected for multiple comparisons, etc., but it does provide more support for our suspicion that tomatoes interfere with the potato weight loss effect. This would be great to experimentally confirm at some point, and it should be relatively easy to test — just assign some people on a potato diet to use ketchup, and others to eat their potatoes bareback, i.e. no ketchup. In the meantime if you are trying to lose weight using potatoes, we certainly encourage you to avoid ketchup.
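To make the multiple-comparisons caveat concrete, here is a sketch of a Bonferroni correction. It assumes, purely for illustration, that the family of tests is the four full-sample correlations reported above (BMI, percent potato, dairy, tomato). That choice of family is an assumption, and other reasonable choices would give a different threshold.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (per-test threshold, {name: passes?}) under a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return threshold, {name: p < threshold for name, p in p_values.items()}

# Full-sample p-values as reported in this section.
reported = {"BMI": 0.078, "percent potato": 0.084, "dairy": 0.355, "tomato": 0.021}

threshold, passes = bonferroni(reported)
print(f"per-test threshold: {threshold:.4f}")
for name, ok in passes.items():
    print(f"{name}: p = {reported[name]:.3f}, passes = {ok}")
```

Under this particular correction the tomato correlation (p = .021) would not clear the stricter threshold of .0125, which is one more reason to treat it as a lead for an experiment rather than a finding.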

Cooking Method

We’ve previously mentioned that boiling or soaking potatoes removes a lot of their potassium. So we’re curious to see if people who boiled their potatoes lost less weight than people who baked, roasted, fried, or otherwise kept their potatoes for the most part whole and un-leached. 

Most people didn’t leave detailed notes on how they prepared their taters, but the people who did leave notes often mentioned either boiling them or using frozen potato products, which are generally pre-boiled / blanched / parboiled. 

This might explain why the half-tato diet did not cause much weight loss on average — if we’re right, and the weight loss is caused by potassium (or anything else in the potatoes that is leached out on boiling/blanching/soaking; who knows, maybe iodine), then many people were consuming less effective potatoes.

There aren’t enough reports to bother hand-coding preparation method or doing an analysis, but here are some examples:

(42475044) Most of my potato meals were a 50/50 mix of roasted yellow potatoes (partially peel 1 inch cubes, lightly oil, 375 convection for 45 minutes), and store-bought frozen french fries (whatever seemed to have the least oil) cooked in the air fryer with no additional oil. 

(63062664) My protocol was mostly whole boiled potatoes pan-fried in ~15g of butter or a small glug of rapeseed or olive oil. Usually ~1kg for breakfast + lunch.

(78152385) I ate mainly russet or golden potatoes, baked or roasted, and I didn’t eat the skins of the russet because last time I did that it gave me the worst stomach cramps I’ve ever had. I also ate a lot of Alexia french fries with sea salt, and some sweet potatoes.

(80975703) I always ate potatoes I had boiled in batches and kept in the fridge. My favourites were red potatoes, half peeled, but I also had yellow or white potatoes, fully peeled. Always with a bit of olive oil and salt and spices, chopped up and reheated in a pan on the stove.

(28228309) I had visions of making home-made latkes or really fine hash browns. I just didn’t make time. While I know we are supposed to start with whole potatoes, I’m sure glad I found frozen potato patties at the store, or there’s no way I could’ve even approximated the quantity of potato I needed. I put my toaster to 6 (nearly the highest setting) and toast them twice, and they’re great, and I could do it for breakfast on work days.

(30834698) I do not like skin on the potatoes; I can eat it, but I do not like the taste or how it makes me feel; I prefer them without skin, so I mostly eat them like that; usually just boiled with a pinch of salt, sometimes in the oven, sometimes with a drop of olive oil; sometimes with some harissa; the easiest and tastiest for me was boiled with salt, then peel the skin and eat them

(72618178) In general I was making homemade oven-baked ‘fries’ (thinly sliced par-boiled potato). I would often give in and allow myself ketchup or spicy mayo. I also went through some phases of doing homemade gnocchi, mashed potato, and faux-dauphinoise (thinly sliced, stacked, oven-baked potatoes with veg stock and a bit of butter).

As you can see, many people boiled their potatoes or used frozen potato products that were likely boiled in some way before freezing. But to be fair, this does not describe everyone. Some people did report mostly baking or roasting:

(58681391) I usually baked an entire 5 lb. bag of gold potatoes at 350 for 1.5 hours, for roughly three servings. I didn’t use oil when baking but would sometimes refry the baked potatoes into hash browns with about 1 tsp of avocado oil.

(70030447) My main method for eating potatoes, as I work from home, was to chuck a few russets in the oven for an hour after coating them in salt and pepper, then once they’re done I would cut them into two halves and eat those entirely. I found olive oil a hassle, and putting salt and pepper on the insides after they’re done was also too much hassle for me to want to bother doing everyday. Maybe I’d do that if I cooked them some other way.

Despite eating baked or roasted potatoes, neither of these people lost weight. The first saw no change at all, and the second gained 4 lbs. This shows that baking or roasting alone does not ensure weight loss. 

But there may be other reasons these two didn’t lose any weight. 58681391 ate a lot of tomato and dairy, and got only about 38% of their calories from potatoes. 70030447 ate an unusually large amount of dairy (third most out of everyone) and got only about 20% calories from potatoes.

In any case, we still suspect that starting with whole, raw potatoes, and not boiling, soaking, or blanching them, might be important for causing potato weight loss. We didn’t make people roast or bake their potatoes in the original potato diet study, but maybe with 90%+ potato, it doesn’t matter.

It might have been an oversight not to ask people to roast or bake their potatoes for the half-tato protocol. If you’re trying it for yourself, probably don’t boil them or live off of frozen french fries.

Regression Analysis

To wrap up these correlational analyses, we fit some regression models to try to predict weight change from multiple factors at once. In all these models, we excluded the outlier who lost 17 lbs, participant 25348806, because we wanted to try to understand things that might have impacted weight change for the average participant, who did not lose so much weight. 

One especially strong model included total dairy consumption (p = .007), total tomato consumption (p = .003), and their interaction (dairy * tomato; p = .035). This interaction had a negative sign, suggesting that the combined effect of tomatoes and dairy is slightly less than the sum of their separate effects. All three terms were significant predictors of weight change, and the model explained 23.7% of the adjusted variance in people’s weight change. 
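For anyone unfamiliar with interaction models, here is a sketch of an ordinary least squares fit of weight change on dairy, tomato, and dairy * tomato, written in plain Python. It is not the analysis script from the OSF: the data are made up, the function names are hypothetical, and it reports only coefficients, R², and adjusted R² (which is how we read "adjusted variance" above), not p-values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_interaction_model(dairy, tomato, weight_change):
    """Fit weight_change ~ 1 + dairy + tomato + dairy*tomato via the normal equations.

    Returns (coefficients [intercept, dairy, tomato, interaction], R^2, adjusted R^2).
    """
    rows = [[1.0, d, t, d * t] for d, t in zip(dairy, tomato)]
    k = 4  # number of fitted coefficients, including the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, weight_change)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    n = len(weight_change)
    mean_y = sum(weight_change) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(weight_change, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in weight_change)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return coefs, r2, adj_r2

# Hypothetical totals over four weeks for eight made-up participants (not study data).
dairy = [0, 5, 10, 20, 2, 8, 15, 30]
tomato = [0, 3, 1, 10, 12, 6, 0, 4]
change = [-3.0, -1.5, -2.0, 1.0, 0.5, -0.5, -1.0, 0.0]

coefs, r2, adj_r2 = fit_interaction_model(dairy, tomato, change)
print([round(c, 3) for c in coefs], round(r2, 3), round(adj_r2, 3))
```

A negative interaction coefficient in a model like this means the fitted effect of dairy and tomato together is smaller than the sum of their separate slopes would predict.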

This was a much better fit than we expected, especially given the small sample size, and it provides more support for the idea that tomato and dairy consumption for some reason inhibit the potato weight loss effect. Note that this is TOTAL dairy and tomato consumption over four weeks, not average daily consumption, which provided a weaker fit.

This was not the best model we found, however. When you dummy-code the three potato protocols, and put them in a model with total tomato consumption and the two-way interactions, many terms are significant (for example, True Half-Tato condition * tomato sum is significant, p = .0004) and the model explains 37% of the variance in weight loss. We are honestly not sure how to interpret this result.

In any case, these are very simple models. It will be hard to squeeze more information out of just 37 observations, but if you have experience with more complex forms of statistical modeling, we encourage you to download the data and see if you can make more sense of it than we can. 

Potatosis

Some people liked getting half of their daily calories from potatoes:

(23555212) This was cool! I have a newfound appreciation for potatoes.

Other people did not:

(28228309) Oh happy day. No more forcing myself to eat bland potatoes. 

(81471891) Not super happy with my mindset about this diet. It’s currently “I *have* to eat 1 kg of potatoes per day!”, and feels a bit forced.

This is kind of striking compared to the absolutely rave reviews we got about the 100% potato diet, where most people said that they loved it. You’d think that eating 100% potatoes would be a bigger ask and a bigger pain than eating just 50% potatoes, but apparently not. 

This makes us wonder if most people in this study never went into “potato mode”. In the original potato diet study, we found that after a day or two of eating potatoes, most people’s appetites waned, they didn’t want anything aside from potatoes, and they began to steadily lose weight. This seemed like a separate “mode” the body can be in, that both caused weight loss and made it easy to eat nothing but potatoes without major discomfort.

If something about the half-tato diet keeps people from going potato mode — the percent potato wasn’t high enough, the potatoes were prepared wrong, ketchup is a potato inhibitor, etc. — that would explain why people didn’t lose much weight, and why many people found it difficult to stick with even a mere 50% potatoes. 

This is corroborated by a comment from one person who was also a participant in the original potato diet study, and says that they found half-tato very different:  

(42475044) Overall this didn’t work anywhere near as well for me as the full potato. My weight over the last 8 weeks has largely stayed the same, whereas on the full-tato I lost 9 pounds in 3 weeks. I could definitely feel that the potatoes were helping me not gain weight, but I think my non-potato calorie intake was just too high for the potatoes to compensate for. On the full-tato diet I was able to eat as much as I wanted and still lose weight, but that doesn’t seem feasible for me on half-tato.

That said, at least one person on the half-tato diet did report signs that sound a lot like potato mode:

(21268204) Sweating at night, which I never do otherwise. Appetite low… Get full really fast even when eating non-potatoes … 2nd day in a row that it didn’t occur to me to eat until 4pm … Have not been hungry at all the last few days. The calories I did get were because I forced myself to sit down, mostly, with some potatoes

This participant lost only one pound over the first four weeks, but kept going and lost 3.5 lbs over eight weeks. 

All this suggests that there might be a right and a wrong way to do half-tato. If you do it wrong, basically nothing happens, maybe you lose a little weight on average. But if you do it right, you go into potato mode, much like on the full-tato diet, and you start losing weight very quickly.

Let’s assume for the moment that there is such a secret magic switch (or set of switches) that can make half-tato cause rapid weight loss, and try to figure out what it is. If there is such a switch, then almost everyone on the full potato diet tripped it. All the case studies (like M) managed to trip it. The major weight-loss outlier in this study, and maybe some of the less major outliers, seem to have tripped it. Maybe they were doing something right that puts you in potato mode — so what would that be?

The extreme outlier (25348806) in this study gave us a fairly detailed report of how she approached half-tato, saying:

I signed up for a spreadsheet for 52 weeks.  I’m doing the diet and have had great success … Am female with 100 or so lbs to lose (now 30 down).

I first lost about 15 lbs doing a very loose version of potato by default after first reading your blog pre half tato experiment and have since lost another 15 beginning April 22 with starting half tato in earnest.  I steam peeled yukon gold in batches in the Instant pot for 12-15 minutes at high/manual (depends on size, I try to get bigger but often its just medium available).  Right out of the instant pot I add white vinegar which helps preserve color and appearance and tastes great later (more subtle than adding vinegar at mealtime) before cooling and fridge.  I started eating a mix of cold and hot depending on if microwave is available (sometimes with mustard) but now I’ve settled into just hot (2 min microwave) with mainly salt.  I try to have this 2-3 meals out of the day (2 medium or 1 big 1 smallish per meal).  One of the 2 potato meals I may add one of:  poached egg yolks; calf liver lightly sauted in butter (plus lingonberries and/or honey); or cooked ground beef (with 21 gun salute seasoning from trader joes and sometimes full fat sour cream), and possibly pepper or cholula sauce (rare), occasional oysters (fresh or canned).  I don’t add ketchup (except once – when I went out and had beef fat fries at a steakhouse bar which did not seem to stall).  I really enjoy the potatoes and look forward to them.  I am not hungry but feel satisfied.  I also have dairy – at least one glass of milk a day (either raw whole milk or 2% or whole conventional) – and a small amount of juice or lemonade.  Some mornings I may have full fat yogurt with collagen and stearic acid (see fireinabottle.net) but not all mornings.  I have some extra potassium as well as other supplements.

We love the level of detail, but it’s hard to know which of these elements are required to enter potato mode, if any of them are. But there are some features that this outlier and all the half-tato case studies (M, Nicky, and Joey “No Floors” Freshwater) share:

  • Nicky had a bit of ketchup, but everyone else either never or almost never had ketchup with their potatoes. 
  • None of them avoided dairy
  • All of them mention eating meat and eggs
  • All of them used butter and/or oil
  • None of them ate boiled potatoes; their potatoes were generally steamed, air fried, microwaved, or baked 

To us, this further supports the idea that at least part of the secret switch is eating not-boiled whole potatoes and mostly avoiding ketchup and tomato products. Dairy doesn’t seem to matter much, or at least it didn’t stop these people, and neither do various fats, meat, or eggs. Of course, it’s difficult to tell if there might be some ADDITIONAL element that they are all getting right. Are they all getting lots of magnesium or something? Hard to say. 

Just in case it helps, here’s a closer look at the other people who lost relatively large amounts of weight on the half-tato diet: 

Participant 26130773 lost the second-most over four weeks on half-tato, a total of 9 lbs. Overall he ate a good potato percentage, reporting 40%-60% most days, though on some days he only got 20%. 

This participant left almost no notes and didn’t report his dairy or tomato intake, which makes it hard to figure out what he might have been doing right. But one thing that jumps out is that it’s clear he was eating lots of eggs. Here are his notes from the first three days of the diet:

5 eggs, potatoes for lunch (350 cal eggs. If I do 2 yokes 3 whites, 190 cal) Protein shake (120) for snack Turkey b patty, salad (600?) 

5 eggs w 2 yolks, few bites turkey (225) Protein shake (120) Soup w meatballs (500) 

5 eggs w 2 yolks (190) Protein shake (120) Normal dinner cheat (900) 2 drinks

Participant 56896462 lost the third-most over four weeks on half-tato, a total of 6 lbs. He had a very good potato percentage, 40% or 50% almost every day. He ate some dairy and some tomato, about 2 servings of dairy a day and 1 of tomato, on average. He also left very few notes, though we notice that he is in Italy.

Conclusions

The half-tato diet causes some weight loss in most people, but for most people, it is much less than half as effective as the full potato diet. If you really want to lose weight, probably go for the full potato diet instead, and try to get as close to 100% of your calories from potatoes as you can.

However, a small number of people do lose a lot of weight on the half-tato diet. This suggests that there might be some way to go into “potato mode” while on half-tato, if you do it right. If we could find out how to make this happen reliably, that would be pretty neat.

Our guess is that it involves some combination of:

  • Baking, steaming, microwaving, or roasting whole potatoes instead of boiling them or using pre-boiled frozen potato products
  • Avoiding tomato products, especially ketchup
  • Getting enough of something else, possibly something found in eggs, meat, or dairy.

We should note that this list is largely based on circumstantial and/or correlational evidence. We do worry that ketchup might be a potato-blocker, but the evidence is not yet all that strong. That makes all of these guesses good subjects for future experiments.

You could design a large trial to answer these questions — randomly assign 100 people to do half-tato with ketchup and 100 people to do half-tato without — but you might need a very large sample size to be able to detect a difference. And while we’d love to see more community trials, it may not be practical to do multiple trials of several hundred people each, one after the other, to try to chase down whether each of these things makes a difference. That seems like it would take forever and be a lot of work.
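
To get a rough feel for how large "very large" might be, here is a back-of-the-envelope sketch using the standard normal-approximation formula for comparing two group means. The effect sizes plugged in at the bottom are assumptions, not estimates from our data; we don't know how big the ketchup effect is, if there is one at all.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per arm for a two-sided,
    two-group comparison of means (normal approximation).

    effect_size_d is the standardized difference between groups
    (difference in means divided by the standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


# ASSUMED effect sizes, purely for illustration.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} people per group")
```

Under these assumptions, a medium-sized effect (d = 0.5) needs about 63 people per arm, while a small one (d = 0.2) needs about 393 per arm. So 100 per arm is only enough if the effect is reasonably big.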

So instead, another option would be for individuals to test these guesses as a self-experiment, which could provide very strong evidence, and might be able to provide it quickly. 

For example, let’s say that Gary is a fellow who is happily losing 2 lbs a week on the full-tato or half-tato diet. Whatever makes potato mode happen, Gary has found it, even if he doesn’t know what he’s doing right.

Now Gary can test individual switches to see if they turn potato mode off. For example, he can randomly assign some weeks to be ketchup weeks, where he always has ketchup with his potatoes, and other weeks to be no-ketchup weeks, where he religiously avoids ketchup and all other tomato-based foods. 

If Gary’s weight loss always stalls on ketchup weeks, but continues humming along on no-ketchup weeks, that’s a pretty clear sign that avoiding ketchup is one of the switches to make the half-tato diet work. If the randomization makes no difference, that’s a pretty clear sign that ketchup doesn’t matter, at least not for him.
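
Here is one way Gary's design could be sketched in code. This is only an illustration under our own assumptions: the function names are made up, the weekly weight changes are invented numbers, and the exact permutation test is one reasonable choice of analysis rather than the only one. The first function randomly assigns weeks to conditions, and the second asks how often a difference at least as large as the observed one would turn up if the ketchup labels were shuffled among the weeks.

```python
import random
from itertools import combinations


def assign_weeks(n_weeks, seed=None):
    """Randomly label half the weeks 'ketchup' and the rest 'no ketchup'."""
    rng = random.Random(seed)
    n_ketchup = n_weeks // 2
    labels = ["ketchup"] * n_ketchup + ["no ketchup"] * (n_weeks - n_ketchup)
    rng.shuffle(labels)
    return labels


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes, and counts the splits whose absolute difference
    in means is at least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        sum_a = sum(pooled[i] for i in idx)
        # Small tolerance so floating-point noise doesn't drop ties.
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


schedule = assign_weeks(8, seed=1)
print(schedule)

# MADE-UP weekly weight changes in lbs (negative = weight lost).
ketchup_weeks = [0.1, -0.2, 0.0, 0.2]
no_ketchup_weeks = [-2.1, -1.8, -2.3, -1.9]
print(permutation_p_value(ketchup_weeks, no_ketchup_weeks))
```

In this invented example, the observed split is the most extreme of the 70 possible ways to divide eight weeks into two groups of four, which is about as clear a signal as eight weeks can give you. Note that this sketch ignores carryover between weeks, which could matter if ketchup has a long-lasting effect.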

You can imagine a similar design for anything else. Gary could randomly assign some weeks to try only boiled potatoes, and other weeks to try only baked potatoes, and see if it makes any difference. 

We doubt things will be this simple — it’s quite possible that one brand of ketchup kills the potato effect, while another brand has no impact — but we won’t know until someone has tried. It might take several weeks to pick up a clear signal, but anyone who is able to get a potato diet working for them can test any of these switches out for themselves. 

All we ask is that if you try something like this, please publish your results online, regardless of how it turns out. We’re very curious to know what will happen!

Closing Notes

Some people have gone for more than eight weeks on half-tato, and we plan to analyze their results at some point in the future. It will be a small sample size, but we are excited to have some more case studies. So stay tuned. 

If you are interested in doing an N=1 experiment about these ideas and want our help designing a protocol, please feel free to contact us.

If you would like to be notified of future stupid studies, or if you want to keep up with our work in general, you can subscribe to the blog by email (below), or follow us on twitter.

And if you feel like reading this post has added a couple of dollars’ worth of value to your life, or if you have lost weight as the result of our research and you think it improves the quality of your life by more than one dollar a month, consider donating $1 a month on Patreon.

Thanks for going on this journey with us.

Sincerely, 
Your friendly neighborhood mad scientists,
SLIME MOLD TIME MOLD

N=1: Dr. Garcia’s Queasy Irradiated Rats

Previously in this series:
N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness? 
N=1: Symptom vs. Syndrome
N=1: Latency and Half-Life
N=1: n of Small

I. 

In the old days, psychology was dominated by the school of behaviorism.

Behaviorism taught that mental states like thoughts and feelings are unworthy of study, and possibly don’t exist. 

Behaviorists also thought that animals are born without anything at all in their brains, that the mind at birth is a blank slate, and that everything an animal learns to do comes from pure stimulus-response learning built up over time. Turns out, this is wrong.

At some point in the 1950s, a guy named John Garcia was irradiating Sprague-Dawley rats for his job at the U.S. Naval Radiological Defense Lab, like you do, when he noticed something weird. The rats who had been exposed to low levels of gamma radiation were eating and drinking less than usual, and groups that had been exposed to radiation the most times ate and drank the least. 

Garcia thought that the rats might be learning to associate their food and water with the nausea from radiation exposure. After all, rats have no concept of ionizing radiation, so from their point of view, they were going about their day as normal when they suddenly started feeling nauseous for no clear reason. They might reasonably wonder if it was something they ate. In particular, he noticed that the rats wouldn’t drink out of the plastic bottles they were used to, but were happy to drink out of unfamiliar glass bottles. Garcia thought that maybe the plastic bottles gave the water a particular taste that the rats had learned to avoid. 

So in a series of experiments, Garcia tried exposing rats to different kinds of stimuli to see what they would learn. He discovered two surprises that called the whole behaviorist concept into question. 

First, he discovered that if a rat was exposed to radiation (making it nauseous) after encountering a new food, it would quickly learn to reject the food, even if the radiation came hours later. 

This contradicted the understanding at the time of how conditioning worked — behaviorists thought that you had to present the unconditioned stimulus (nausea) immediately after the conditioned stimulus (the new food), or the animal wouldn’t learn to associate the two. But Garcia found that learning could occur even if the rat got sick well after eating a new food. 

Rats would instantly associate nausea with whatever food they had most recently eaten, and had no problem doing so. If he made them sick after giving them Cheetos, they would learn to reject Cheetos forever. But the rats simply could not learn to associate their nausea with any other kind of stimulus. It didn’t matter if the stimulus was bright lights, or an annoying buzzer. No matter how many times Garcia flashed lights at them, the rats never learned to associate their nausea with the lights.

Everyone knows it’s mice that like cheetos, anyways

On the flipside, when he gave the rats electric shocks instead of exposing them to radiation, they would learn to be afraid of the lights and sounds. But no matter how many times he shocked them after eating, the rats would never learn to associate food or water with getting shocked.

This was another big pie in the face of behaviorism. Learning was supposed to be purely stimulus-response, and you were supposed to be able to teach an animal to do just about anything by pairing a behavior with the right reward or punishment. But Garcia’s rats seemed to be hard-wired to associate nausea (from radiation) with what they ate or drank, and similarly hard-wired to associate pain (from electric shocks) with what they saw or heard, and not to associate these things with anything else.

This was confusing to the behaviorists, but makes perfect sense if you think about evolution for even one second. In the real world, rats become nauseous when they eat spoiled food, so it’s important for a rat to associate nausea with things they recently ate. Any rat that doesn’t learn this will be dead, so eventually all rats are born prepared to make these food-nausea associations. Even though Garcia’s rats had been born in a laboratory and had never eaten a bit of ham left out in the sun for too long, they still came with an overwhelming bias to associate a feeling of nausea with whatever they most recently ate.

Similarly, pain is associated with sights and sounds, like the sight of an owl or the sound of a fox; or specific locations, like parts of the forest where predators are common. So rats are born ready to associate pain with things like weird noises or flashing lights. The idea that pain might be related to food, on the other hand, never crosses their minds. 

As you may have guessed, these predispositions aren’t limited to rats. In his review of John Bradshaw’s book on domestic cat psychology, Cat Sense, Gwern mentions that cats have a similar tendency to associate food with nausea:

…[cats’] lack of trainability apparently has an exception, Bradshaw states: food can trigger learning of powerful associations even hours after consumption. This would make sense as an anti-bad-food defense, but unfortunately, this is yet another maladaptation in the modern context: “…this mechanism occasionally has unexpected consequences: a cat that succumbs to a virus may then go off its regular food even after it has recovered, because it has incorrectly associated the illness with the meal that happened to precede it.”

More generally this is called conditioned taste aversion, and it occurs in most mammals — though maybe not vampire bats, since they eat only one thing that never spoils, and being put off their food would be a guaranteed death sentence. 

(Some researchers did a version of Garcia’s study where they compared vampire bats with closely related species of bats that eat more than one thing, and while the other bats learned to avoid new flavors that were paired with nausea, vampire bats didn’t learn to associate new flavors with nausea when they were fed different kinds of flavored blood. Just imagine being that researcher on a first date; “Oh, what do I do at work? Yeah, I’m the guy who injects vampire bats with a 1% weight/volume lithium chloride solution to make them nauseous, it’s not much but it’s a living!”)

II.

Humans are also mammals, so we might have the same tendency. Maybe when we feel nauseous, or sick, or even just kind of weird, we assume it’s something we ate or drank. 

Wikipedia thinks this is the case, claiming, “even something as obvious as riding a roller coaster (causing nausea) after eating the sushi will influence the development of taste aversion to sushi,” but doesn’t offer any citations. We suppose you could run this study on your own with a few sushi meals and a season’s pass to INSERT LOCAL THEME PARK.

People often suspect that their chronic illnesses have food triggers, different kinds of food or drink that will bring on an attack or generally make them feel like crap. But if our brains are hard-wired to pick out food-based explanations for feeling ill, maybe we tend to latch onto the idea of some food trigger causing our illness, even when food has nothing to do with it. 

When our ancestors felt nauseous, it was usually because they had eaten the wrong kind of frog, so we come with a strong bias towards assuming that a random feeling of sickness is connected to something we ate. We don’t assume it has anything to do with the awesome glowing rocks we found in that sweet cave.

Such a cool rock, right? Oh hold on I have to lie down I feel terrible, must have been the goat’s milk I had for lunch

This worked well up until 3000 BC, but since then humans have discovered and invented lots of new things that can make you sick, most of which are not foods.

In general this should make us more skeptical of food triggers (and food-related triggers like packaging), especially if your chronic complaint is anything related to nausea, anything that feels like an illness, or anything digestive.

Food can still make you sick, and there are for sure some real food triggers out there. But the lesson here is that your instincts will tell you that your random sickness is caused by what you ate, even if it’s actually caused by something completely different. If you were one of Dr. Garcia’s rats, you would never have guessed that you were being hit with gamma radiation. You’d be all like, “it must be some chemical in those nasty plastic bottles.”

Links for May 2023

One month left in the mysteries contest! Get your submissions in by July 1st. Good luck! 🙂 

ExFatLoss: The Slightly Complicated Theory of Obesity 

In a sense, nutrition science has maneuvered itself into a corner here. Due to the religious insistence on randomized controlled trials and using a large number of people for studies, it’s pretty much impossible to find any solutions unless they apply to everybody.

If we insisted on the same methods [for] car mechanics, we would have to declare that there is no solution for cars stranded on the side of the road.

After all, we did a large study: we took a sample of 10,000 cars stranded on the side of the road, and we attempted all popular ways of fixing them. We put gas into them, we pumped up their tires, we topped off the oil, and we checked for any engine errors.

Yet not a single one of these repairs made more than 15% of the cars run again!

Clearly, cars cannot be repaired. That’s just science. Gasoline in, gasoline out!

Also from ExFatLoss: Looking for ex150 trial volunteers. ExFatLoss is trying to expand his self-experiment into an n of small study. We encourage you to consider signing up, especially if you tried some version of the potato diet and that didn’t work for you — the potato diet also didn’t work for ExFatLoss, so maybe there’s a common factor.

Old but good piece from The Atlantic: Roller Coasters Could Help People Pass Kidney Stones, though apparently only some rollercoasters work. Specifically, Big Thunder Mountain works pretty well but Space Mountain and Aerosmith’s Rock ‘n’ Roller Coaster don’t work at all. As usual, tumblr provides excellent commentary:

Land Ownership Makes No Sense — new WIRED piece on Georgism / land value tax, by friend of the blog Uri Bram. “Under Georgism, you would pay the same tax for your home as for an equivalent vacant lot in the same location.” Very nice framing!

A Simple Exercise to Strengthen the Lower Esophageal Sphincter and Eliminate Gastroesophageal Reflux: An Autobiographical Case Report (h/t Andrew Quist on twitter). This guy was struggling with gastroesophageal reflux and, after trying and becoming dissatisfied with some traditional treatments (“even after several refinements, the bed wedge remained intolerable”), came up with a form of resistance training for his lower esophageal sphincter, i.e. tried eating with his head below his stomach. It took several months but this intervention seems like it worked for him: “A 24-hour pH and manometry test was done, which yielded completely normal results. I then discontinued the use of the bed wedge and now have no symptoms that I can attribute to gastroesophageal reflux.” Not something we are going to focus on in the near future, but this seems like a good candidate for an n of small study, or even a full community trial. In particular, it’s nice because the intervention is very low-risk. Eating with your head below your stomach should be pretty harmless. If you have GERD or are otherwise plugged into the GERD community, you should consider running a study, we’d be happy to advise! 

A Cartography of Encounters:

Let me ask you again to draw your life but now with a slight shift in perspective. Do not draw a line. Draw a map of the encounters you have had with animals, insects, birds, weather systems, microbes that have metamorphically rearranged your matter. Draw a constellation of these encounters. What shape does your life take on when it is no longer articulated by the grammar of human progress?

Why Do So Many Book Covers Look the Same? Blame Getty Images

Strong opening salvo in this year’s ACX book review contest: Your Book Review: Cities And The Wealth Of Nations/The Question Of Separatism. Touches on a number of our interests, viz. Jane Jacobs, Quebecois separatism, balkanization, and cybernetics. Highly recommended! Here’s an excerpt:

Our breathing rate is regulated through a feedback mechanism. Too much carbon dioxide in the blood, or too little oxygen, and the brain stem commands the diaphragm to accelerate breathing. Once the levels are back to normal, the brain stem receives this feedback and slows breathing down again.

Now, Jacobs asks, imagine an impossible creature: ten people, all doing their own thing, but whose breathing is somehow regulated by a single brain stem. The feedback the brain stem receives is a consolidated average of everyone’s carbon dioxide and oxygen levels, and the breathing rate the stem decides on is applied to all ten people, regardless of whether they’re sleeping or playing tennis. 

This, to put it mildly, wouldn’t work.

“Porphyrios (Greek: Πορφύριος) was a large whale that harassed and sank ships in the waters near Constantinople in the sixth century. Active over a period of over fifty years, Porphyrios caused great concern for Byzantine seafarers. Emperor Justinian I (r. 527–565) made it an important matter to capture it, though he could not come up with a way to do so. Porphyrios eventually met its end when it beached itself near the mouth of the Black Sea and was attacked and cut into pieces by a mob of locals.”

Relatedly: Whales are huge. So why don’t they get a ton of cancers? (h/t @JSheltzer)

Some research on how microplastics may impact digestion (h/t @ellegist; we think this is the original paper), though the design is not exactly the most realistic: “The team added nanoplastics to a slurry that contained proportions of protein, fat, carbohydrates, sugar and fibre comparable to the average US diet. The researchers then added heavy cream to boost the fat content. To simulate digestion, they passed this solution through three other liquids containing enzymes and molecules present in the mouth, stomach and small intestine.” Also of interest might be this similar paper, from one of the same authors, on the impact of titanium dioxide on lipid digestion. The in vitro methods are pretty whatever, but sharing these just in case.

The Problematic Myth of Florence Nightingale:

…like most lone-hero narratives, this one is not entirely true: For one thing, Nightingale herself trained with a group of German deaconess nurses, something she could hardly have done if she invented nursing. She did become famous for advocating for nursing as a trained profession, but as she did so, she shrank nursing into a restrictive, exclusionary Victorian corset, constructing a version of nursing that conformed to rigid social mores, one divided by class, race, and gender—a reimagining of nursing palatable to British colonialism.

Tempus Nectit Knitting Clock — “Wilhelmsen’s clock was designed as an art project that showed the passage of time by knitting a stitch every half hour, a row every day. At the end of a year the machine would drop a 365-row scarf from the bottom.”

More evidence of possible meteor deaths from the premodern era: 1490 Ch’ing-yang event

​​Merriam-Webster: “Hey ding-dongs, let’s have a chit-chat about Ablaut reduplication.” We’re happy to report that the dictionary continues to be one of the best poasters [sic] on twitter.

One of our first posts to break containment was a very long essay titled, Higher than the Shoulders of Giants; Or, a Scientist’s History of Drugs. If you read this piece, you’ll be familiar with Vin Mariani, a popular “tonic wine” of the late 19th century, and by “tonic wine” we mean a man named Angelo Mariani put cocaine in wine and then sold it to the feverishly twitching masses. Well, we’re happy to report that Babco Europe brought back Vin Mariani in 2017, and it appears to be still available, though we see that it is “fortified with de-cocainised Peruvian Coca leaf”. Disappointing but not surprising. 

Queen of Pigs

N=1: n of small

Previously in this series:
N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness? 
N=1: Symptom vs. Syndrome
N=1: Latency and Half-Life

The biggest limitation of an N=1 experiment is external validity. If you run enough trials on yourself, you can show that some intervention does or doesn’t have an effect on you to basically any degree of certainty that you want. But this will never provide much evidence that the same intervention will have the same effect, or any effect, on anyone else. 

People are all human and have roughly the same human biology, it’s true. In the higher animals, decapitation is more or less guaranteed to be lethal; people generally like eating sugar and hate eating asphalt. But once you move beyond the fundamentals of biology, most other bets are quickly off.

An unspoken assumption of the self-experiment discussion (including our posts on the subject) is that there are exactly two kinds of research — self-experiments, and large trials. These occupy the sample size slices of N = 1 and N ≥ 30, respectively. The self-experiment and case study are assumed to be a single subject; and with few exceptions, most people don’t trust a survey or RCT with anything less than 30 participants. 

But there are two problems with this perspective. The first is that this is a false dichotomy. There isn’t a point where N = 1 turns into N = small, and there’s no sample size where you go from having a collection of case studies to having a trial. Going from N = 29 to N = 30 does nothing in particular, and there is no other threshold that stands out as being at all distinct (except N = 0 to N = 1, of course). A bigger sample size always means more information and better external validity, with no discontinuity.

The second problem is that if N = 1 is at all good (and we think that it is), then N of small has to be better. 

Anything that is good with an N of 1 will be better with an N of 2-10. With N of small, you get more data, more quickly. One person doing random daily trials over the course of a week will create 7 data points. Three people doing random daily trials over the course of a week will create 21 data points. Small-group analysis is a little more complicated, but the data can be handled by a standard linear mixed model (here’s an example that involves dragons). 

With N of small, you get more diversity of participants and more diversity of responses, quickly drawing the fangs from the problem of external validity. You will be able to get some sense of whether the intervention works differently for different people. If you have five participants, it will be easy to see if they are all responding the exact same way, if they are responding somewhat differently, or if some of them are having huge responses while others feel nothing at all. 
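
As a rough illustration of what a first look at N of small data could involve, here is a sketch that computes each participant's own treatment effect from random daily trials. This is deliberately simpler than a linear mixed model, and it is not a substitute for one. The participants, the outcome scale, and all the numbers are invented for the example.

```python
from statistics import mean


def per_person_effects(records):
    """Mean outcome on treated days minus mean on control days, per person.

    records is a list of (participant, treated, outcome) tuples, where
    treated is True on days the coin flip said to take the intervention.
    """
    by_person = {}
    for person, treated, outcome in records:
        groups = by_person.setdefault(person, {True: [], False: []})
        groups[treated].append(outcome)
    effects = {}
    for person, groups in by_person.items():
        # Skip anyone who never had both kinds of day.
        if groups[True] and groups[False]:
            effects[person] = mean(groups[True]) - mean(groups[False])
    return effects


# MADE-UP data: three people, seven days each, symptom score from 0 to 10.
records = (
    [("A", True, 3)] * 4 + [("A", False, 5)] * 3
    + [("B", True, 4)] * 3 + [("B", False, 4)] * 4
    + [("C", True, 1)] * 4 + [("C", False, 6)] * 3
)

print(len(records), "data points")
for person, effect in per_person_effects(records).items():
    print(person, effect)
```

In this toy example, one person shows a moderate effect, one shows none, and one shows a large effect, which is the kind of spread a single self-experiment could never reveal.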

The only question is one of cost. Because while the biggest limitation of N = 1 is external validity, the biggest benefit is that it’s cheap in important ways. With N = 1, you don’t need anyone’s permission to start your study — you can just go do it. You don’t pay any coordination costs, costs which are easy to miss up front but can be quite a drag if you’re not careful. These factors help make self-experiments cheap. 

But we think scaling up is usually worth it — or at least, once you have some promising N = 1, scaling to N of small usually makes sense. It’s the logical next step. And since there’s no real distinction between a single case study, a small collection of case studies, and a trial of 100 people, it’s also the logical next step on the path towards an RCT or other large trial. 

So while this series has focused on true N = 1 self-experiments, the real wins for the future may be in N = 2-10 studies where people grab a couple of friends and run a self-experiment together. Remember kids, friendship is the most powerful force in the universe.

And it’s not at all unprecedented, since this is how we approached our community trials; we looked at a couple of case studies, and then used N of small to do the pilot testing. 

For the potato diet, we started with case studies like Andrew Taylor and Penn Jillette; we recruited some friends to try nothing but potatoes for several days; and one of the SMTM authors tried the all-potato diet for a couple weeks. 

For the potassium trial, two SMTM hive mind members tried the low-dose potassium protocol for a couple of weeks and lost weight without any negative side effects. Then we got a couple of friends to try it for just a couple of days to make sure that there weren’t any side effects for them either. 

For the half-tato diet, we didn’t explicitly organize things this way, but we looked at three very similar case studies that, taken together, are essentially an N = 3 pilot of the half-tato diet protocol. No idea if the half-tato effect will generalize beyond Nicky Case and M, but the fact that it generalizes between them is pretty interesting. We also happened to know about a couple of other friends who had also tried versions of the half-tato diet with good results. 

We think that in all of these cases, N of small was much more convincing than N = 1 would have been. With two people, it’s much less likely that the effect is a fluke. Even if it works for one person and not for the other, that’s still evidence that we shouldn’t expect the effect to be entirely consistent; we should expect more ambiguity. And for something where the risks are unclear, like with potassium, two people going through without any side-effects is much more reassuring than one. 

Links for April 2023

This is the two-months-left reminder for entries to our MYSTERY CONTEST. There are already two entries, and you still have two months to write and submit yours! 

Speaking of mysteries: Jeff Wood’s story of diagnosing his ME/CFS as a mechanical problem with the craniocervical junction, the place where your skull connects to the first two vertebrae (h/t JG in the comments on N=1: Symptom vs. Syndrome). He found a treatment that worked for him, and as far as we’ve heard, he is still in remission. Most interesting for the simple, obvious diagnostic test; if you have ME/CFS symptoms, try wearing a neck brace or just pull up on your head and see if your symptoms get better. See also the CCI + Tethered cord series from Jennifer Brea. 

Still speaking of mysteries: “Paranasal sinuses are a group of four paired air-filled spaces that surround the nasal cavity… Their role is disputed and no function has been confirmed.” Also, why do they (reportedly) generate nitric oxide? The Wikipedia talk page on this one is also amusing. “more details of structure please. they are just empty pockets of air? how does the air get there? are they lined with tissue or are they just bone? how many openings does each have? how do they becom e ‘pressurized’? etc etc-” writes User:Omegatron in 2005. Maybe the sinuses are well-understood by experts, but in that case, the Wikipedia page itself is a mystery. 

No longer speaking of mysteries: We made a tumblr, in case the bird site dies or becomes unusable. 

Adam Mastroianni argues that science is a strong-link problem. See also this excellent elaboration on the point, A Model of Quality Control in Strong Link Science, from Maxwell Tabarrok.

Salt, Sugar, Water, Zinc: How Scientists Learned to Treat the 20th Century’s Biggest Killer of Children. Like the story of scurvy, a clear example that eventual cures may look no more than vaguely promising at first, before we figure out the details of how to make them work reliably. Also, a lesson on following up on leads, even if they look weird or dumb or inconsistent at first. It doesn’t have to take 140 years!

The Ineluctable Smell of Beer — Part 1 in a fascinating series about the rise of healthcare costs (h/t Krinn). Really about the costs and reasons for “coordinative communication”. Kind of argues that bureaucracy is a symptom of bad things rather than the cause of them? You normally look at a dysfunctional, bureaucratic system and assume, “the bureaucracy caused the dysfunction”. But: “maybe it should take us aback that our health care system incurs such extreme coordinative communications costs, that paying all those people to handle it is actually more cost effective than not.”

The Atlantic: Could Ice Cream Possibly Be Good for You? (or here to avoid the paywall). “The dissertation explained that he’d hardly been the first to observe the shimmer of a health halo around ice cream. Several prior studies, he suggested, had come across a similar effect. Eager to learn more, I reached out to Ardisson Korat for an interview—I emailed him four times—but never heard back. … Inevitably, my curiosity took on a different shade: Why wouldn’t a young scientist want to talk with me about his research? Just how much deeper could this bizarre ice-cream thing go?” lol

Tyler Ransom did an N=1, T=1166 self-experiment where he lost 15 lbs in four months. 

A School of Strength and Character:

The institution builders of the Civil War embodied a type of excellence that foreign observers of their era described as characteristically American. … But less than a century after the Civil War, American life did become dominated by centralized and professionally managed bureaucracies. The two world wars only served to entrench this way of life in business and politics. The population, in response, became increasingly conditioned to lobbying for centralized decisions instead of self-organizing. Those who introduced managerial bureaucracy to American life understood the “great strength” bureaucratic tools would grant them. But these tools destroyed the conditions that made them so adept at institution building in the first place. The first instinct of the nineteenth-century American was to ask, “How can we make this happen?” Those raised inside the bureaucratic maze have been trained to ask a different question: “how do I get management to take my side?” 

Someone tracked down the original take of the Wilhelm Scream.

Weinersmith on political hobbyism

AI and the American Smile: How AI misrepresents culture through a facial expression

On the unexpected joys of Denglisch, Berlinglish & global Englisch

The great Milk Diet experiment results are in (h/t anon). Compare for sure to ExFatLoss’s +80% cream diet. Do be careful of excessive calcium intake; drinking this much milk may not be good long-term (though ExFatLoss seems to be doing ok?).

N=1: Latency and Half-Life

Previously in this series:
N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness? 
N=1: Symptom vs. Syndrome


I. Latency

a. Melons

Peter has a bad reaction to melons. Every time he eats melon, he gets sick right away, and he often throws up. 

We can say that Peter’s reaction to melon has low latency. When it happens, it happens right away. No waiting about.

Mark also has a bad reaction to melons. But because of a complex series of biochemical interactions, when Mark eats melon, he doesn’t get sick right away. He gets sick about three days (72 hours) later, when he suddenly starts to feel very ill, and then often throws up.

We can say that Mark’s reaction to melon has high latency. It happens, but it always takes a long time to kick in.

Peter and Mark have basically the same reaction to melon. Both have the same symptoms — nausea, sickness, and vomiting. Both reactions happen for sure every time — they are both equally reliable. The only thing that’s different is the latency.

 

b. Different and the Same

Though their reactions are nearly identical, Peter and Mark end up with very different experiences of their sensitivity. 

Peter quickly learns that melon is a trigger. After all, he gets sick right away. He just makes sure to avoid melon and goes about his life with no additional air of mystery. 

Mark, on the other hand, is plagued with random, crippling nausea. He sometimes gets sick, and it always seems to be for no reason. This is because it’s hard to remember what you were eating exactly 72 hours ago (for example, take a moment to try to remember what YOU were eating 72 hours ago). So for Mark, the connection is very obscure. He may never figure it out.

Both of these relationships would become equally obvious in a self-experiment. As long as you were tracking melon consumption and looking for relationships over a long enough time frame, you would see that Peter gets sick right after every dose of melon, and Mark gets sick exactly 72 hours after every dose of melon. 

Perfect 100% reliability would make this pretty obvious once you noticed it. You don’t need a huge sample size to pick up on a relationship that is 100% reliable, which is why Peter quits melons after getting sick just a few times. 

The big difference is whether the relationship jumps out at you or not. Low-latency relationships are obvious; the close proximity of cause and effect highlights the correct hypothesis and draws immediate attention to the relationship, where it can quickly be confirmed. Peter can just eat more melon and immediately get corroborating evidence if he wants to confirm his theory. The relationship is intuitive; you know it when you see it. 

c. Cause and Effect

High-latency relationships are much harder to spot, even if they are equally reliable. The separation of cause and effect means that the connection may never come to mind. 

To even be able to pick up on this in a self-experiment, you would have to know in advance that you should be tracking how much melon you are eating. And this is the hard part. The hard part is not demonstrating the relationship. At 100% reliability, that’s easy. The hard part is picking up on what to track. 

This is somewhat in contrast to our normal concerns in research. Normally we worry about sample size and the quality of our measures. But Mark doesn’t need a big sample size. He doesn’t need any measures other than “got sick” and “ate melon”. All he needs is to consider melon as a possible cause of his nausea, and to consider looking for relationships with a latency of at least 72 hours. Easier said than done. 

d. Reliability in Real-World Relationships

Of course, most real-world relationships are not 100% reliable. Few things work every time. But it’s concerning how a little latency can hide an otherwise blatant relationship, and it makes us wonder how many connections we all miss because of relatively small delays in onset. 

Zero latency (eat melon, immediately puke) is easy to figure out. These relationships become obvious after just a few trials. 

In comparison, 72-hour latency is very hard to figure out. Most people are not looking for relationships with such a long delay, and even if you were, you would be hard pressed to figure out the cause. 

You can’t just keep a food journal and look 72 hours back — you don’t know how long the latency is, so you don’t know how far back to look! And if the latency varies at all (e.g. always between 60-80 hours later), it gets even harder.
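
One way around not knowing the lag is to check every plausible lag instead of guessing one. Below is a minimal Python sketch, assuming you have a daily log of "ate the suspect food" and "got sick". The `hit_rate_by_lag` helper and the 30-day log are invented for illustration, and the sketch assumes a fixed whole-day latency. If the latency wanders (say, 60-80 hours), you would want to score a window of days rather than one exact lag.

```python
def hit_rate_by_lag(ate, sick, max_lag):
    """For each lag in days, return the fraction of trigger days
    that were followed by a sick day exactly `lag` days later."""
    rates = {}
    for lag in range(max_lag + 1):
        # Only count trigger days where day + lag is still inside the log.
        days = [d for d in range(len(ate) - lag) if ate[d]]
        if days:
            rates[lag] = sum(sick[d + lag] for d in days) / len(days)
    return rates

# Hypothetical 30-day log for Mark: melon on four days, sick exactly 3 days later.
melon_days = {2, 9, 17, 24}
ate = [d in melon_days for d in range(30)]
sick = [(d - 3) in melon_days for d in range(30)]

rates = hit_rate_by_lag(ate, sick, max_lag=5)
best_lag = max(rates, key=rates.get)  # the lag with the highest hit rate
```

In this toy log the three-day lag stands out cleanly because the relationship is 100% reliable and nothing else makes Mark sick. Real logs will be noisier, and the scan still only helps if melon is among the things being tracked.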

This makes us wonder how much latency we can handle before connections stop being obvious. It may not take much. Coffee -> heartburn with an hour delay, that seems pretty doable. We think you would figure that one out pretty quickly. But with a four hour delay? Eight hours? Twelve? This would be much more difficult. It would start to look more like, “heartburn around dinnertime / going to bed, especially on weekdays”. That sounds hard to puzzle out. 

Latency also makes it harder to get a big sample size. With a latency of less than 5 minutes, Peter can easily do eight trials (eat some melon and face the consequences) in a single day. Mark can’t do that. He has to wait 72 hours to get the results from his first trial, except it’s worse than that, because he doesn’t know how long he has to wait for the results to come in. 

If he wants to make sure not to cross the streams, he needs to devote three whole days (though again, he doesn’t actually know in advance how much time he has to dedicate) to each trial, so he needs 3 * 8 = 24 days to do the same number of “eat melon and find out” trials that Peter can easily do in an afternoon, if he’s willing to get sick that much in a single day.

II. Half-Life

a. Creamer

Jo has a bad reaction to one of the additives in her office’s tiny cups of dairy creamer (henceforth: “creamer”). Every time she uses one of the tiny cups, she gets very tired about 30 minutes later. Fortunately, Jo’s kidneys happen to handle the additive really well, and two hours after she takes the creamer, she has cleared all of the additive out of her system, and stops feeling unusually tired. 

We can say that the additive has a short half-life in Jo’s system, and that the symptoms (fatigue) have a short half-life as well. They don’t stick around for long; things quickly go back to baseline. 

Lily works in the same office and has the exact same reaction to the same additive in the office’s tiny cups of dairy creamer. Every time she uses one of the tiny cups, she gets very tired about 30 minutes later. But through a random accident of biology, Lily’s body doesn’t clear the additive from her system nearly as quickly as Jo’s does. The additive sticks around for a long time, and Lily keeps feeling tired all week. If she takes some creamer on a Monday, she’s just getting over it on Sunday afternoon. 

We can say that the additive has a long half-life in Lily’s system, and that the symptoms (fatigue) have a long half-life as well. They stick around for a long-ass time, and it takes forever for her to feel normal again.

b. Puzzling it Out

Much like a long latency, a long half-life makes this problem much harder to puzzle out, even when the two cases are otherwise identical.

Jo has it easy. If she comes to suspect the creamer, she has a lot of options. She can try taking creamer some mornings and not other mornings. She can try taking the creamer at different times of day and seeing if the fatigue also kicks in at different times. She can even take the creamer multiple times in the same day. Since the symptoms clear out after just two hours, she’s quickly back to baseline and is ready for another trial. If she wants to compare different brands of creamer to see if there’s a difference, she can get a pretty good sample size in a weekend. It’s easy for her to collect lots of data.

Lily has it really hard. If she comes to suspect the creamer, she is in a real bind, and most of the traps are invisible. If she tries taking the creamer some mornings and not other mornings, her results will be a mess, because as soon as she takes it one morning, she is fatigued all week. It will look like the creamer has no effect at all, since on days when she doesn’t take the creamer, she is still fatigued from any creamer she took in any of the previous seven days. A day-by-day self-experiment would show no effect, even though this is totally the wrong conclusion.

To detect any effect, Lily needs to test things in blocks of weeks, instead of blocks of days or hours. Each Monday, either take the creamer or not, and see how tired she is that week. But you can see how hard it would be for her to figure out this design — how is she supposed to know in advance that she needs to study this problem in blocks of a full week? She has a lot less flexibility; you might say that her research situation is much less forgiving. 
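
Here is a rough Python simulation of why the day-by-day design works for Jo and fails for Lily. It is a toy model under stated assumptions: fatigue is all-or-nothing, Jo is only fatigued on days she takes creamer, and Lily stays fatigued on the creamer day plus the six days after it. The function names and the 200-day length are ours, picked for illustration.

```python
import random

def fatigue_log(creamer, carryover_days):
    """Fatigued on day t if creamer was taken on day t or on any of the
    previous `carryover_days` days (a crude stand-in for half-life)."""
    return [any(creamer[max(0, t - carryover_days): t + 1])
            for t in range(len(creamer))]

def daily_difference(creamer, fatigued):
    """Fatigue rate on creamer days minus fatigue rate on no-creamer days."""
    on = [f for c, f in zip(creamer, fatigued) if c]
    off = [f for c, f in zip(creamer, fatigued) if not c]
    return sum(on) / len(on) - sum(off) / len(off)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
creamer = [rng.random() < 0.5 for _ in range(200)]  # coin flip each morning

# Same coin flips, same reliability; only the carryover differs.
jo_diff = daily_difference(creamer, fatigue_log(creamer, carryover_days=0))
lily_diff = daily_difference(creamer, fatigue_log(creamer, carryover_days=6))
```

For Jo the difference between creamer days and no-creamer days is as large as it can be. For Lily it shrinks toward zero, because almost every no-creamer day sits within a week of some creamer day.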

Half-and-Half-Life

Even if Lily does pin down the right research design, it still takes her much longer to get the same amount of data. Randomly assigning creamer or no creamer each morning, Jo can get 28 data points in four weeks, which is enough data to detect a strong relationship if there is one. Meanwhile, in four weeks Lily would get only four data points, not enough to be at all convincing. 

If the relationship is weaker (e.g. only a 50% chance of becoming fatigued), things are even worse. Jo can get a sample size of 100 or 200 days if she has to; it would be a pain, but she could make it happen. But for Lily to get a sample size of 100 weeks would take almost two years.

c. Thought it Worked for a While 🙂 

Lots of people try something, feel like it works great, and then later when they do a more rigorous self-experiment or just keep trying it, they feel that the effect wears off. Must have just been excitement over trying a new thing. 

For example, back in early 2020 Scott Alexander put out a report describing his experience with Sleep Support, a new (at the time) product by Nootropics Depot. His sleep quality isn’t great, so he decided to give this new supplement a shot, and reported miraculous results: 

The first night I took it, I woke up naturally at 9 the next morning, with no desire to go back to sleep. This has never happened before. It shocked me. And the next morning, the same thing happened. I started recommending the supplement to all my friends, some of whom also reported good results.

“I decided the next step was to do a randomized controlled trial,” he says. To make a long story short, the RCT found no difference at all in any measure of sleep quality. “My conclusion is that the effect I thought that I observed – a consistent change of two hours in my otherwise stable wake-up time – wasn’t real. This shocked me. What’s going on?”

Scott chalks this up to the placebo effect, which is certainly possible. But another possibility is that Sleep Support did work great at first but was no longer detectable (for whatever reason) by the time he set up the RCT. Obviously if this is true, it would be hard to study; but it does perfectly match Scott’s experience, which is otherwise (as he says) shocking and somewhat confusing.

If you have any experience with chronic illness or biohacking or anything similar, then you know that “thought it worked for a while” is a very common story. When this happens, the assumption is usually that you were fooling yourself the first time around. But consider:

Vitamin C cures scurvy, so if you have scurvy, the first few doses of vitamin C are great! But after that, vitamin C has basically no effect, because you no longer have scurvy. You have been cured. Looking at this data (huge increases in wellbeing on the first few days, but after that, nothing), the research team concludes that the original reports were somehow mistaken. 

No! It’s just that the vitamin C helped and then it had done all it could! It had a huge effect! That effect was just all up front! 

This exact scenario should pop up all over the place. If you are iron deficient, the first few doses of iron will have some effect. After that, they will have no effect. If you are B12 deficient, the first few doses of B12 will have some effect. After that, they will have no effect. Et cetera.

This is because the body is able to keep reserves of all of these substances. As long as you’ve been getting enough vitamin C, you can go for 4 weeks without any vitamin C at all before you start getting scurvy (in reality it usually takes more like 3 months, because most people don’t go entirely cold turkey on vitamin C). Same goes for iron and B12 — your body is able to keep reserves of these substances, so as long as you get enough, you should be set for a while.

To put this back in the terms of this essay, we would say that these positive effects have a long half-life. Positive effects with a long half-life face exactly the same issues as negative effects with a long half-life: you have to make sure you take the half-life into account when designing a study, and use long enough study periods, otherwise your data will be confused and misleading.

This same point applies to a lot of treatments, actually. Assuming you have an infection, antibiotics will show a big effect up front and then nothing after that. But we don’t take this to mean that antibiotics have no effect, oops we thought it worked for a while, guess we were wrong.

This isn’t a problem for things with no reservoir. For example, as far as we can gather, zinc isn’t really stored in the body long-term. So most effects of zinc will (probably) have a short half-life. If you need more zinc, you can just take it on a given day and see the effects.  

Supplementing anything with a large reservoir (or other positive effect with a long half-life) may not be suitable for a self-experiment, because it will show a strong effect in the first few days and no effect after that. Aggregated over 30 days or whatever, this will look like no effect or a weak effect. Clearly this is the wrong interpretation.

And the longer you run the self-experiment for, the smaller the effect will appear! If you do a 10-day self-experiment with antibiotics, and they have an effect on the first two days, then you will find that this looks like 2/10 days show an effect, which will probably average out to a small effect. But if you kept going for 100 days, you would see that 2/100 days show an effect, which will average out to basically no effect at all.

This is the opposite of our normal assumption about sample sizes, that a larger sample size will always get us a more meaningful, accurate estimate. This assumption simply isn’t true if we’re dealing with a treatment that has a long half-life. 
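
The dilution is easy to see with a few lines of Python. This is a minimal sketch, assuming the treatment has a fixed-size effect on the first couple of days and none afterwards; `average_daily_effect` is a name we made up for the example.

```python
def average_daily_effect(effect_days, total_days, effect_size=1.0):
    """Mean effect per day when the treatment only helps on the first
    `effect_days` days of a `total_days`-long experiment."""
    daily = [effect_size if day < effect_days else 0.0 for day in range(total_days)]
    return sum(daily) / total_days

short_run = average_daily_effect(effect_days=2, total_days=10)   # 2 of 10 days
long_run = average_daily_effect(effect_days=2, total_days=100)   # 2 of 100 days
```

The treatment did exactly the same thing in both runs. The only thing that changed is how many empty days the effect got averaged over.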

So consider the half-life of positive effects too.

III.

Broadly speaking, triggers have some delay in the onset of their symptoms, and those symptoms stick around for some span of time. 

Having a high latency or a long half-life makes a relationship much harder to notice, and harder to study. Having both, it gets even worse.

Perhaps Bob is allergic to dairy, or whatever. It gives him hives, but with a latency of two days, and they persist for four days. Bob will be walking around with random hives, and not much hope of finding out why. 

He might come to suspect the true cause if he happens to cut out dairy for a while and the hives go away for good. But if someone challenged him on this — or if Bob, being a good scientist, decided he wanted to run a self-experiment to demonstrate the hive-causing effect — he would be hard pressed to get convincing formal evidence. 

Bob wouldn’t know in advance to look for a latency of two days and persistence of four days. If he did something reasonable, like randomly assign each day as dairy or non-dairy, the results would look like zero effect. On most days when he took no dairy, he would have hives anyways, because of the long half-life. On most days when he did take dairy, he would also have hives, because they stick around so long. The few “no hive” days would be in the random periods where he hadn’t had any dairy several days ago; but those days might well be days when he was assigned to drink dairy. So it would look like a wash, even though it’s actually a very reliable relationship. 

Bob would have to do something that seems totally unreasonable, like structure the trial in 6-day segments to account for these delays. If he did this right, the 2-day wait and 4-day stay would become entirely obvious. But how is he supposed to know in advance that he has to use this totally weird study design? 
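
A small Python simulation shows both halves of Bob's problem. It is a toy model, assuming dairy on day d causes hives on days d+2 through d+5 every single time, with nothing else causing hives. The names, the 240-day length, and the choice to eat dairy only on the first day of a block are our assumptions for the sketch.

```python
import random

LATENCY, PERSIST = 2, 4  # hives start 2 days after dairy and last 4 days

def hives_log(dairy):
    """Hives on day t if dairy was eaten on any of days t-5 .. t-2."""
    first_back = LATENCY + PERSIST - 1  # furthest-back day that still matters
    return [any(dairy[d] for d in range(max(0, t - first_back), t - LATENCY + 1))
            for t in range(len(dairy))]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Design 1: flip a coin every day, compare hive rates on dairy vs non-dairy days.
daily_dairy = [rng.random() < 0.5 for _ in range(240)]
daily_hives = hives_log(daily_dairy)
on = [h for d, h in zip(daily_dairy, daily_hives) if d]
off = [h for d, h in zip(daily_dairy, daily_hives) if not d]
daily_diff = sum(on) / len(on) - sum(off) / len(off)

# Design 2: flip a coin per 6-day block, dairy only on the first day of the block.
block_len = LATENCY + PERSIST
block_has_dairy = [rng.random() < 0.5 for _ in range(40)]
block_dairy = []
for has_dairy in block_has_dairy:
    block_dairy += [has_dairy] + [False] * (block_len - 1)
block_hives = hives_log(block_dairy)
hive_days_per_block = [sum(block_hives[i * block_len:(i + 1) * block_len])
                       for i in range(len(block_has_dairy))]
```

Under the daily design the difference hovers near zero, since today's dairy has no bearing on today's hives. Under the block design every dairy block shows four hive days and every dairy-free block shows none, because the 6-day block is exactly long enough for the hives to arrive and clear before the next coin flip.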

N=1: Symptom vs. Syndrome


Previously in this series:

N=1: Introduction
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness? 

I. 

People like to argue about whether obesity is a disease. Does it require treatment, or is it more of a social problem? But obesity isn’t a disease. It’s clearly a symptom. 

Think about it like this. Fatigue is a symptom, and it’s a symptom of many things. Fatigue can be a symptom of everyday decisions — you can be fatigued because you stayed up until 3 AM last night playing Octodad: Dadliest Catch. It can be a symptom of substances, like alcohol or Benadryl. It can be a symptom of conditions, like anemia or concussion. And fatigue can be a symptom of diseases, like mononucleosis, Parkinson’s, or lupus. 

Similarly, a person can be obese for a number of different reasons. Obesity is a symptom of many different conditions. You can be obese because of a brain injury. You can be obese because of a thyroid issue. You can be obese because you’re taking a drug like haloperidol or olanzapine. And while there’s still a lot of dispute over the source of the global obesity epidemic, you can be obese because of whatever cause(s) are causing that. 

II.

Things get confusing when you try to treat a symptom like a disease. 

Think about fatigue. If your friend is tired from playing video games until the wee hours of the morning, the correct treatment is for them to play video games while pretending to fill out spreadsheets at work, like a normal person. If they’re fatigued from drinking merlot or taking Benadryl, the only real option is to have them wait until the drug wears off (or take an upper, but that’s not really recommended). If they’re anemic, then they need to get more iron. Et cetera.

Similarly, we don’t know how to treat the general obesity we see in the obesity epidemic. But we do have treatments for obesity caused by thyroid disorders or brain tumors. And we shouldn’t be shocked if treatments that work for obesity caused by thyroid disorders don’t work for the obesity caused by brain tumors, or don’t work for the widespread obesity we see today.

Because a symptom can have many different causes, just looking at the symptom won’t always tell you the cause. And if you don’t know the cause, then you may not know the right treatment, because you don’t know the etiology; you don’t know how the cause connects to the symptom, at what points you can intervene, and what kinds of interventions might be helpful.

This is pretty bad — even when there’s a finite list of possible causes, it’s hard to look at a symptom and figure out which of its causes are responsible. 

III. 

Many chronic illness symptoms are nonspecific. Per Wikipedia:

Nonspecific symptoms are very general and thus can be associated with a wide range of conditions. In other words, they are not specific to (not particular to) any one condition. Most signs and symptoms are at least somewhat nonspecific, as only pathognomonic ones are highly specific. But certain nonspecific signs and symptoms are especially nonspecific and especially common. They are also known as constitutional symptoms when they affect the sense of well-being. They include unexplained weight loss, headache, pain, fatigue, loss of appetite, night sweats, and malaise.

This means that people who are diagnosed with the same chronic illness could have similar experiences, similar symptoms, with entirely different causes. If you have headache/pain/fatigue, you might reasonably assume that someone else with headache/pain/fatigue has the same illness, and that it was caused by the same thing. You might assume that the same treatments will work for both of you, that your illness would have the same cure. 

But headache/pain/fatigue are all nonspecific: they can all be caused by a zillion different things. So someone who shares your exact symptoms may have the exact same experience but for totally different reasons. If this is the case, the treatments that work for one of you may not help at all for the other.

(Even worse, palliative treatments will tend to work for both of you, since they treat the symptoms directly, and this will make the two conditions seem even more similar. But curative treatments that work for one of you won’t work for the other, since your conditions have different root causes.)

Let’s consider migraines. Migraines can definitely be caused by hormones. Some people have migraines only during certain parts of their period (about 7-14% of women, according to Wikipedia), or only when pregnant. Migraines can also be caused, or at least partially caused, by triggers like stress or certain foods.

But there are also people who get random mystery migraines on a regular basis, with no apparent trigger. Presumably these are caused by something, but it’s not something obvious like stress or hormonal cycles or being pregnant. So clearly migraines are a symptom, not a disease — they can be caused by several different things.

All this to say that finding the “cause” of migraines may be the wrong framing. There may be no more single cause of migraines than there is a single cause of car accidents. Some accidents happen because the driver wasn’t paying attention (and many people think of this as prototypical). But some accidents happen because the road is icy. Some accidents happen because the driver had a seizure and lost control of the car. Some accidents happen because the vengeful spouse of the man you killed in El Paso 15 years ago has finally tracked you down and cut your brake lines. 

Not that we would know anything about that! We’ve never been to El Paso, officer, we swear.

There is no single cause of car accidents. They are more like a symptom. All car accidents look much the same — broken glass, tire marks, people yelling. Most car accidents have similar proximal causes — unless it was an intentional ramming, it happened because someone lost control of their vehicle. But despite these apparent similarities, car accidents can have wildly different original causes. They happened for different reasons.

Consider chronic fatigue syndrome (CFS). Most people assume that CFS is a disease, and that everyone with CFS has it for the same reason, that there is a single cause. But maybe CFS is more like a symptom (obviously “syndrome” is literally in the name). If so, the search for the “cause” of CFS is a mug’s game, since it is caused by many different things. If you go around assuming there is one cause of CFS, one etiology, you are going to end up very confused. 

Or consider irritable bowel syndrome (IBS). Most people seem to be aware that IBS is not really a single diagnosis, and probably is a term used to describe all sorts of different, unrelated things. E.g. “Some people just have trouble with their stomachs. When they have trouble and we don’t know what is causing it, we just call it IBS. So you have IBS.” Even so, the label kind of implies that there is a similarity of some sort, and suggests that maybe there will be some similarity of treatment and of cure. But this may be misleading.

If nothing else, the shared label means that all these people are likely to end up in the same groups or the same communities “for people with IBS”. If someone makes a post like “this treatment cured my IBS”, you can be sure other people will respond with, “well it didn’t cure *my* IBS”. This is guaranteed to be the source of a lot of confusion.

We think that most unsolved chronic illnesses are probably like this — most of them are probably different diseases with different causes that happen to look very similar.

Compare it to the anthropic principle if you like — diseases that present in a consistent way and have a single cause are easy to figure out, so they tend to be cured and don’t tend to be on the list of unsolved chronic illnesses. But diseases where a number of very different causes present very similarly will be quite hard to figure out, and are likely to remain mysterious for a long time. So things that are unsolved and have been unsolved for a while are more likely to have multiple causes. 

(Though even simple illnesses with precise single causes, like scurvy, can be devilishly difficult to figure out, so take this argument with a grain of salt.) 

IV.

Single-subject (aka N=1) research can be really powerful. But when it comes to cases like this, you have to be very careful. Even if you do a very rigorous single-subject experiment, and provide strong evidence that some treatment works for you, you’ve only really provided evidence that it works FOR YOU. It may not work for anyone else. 

If the treatment that works for you doesn’t work for most other people with your diagnosis, that’s actually somewhat informative. We can see why some people would find it discouraging, but it suggests that the illness you have “in common” is actually two different illnesses, or at least two substantially different presentations. That means it gets us one step closer, a small step but a step even so, to figuring out what is going on with your illness, and maybe getting a cure or treatment for everyone.

If you end up with Treatment A that works for 20% of people with your condition, and Treatment B that works for 50%, and there’s basically no overlap, you’re off to a great start. You can start looking for anything that the Treatment A people have in common that’s never found in the Treatment B group, and vice-versa. If you find something (“holy cow, everyone who liked Treatment A has Irish heritage!”), you can start directing people to try the treatment that’s most likely to work for them. 

Even if you find nothing in common within the groups, you’re still in good shape. There are only two treatments, and we know that Treatment B works for more people. Newcomers can start by trying B, and if that doesn’t work, they can try A next. If neither works, then they are in the other 30% with no discovered treatment. But it’s still progress in general, and you can start putting your efforts towards finding treatments C, D, E, etc. 
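The arithmetic behind "start with B" can be sketched in a few lines. The 20% and 50% response rates and the no-overlap assumption come from the example above; the function names and the "average number of treatments tried" measure are ours, added only for illustration.

```python
def try_in_order(response_rates, order):
    """Share of newcomers helped at each step, plus the share left untreated.

    Assumes no overlap: each person responds to at most one treatment.
    """
    helped = [response_rates[name] for name in order]
    untreated = 1 - sum(helped)
    return helped, untreated


def expected_trials(response_rates, order):
    """Average number of treatments a newcomer tries before stopping.

    Non-responders are assumed to try every treatment on the list.
    """
    helped, untreated = try_in_order(response_rates, order)
    trials = sum(step * share for step, share in enumerate(helped, start=1))
    return trials + len(order) * untreated


rates = {"A": 0.20, "B": 0.50}  # response rates from the example above
_, untreated = try_in_order(rates, ["B", "A"])
b_first = expected_trials(rates, ["B", "A"])
a_first = expected_trials(rates, ["A", "B"])
print(f"No treatment found: {untreated:.0%}")
print(f"Average treatments tried, B first: {b_first:.1f}; A first: {a_first:.1f}")
```

Either order leaves the same 30% without a treatment, but trying B first means the average newcomer goes through fewer treatments before finding the one that works (or running out of options).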

It may be tempting to jump ahead and start looking for differences now, before we have treatments that distinguish between various groups, and there is some merit in this idea. If we find that half of people with IBS tend to have bloating with no reflux, and the other half tend to have reflux with no bloating (or whatever), that’s a pretty interesting sign, and will probably end up being useful. 

But this approach doesn’t usually seem to work.[1] Probably this is because clustering by symptoms isn’t useful; or when it is useful, it will already be obvious. Different causes can present with identical symptoms, as we’ve been discussing. But IDENTICAL causes can also sometimes present with DIFFERENT symptoms! There’s no royal road, no way to cut this knot for sure. You just have to be careful. 

The real enemy here is the confusion (lit. fusion together of different things; “(transitive) To mix thoroughly; to confound; to disorder.”). Talking about “having CFS” or “having IBS” is handy, but when it comes to diagnostics, more detail is better. You may be surprised to discover that someone with the same diagnosis as you has almost nothing else in common. And even when you have every symptom in common, don’t confuse this for a common cause. Your friend may also have migraines, but don’t be shocked when the thing that worked for you doesn’t work for her.

Remember that car crashes all have similar presentation. In true diagnostic fashion, they usually show three or more of the following symptoms: broken glass, injured driver(s), skid marks, bent fenders, police on scene, plastic debris on the road, etc. Take two Geico and call me in the morning. 

it’s ok, this lizard is a doctor

If you only did an analysis of symptoms, you might think that all car crashes have the same cause. An analysis of symptoms would suggest just one group. But we know that’s not the case — car crashes can happen for many different reasons, and even car crashes with very different causes will usually have very similar symptoms. 

Maybe if you are a genius detective and you know just what to look for, you can tell them apart: maybe a car crash caused by a seizure will show signs of uncontrolled driving well before the point of impact, while a car crash caused by excessive speed will have longer, straighter skid marks on the blacktop. But you certainly won’t be able to discover the different causes of car crashes by going down a checklist of “was there broken glass?”, “were there skid marks?”, “were the drivers injured?”, etc.

If you add in criteria like “how long were the skid marks?” you might get closer. But you’d have to understand the causes well enough to add that question in the first place.
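Here is a toy version of the car crash checklist, as a sketch. The crash records, the field names, and the 20-meter cutoff are all invented for illustration; nothing here is real crash data.

```python
from collections import defaultdict

# Hypothetical crash records (invented for illustration).
crashes = [
    {"cause": "seizure", "broken_glass": True, "skid_marks": True,
     "driver_injured": True, "skid_length_m": 4},
    {"cause": "seizure", "broken_glass": True, "skid_marks": True,
     "driver_injured": True, "skid_length_m": 6},
    {"cause": "speeding", "broken_glass": True, "skid_marks": True,
     "driver_injured": True, "skid_length_m": 35},
    {"cause": "speeding", "broken_glass": True, "skid_marks": True,
     "driver_injured": True, "skid_length_m": 42},
]

CHECKLIST = ("broken_glass", "skid_marks", "driver_injured")


def group_causes(records, key_fn):
    """Group records by key_fn and collect the true causes found in each group."""
    groups = defaultdict(set)
    for record in records:
        groups[key_fn(record)].add(record["cause"])
    return dict(groups)


# Yes/no symptom checklist: every crash gets the same answers.
by_checklist = group_causes(crashes, lambda r: tuple(r[q] for q in CHECKLIST))

# "How long were the skid marks?" with a hypothetical 20 m cutoff.
by_skid_length = group_causes(crashes, lambda r: r["skid_length_m"] > 20)

print(len(by_checklist), "group(s) from the checklist:", by_checklist)
print(len(by_skid_length), "group(s) from skid length:", by_skid_length)
```

The checklist lumps both causes into a single group, while the skid length question splits them cleanly. But notice that we could only pick that question, and that cutoff, because we already knew the causes when we wrote the example.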

ENDNOTES:

[1]: If you know of any examples of looking at a disease, looking for patterns in its symptoms, and finding that it is really two diseases (or something similar), we’d be interested to hear about that, since we can’t think of any examples where this approach has worked.

Links for March 2023

The Sixth Stage of Grief Is Retro-computing (h/t Visakan)

ExFatLoss on measurement: a “Regular, Boring Scale” is the best way to measure body weight for weight loss. “All other methods are either less precise, more prone to user error, or too impractical/expensive to do daily. It doesn’t matter which scale you buy. Just get any $20 model from Amazon. No smart/body fat sensing required. All scales are imprecise. Still good enough. Yea, yea, the scale isn’t perfect. Most scales aren’t that accurate. … But everything else I’ve seen is worse. And I’ve tried a lot.”

The Fosbury Flop Changed Athletes’ Bodies – “the coach can impart important principles, but my sense of great coaches is that they also allow — or encourage — some freedom for the learner to experiment and find a personal solution within the bounds of the task. … I think there’s an analogy for coaches or mentors or bosses of any kind: we should all think about how we can use our influence essentially to underwrite smart risk-taking and experimentation, even within the confines of a well-defined goal.”

Most descriptions of sumptuary laws (historical laws that, among other things, limit who can wear what) focus on how these laws keep common people from imitating those of higher status. If we pass a law that only dukes are allowed to wear cloth-of-gold, then rich merchants can’t pretend to be as important as a duke just because they can afford it. But we have this long-standing suspicion that sumptuary laws also kept rich people from pretending to be lower status than they really are. Think of it as a combination of the king wanting to know who is flush enough to tax heavily, and enforcing noblesse oblige, with everyone being able to tell who is fortunate enough to have an especial obligation to the less fortunate. Anyways: You Can’t Even Tell Who’s Rich Anymore

In Turmoil Over Tampons, Scientists See a Need for More Study

Black Women Say Products for Black Hair Are Dangerously Toxic

A forthcoming visual novel “supposedly actually prepares your 2022 US federal tax return through romancing an anime girl.”

Annals of medical ontology: “Ever wondered how many symptoms make up all of adult psychopathology described in the DSM-5? It’s 628! How many do you think repeat across multiple diagnoses? If you answered 231 symptoms repeating a total of 1022 times (median 3 times per symptom, range 2-22), good job!”

“more than a third of the diagnoses have *every* symptom repeating in at least one other diagnosis”

Church of Reality: Barbara McClintock on Scientific Mysticism and Plant Consciousness from Gaurav Venkataraman, who describes the piece as, “some notes on McClintock, who is in my view the greatest scientist of all time.”

Are Iranian schoolgirls being poisoned by toxic gas? – BBC News (compare to our recent speculations about gender gaps in chronic illness)

“The spread of cases across the country and the fact it has been predominantly affecting schoolgirls, with fewer boys and adults falling ill, were central to his conclusion, he said. The nature of the symptoms and the fact most patients quickly recovered were also key, he said.”