Previously in this series:
N=1: Single-Subject Research
N=1: Hidden Variables and Superstition
N=1: Why the Gender Gap in Chronic Illness?
N=1: Symptom vs. Syndrome
Peter has a bad reaction to melons. Every time he eats melon, he gets sick right away, and he often throws up.
We can say that Peter’s reaction to melon has low latency. When it happens, it happens right away. No waiting about.
Mark also has a bad reaction to melons. But because of a complex series of biochemical interactions, when Mark eats melon, he doesn’t get sick right away. He gets sick about three days (72 hours) later, when he suddenly starts to feel very ill, and then often throws up.
We can say that Mark’s reaction to melon has high latency. It happens, but it always takes a long time to kick in.
Peter and Mark have basically the same reaction to melon. Both have the same symptoms — nausea, sickness, and vomiting. Both reactions happen for sure every time — they are both equally reliable. The only thing that’s different is the latency.
b. Different and the Same
Though their reactions are nearly identical, Peter and Mark end up with very different experiences of their sensitivity.
Peter quickly learns that melon is a trigger. After all, he gets sick right away. He just makes sure to avoid melon and goes about his life with no additional air of mystery.
Mark, on the other hand, is plagued with random, crippling nausea. He sometimes gets sick, and it always seems to be for no reason. This is because it’s hard to remember what you were eating exactly 72 hours ago (for example, take a moment to try to remember what YOU were eating 72 hours ago). So for Mark, the connection is very obscure. He may never figure it out.
Both of these relationships would become equally obvious in a self-experiment. As long as you were tracking melon consumption and looking for relationships over a long enough time frame, you would see that Peter gets sick right after every dose of melon, and Mark gets sick exactly 72 hours after every dose of melon.
Perfect 100% reliability would make this pretty obvious once you noticed it. You don’t need a huge sample size to pick up on a relationship that is 100% reliable, which is why Peter quits melons after getting sick just a few times.
The big difference is whether the relationship jumps out at you or not. Low-latency relationships are obvious; the close proximity of cause and effect highlights the correct hypothesis and draws immediate attention to the relationship, where it can quickly be confirmed. Peter can just eat more melon and immediately get corroborating evidence if he wants to confirm his theory. The relationship is intuitive; you know it when you see it.
c. Cause and Effect
High-latency relationships are much harder to spot, even if they are equally reliable. The separation of cause and effect means that the connection may never come to mind.
To even be able to pick up on this in a self-experiment, you would have to know in advance that you should be tracking how much melon you are eating. And this is the hard part. The hard part is not demonstrating the relationship. At 100% reliability, that’s easy. The hard part is picking up on what to track.
This is somewhat in contrast to our normal concerns in research. Normally we worry about sample size and the quality of our measures. But Mark doesn’t need a big sample size. He doesn’t need any measures other than “got sick” and “ate melon”. All he needs is to consider melon as a possible cause of his nausea, and to consider looking for relationships with a latency of at least 72 hours. Easier said than done.
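To make "the hard part is picking up on what to track" concrete, here is a toy simulation (all numbers invented for illustration, not real data): we generate Mark's 100%-reliable, 3-day-lagged reaction, then scan over candidate lags to see which one lines up.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

DAYS = 200
TRUE_LAG = 3  # Mark's reaction: sick exactly 3 days (72 hours) after melon

# Simulate a 100%-reliable, high-latency reaction.
melon = [random.random() < 0.15 for _ in range(DAYS)]  # ate melon that day?
sick = [False] * DAYS
for day in range(DAYS):
    if melon[day] and day + TRUE_LAG < DAYS:
        sick[day + TRUE_LAG] = True

def hit_rate(lag):
    """Fraction of melon days followed by sickness exactly `lag` days later."""
    trials = [day for day in range(DAYS - lag) if melon[day]]
    return sum(sick[day + lag] for day in trials) / len(trials)

rates = {lag: hit_rate(lag) for lag in range(8)}
best_lag = max(rates, key=rates.get)  # the true lag has a perfect hit rate
```

The scan itself is trivial, and the true lag stands out immediately. The catch is that this only works if melon made it into the tracking sheet at all, and if you thought to scan lags as long as three days.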
d. Reliability in Real-World Relationships
Of course, most real-world relationships are not 100% reliable. Few things work every time. But it’s concerning how a little latency can hide an otherwise blatant relationship, and it makes us wonder how many connections we all miss because of relatively small delays in onset.
Zero latency (eat melon, immediately puke) is easy to figure out. These relationships become obvious after just a few trials.
In comparison, 72-hour latency is very hard to figure out. Most people are not looking for relationships with such a long delay, and even if you were, you would be hard pressed to figure out the cause.
You can’t just keep a food journal and look 72 hours back — you don’t know how long the latency is, so you don’t know how far back to look! And if the latency varies at all (e.g. always between 60-80 hours later), it gets even harder.
This makes us wonder how much latency we can handle before connections stop being obvious. It may not take much. Coffee -> heartburn with an hour delay seems pretty doable; we think you would figure that one out pretty quickly. But with a four-hour delay? Eight hours? Twelve? That would be much more difficult. It would start to look more like, “heartburn around dinnertime / going to bed, especially on weekdays”. That sounds hard to puzzle out.
Latency also makes it harder to get a big sample size. With a latency of less than 5 minutes, Peter can easily do eight trials (eat some melon and face the consequences) in a single day. Mark can’t do that. He has to wait 72 hours to get the results from his first trial, except it’s worse than that, because he doesn’t know how long he has to wait for the results to come in.
If he wants to make sure not to cross the streams, he needs to devote three whole days to each trial (though again, he doesn’t actually know in advance how much time he has to dedicate). So he needs 3 * 8 = 24 days to run the same number of “eat melon and find out” trials that Peter, if he’s willing to get sick that much in a single day, can knock out in an afternoon.
Jo has a bad reaction to one of the additives in her office’s tiny cups of dairy creamer (henceforth: “creamer”). Every time she uses one of the tiny cups, she gets very tired about 30 minutes later. Fortunately, Jo’s kidneys happen to handle the additive really well, and two hours after she takes the creamer, she has cleared all of the additive out of her system, and stops feeling unusually tired.
We can say that the additive has a short half-life in Jo’s system, and that the symptoms (fatigue) have a short half-life as well. They don’t stick around for long, things quickly go back to baseline.
Lily works in the same office and has the exact same reaction to the same additive in the office’s tiny cups of dairy creamer. Every time she uses one of the tiny cups, she gets very tired about 30 minutes later. But through a random accident of biology, Lily’s body doesn’t clear the additive from her system nearly as quickly as Jo’s does. The additive sticks around for a long time, and Lily keeps feeling tired all week. If she takes some creamer on a Monday, she’s just getting over it on Sunday afternoon.
We can say that the additive has a long half-life in Lily’s system, and that the symptoms (fatigue) have a long half-life as well. They stick around for a long-ass time, and it takes forever for her to feel normal again.
b. Puzzling it Out
Much like a long latency, a long half-life makes this problem much harder to puzzle out, even when the two cases are otherwise identical.
Jo has it easy. If she comes to suspect the creamer, she has a lot of options. She can try taking creamer some mornings and not other mornings. She can try taking the creamer at different times of day and seeing if the fatigue also kicks in at different times. She can even take the creamer multiple times in the same day. Since the symptoms clear out after just two hours, she’s quickly back to baseline and is ready for another trial. If she wants to compare different brands of creamer to see if there’s a difference, she can get a pretty good sample size in a weekend. It’s easy for her to collect lots of data.
Lily has it really hard. If she comes to suspect the creamer, she is in a real bind, and most of the traps are invisible. If she tries taking the creamer some mornings and not other mornings, her results will be a mess, because as soon as she takes it one morning, she is fatigued all week. It will look like the creamer has no effect at all, since on days when she doesn’t take the creamer, she is still fatigued from any creamer she took in any of the previous seven days. A day-by-day self-experiment would show no effect, even though this is totally the wrong conclusion.
To detect any effect, Lily needs to test things in blocks of weeks, instead of blocks of days or hours. Each Monday, either take the creamer or not, and see how tired she is that week. But you can see how hard it would be for her to figure out this design — how is she supposed to know in advance that she needs to study this problem in blocks of a full week? She has a lot less flexibility; you might say that her research situation is much less forgiving.
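Here is a toy version of Lily's bind (parameters are ours, purely for illustration): the same perfectly reliable reaction, analyzed once day-by-day and once in week-long blocks.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

CARRYOVER = 7  # hypothetical: one cup of creamer leaves Lily tired for 7 days

def fatigued(day, schedule):
    """Tired today if creamer was taken today or on any of the 6 days before."""
    return any(schedule[max(0, day - CARRYOVER + 1): day + 1])

# Design A (Jo's design, applied to Lily): flip a coin every morning for a year.
DAYS = 7 * 52
daily = [random.random() < 0.5 for _ in range(DAYS)]
rate_with = sum(fatigued(d, daily) for d in range(DAYS) if daily[d]) / sum(daily)
rate_without = (sum(fatigued(d, daily) for d in range(DAYS) if not daily[d])
                / (DAYS - sum(daily)))
# Both rates come out near 100%: day-by-day, the creamer looks like it does nothing.

# Design B: randomize whole weeks. Creamer on Monday of "creamer" weeks, else none.
WEEKS = 52
week_flags = [random.random() < 0.5 for _ in range(WEEKS)]
weekly = []
for flag in week_flags:
    weekly += [flag] + [False] * 6  # creamer Monday only, nothing the rest of the week
week_tired = [any(fatigued(d, weekly) for d in range(7 * w, 7 * w + 7))
              for w in range(WEEKS)]
# week_tired tracks week_flags exactly: in week blocks, the effect is unmistakable.
```

In the day-level analysis Lily is tired in both conditions, so the creamer looks inert; in the week-block analysis the separation is perfect. But the block design only occurs to you if you have already guessed the week-long carryover.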
Even if Lily does pin down the right research design, it still takes her much longer to get the same amount of data. Randomly assigning creamer or no creamer each morning, Jo can get 28 data points in four weeks, which is enough data to detect a strong relationship if there is one. Meanwhile, in four weeks Lily would get only four data points, not enough to be at all convincing.
If the relationship is weaker (e.g. only a 50% chance of becoming fatigued), things are even worse. Jo can get a sample size of 100 or 200 days if she has to; it would be a pain, but she could make it happen. But for Lily to get a sample size of 100 weeks would take two years.
c. Thought it Worked for a While 🙂
Lots of people try something, feel like it works great, and then later, when they do a more rigorous self-experiment or just keep trying it, find that the effect seems to wear off. Must have just been excitement over trying a new thing.
For example, back in early 2020 Scott Alexander put out a report describing his experience with Sleep Support, a new (at the time) product by Nootropics Depot. His sleep quality isn’t great, so he decided to give this new supplement a shot, and reported miraculous results:
The first night I took it, I woke up naturally at 9 the next morning, with no desire to go back to sleep. This has never happened before. It shocked me. And the next morning, the same thing happened. I started recommending the supplement to all my friends, some of whom also reported good results.
“I decided the next step was to do a randomized controlled trial,” he says. To make a long story short, the RCT found no difference at all in any measure of sleep quality. “My conclusion is that the effect I thought that I observed – a consistent change of two hours in my otherwise stable wake-up time – wasn’t real. This shocked me. What’s going on?”
Scott chalks this up to the placebo effect, which is certainly possible. But another possibility is that Sleep Support did work great at first but was no longer detectable (for whatever reason) by the time he set up the RCT. Obviously if this is true, it would be hard to study; but it does perfectly match Scott’s experience, which is otherwise (as he says) shocking and somewhat confusing.
If you have any experience with chronic illness or biohacking or anything similar, then you know that “thought it worked for a while” is a very common story. When this happens, the assumption is usually that you were fooling yourself the first time around. But consider:
Vitamin C cures scurvy, so if you have scurvy, the first few doses of vitamin C are great! But after that, vitamin C has basically no effect, because you no longer have scurvy. You have been cured. A research team looking at this data (huge increases in wellbeing on the first few days, but after that, nothing) might conclude that the original reports were somehow mistaken.
No! It’s just that the vitamin C helped and then it had done all it could! It had a huge effect! That effect was just all up front!
This exact scenario should pop up all over the place. If you are iron deficient, the first few doses of iron will have some effect. After that, they will have no effect. If you are B12 deficient, the first few doses of B12 will have some effect. After that, they will have no effect. Et cetera.
This is because the body is able to keep reserves of all of these substances. As long as you’ve been getting enough vitamin C, you can go for 4 weeks without any vitamin C at all before you start getting scurvy (in reality it usually takes more like 3 months, because most people don’t go entirely cold turkey on vitamin C). Same goes for iron and B12 — your body is able to keep reserves of these substances, so as long as you get enough, you should be set for a while.
To put this back in the terms of this essay, we would say that these positive effects have a long half-life. Positive effects with a long half-life face exactly the same issues as negative effects with a long half-life: you have to take the half-life into account when designing a study, and use long enough study periods, otherwise your data will be confused and misleading.
This same point applies to a lot of treatments, actually. Assuming you have an infection, antibiotics will show a big effect up front and then nothing after that. But we don’t take this to mean that antibiotics have no effect, oops we thought it worked for a while, guess we were wrong.
This isn’t a problem for things with no reservoir. For example, as far as we can gather, zinc isn’t really stored in the body long-term. So most effects of zinc will (probably) have a short half-life. If you need more zinc, you can just take it on a given day and see the effects.
Supplementing anything with a large reservoir (or other positive effect with a long half-life) may not be suitable for a self-experiment, because it will show a strong effect in the first few days and no effect after that. Aggregated over 30 days or whatever, this will look like no effect or a weak effect. Clearly this is the wrong interpretation.
And the longer you run the self-experiment, the smaller the effect will appear! If you do a 10-day self-experiment with antibiotics, and they have an effect on the first two days, then only 2 of 10 days will show an effect, which will probably average out to a small effect. But if you kept going for 100 days, only 2 of 100 days would show an effect, which averages out to basically no effect at all.
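The arithmetic here is simple enough to spell out (toy numbers, purely for illustration):

```python
# Toy numbers: a treatment whose whole benefit arrives in the first few days.
EFFECT_DAYS = 2      # hypothetical: antibiotics help on days 1 and 2 only
IMPROVEMENT = 5.0    # hypothetical symptom-scale improvement on those days

def average_effect(study_days):
    """Mean per-day effect you'd measure over a study of the given length."""
    active_days = min(EFFECT_DAYS, study_days)
    return active_days * IMPROVEMENT / study_days

short_study = average_effect(10)    # effect on 2 of 10 days
long_study = average_effect(100)    # effect on 2 of 100 days
# The longer the study runs, the smaller the apparent per-day effect gets.
```

The same up-front benefit dilutes from a clear average effect in the short study to nearly nothing in the long one.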
This is the opposite of our normal assumption about sample sizes, that a larger sample size will always get us a more meaningful, accurate estimate. This assumption simply isn’t true if we’re dealing with a treatment that has a long half-life.
So consider the half-life of positive effects too.
Broadly speaking, triggers have some delay in the onset of their symptoms, and those symptoms stick around for some span of time.
Having a high latency or a long half-life makes a relationship much harder to notice, and harder to study. Having both, it gets even worse.
Perhaps Bob is allergic to dairy, or whatever. It gives him hives, but with a latency of two days, and they persist for four days. Bob will be walking around with random hives, and not much hope of finding out why.
He might come to suspect the true cause if he happens to cut out dairy for a while and the hives go away for good. But if someone challenged him on this — or if Bob, being a good scientist, decided he wanted to run a self-experiment to demonstrate the hive-causing effect — he would be hard pressed to get convincing formal evidence.
Bob wouldn’t know in advance to look for a latency of two days and persistence of four days. If he did something reasonable, like randomly assign each day as dairy or non-dairy, the results would look like zero effect. On most days when he took no dairy, he would have hives anyways, because of the long half-life. On most days when he did take dairy, he would also have hives, because they stick around so long. The few “no hive” days would be in the random periods where he hadn’t had any dairy several days ago; but those days might well be days when he was assigned to drink dairy. So it would look like a wash, even though it’s actually a very reliable relationship.
Bob would have to do something that seems totally unreasonable, like structure the trial in 6-day segments to account for these delays. If he did this right, the 2-day wait and 4-day stay would become entirely obvious. But how is he supposed to know in advance that he has to use this totally weird study design?
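To see how badly the reasonable design fails, and how well the weird one works, here is a toy simulation of Bob's situation (all parameters are ours, chosen to match the 2-day latency and 4-day persistence above):

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

LATENCY, PERSIST = 2, 4  # hypothetical: hives start 2 days after dairy, last 4 days

def hives_from(dairy):
    """Given a dairy schedule (one bool per day), return Bob's hive days."""
    hives = [False] * len(dairy)
    for d, ate in enumerate(dairy):
        if ate:
            for k in range(LATENCY, LATENCY + PERSIST):
                if d + k < len(hives):
                    hives[d + k] = True
    return hives

# The "reasonable" design: randomly assign each day as dairy or non-dairy.
DAYS = 360
daily = [random.random() < 0.5 for _ in range(DAYS)]
h = hives_from(daily)
rate_dairy = sum(h[d] for d in range(DAYS) if daily[d]) / sum(daily)
rate_none = sum(h[d] for d in range(DAYS) if not daily[d]) / (DAYS - sum(daily))
# Both rates come out high and nearly equal: the relationship looks like a wash.

# The "totally unreasonable" design: 6-day blocks (2-day wait + 4-day stay),
# with dairy only on the first day of randomly chosen blocks.
BLOCKS = 60
block_dairy = [random.random() < 0.5 for _ in range(BLOCKS)]
schedule = []
for assigned in block_dairy:
    schedule += [assigned] + [False] * 5
h2 = hives_from(schedule)
block_had_hives = [any(h2[6 * b: 6 * b + 6]) for b in range(BLOCKS)]
# block_had_hives matches block_dairy exactly: a perfectly reliable relationship.
```

Under daily randomization Bob has hives most days regardless of assignment, so the effect vanishes; under 6-day blocks, every dairy block gets hives and no dairy-free block does. The design works, but only because we built the simulation knowing the 2-day wait and 4-day stay in advance, which is exactly the knowledge Bob doesn't have.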