This is one of the most similar proposals to the theory presented in A Chemical Hunger, though they don’t go quite as far as we do, still attributing some of the influence to diet and exercise: “Most reports attribute the obesity epidemic to factors such as excess food energy intake, changes in diet and eating behavior, and increasing sedentary life style. Undoubtedly, these factors contribute, but can they all account for the rapid increase in this problem that occurred over the last two decades?”
There’s even a study where they put fecal matter from human twins into germ-free mice. (This is one of the more creative study designs we’ve seen.) They started by finding pairs of twins where one twin was fat and the other twin was lean. This is pretty uncommon; normally, twins weigh about the same amount. They transplanted fecal matter from the twins into mice and found that mice that got fecal matter from the obese twin gained weight, unless they were housed with one of the mice that got fecal matter from the lean twin.
However, there is also evidence against this picture. For one thing, Germany, Spain, Italy, and Japan all use a lot of antibiotics in their meat, and none of these countries is particularly obese. Australia and South Africa are both pretty obese, but both of these countries use fewer antibiotics than usual. This could maybe be reconciled if these countries use different kinds of antibiotics, but we would need to see that case made to evaluate it.
There’s also some evidence in favor of this theory that this paper didn’t review.
For one thing, people who eat fewer animal products have lower BMIs, and the effect seems to be dose-dependent. In a sample from 2002-2006, average BMI was lowest in vegans (23.6) and incrementally higher in ovo-lacto vegetarians (25.7), pescatarians (26.3), semi-vegetarians (27.3), and nonvegetarians (28.8). We can note that the BMI for vegans is about the same as that found in hunter-gatherers and in Civil War veterans in the 1890s. That said, everyone in this sample was a Seventh-Day Adventist, so they may not be all that representative.
India and Japan are the least obese of the developed countries. Both have obesity rates below 5%. India is the most vegetarian country on the planet and Japan, while not especially vegetarian, mostly consumes seafood in place of meat products.
This would mean that vegan diets would work really well for weight loss, right? Well, maybe. As we previously reviewed, all diets seem to work a little, and no diet seems to work all that well. We see something similar in vegetarian and vegan diets. A 2015 meta-analysis found that people assigned to vegetarian diets lost more weight than those assigned to nonvegetarian diets. People on vegan diets lost a little more weight than people on vegetarian diets, about 5.5 pounds (2.5 kg) versus 3.3 pounds (1.5 kg). The studies differed quite a bit in the size of the effect, but all of them had similar conclusions. Another meta-analysis from 2015 found the same general pattern, and individual studies comparing different types of vegetarian and vegan diets seem to confirm this dose-dependent trend.
This looks a lot like other studies, where the differences between diets are technically reliable but so small as to be basically meaningless, but the possible dose-dependent effect is interesting.
The most interesting study might be one that compared a vegan diet to a conventional low-fat diet. So far so standard, but unlike most diet studies, which end after 12 or 18 months, this one followed up two years later. The vegan group not only lost more weight (4.9 kg versus 1.8 kg), they kept it off better at the two-year followup (3.1 kg versus 0.8 kg). On most diets people lose a little weight but then gain it right back, so the fact that people kept most of the weight off for two years is interesting. Even so, the amount of weight lost in an absolute sense is still quite small. It could take more than two years on a vegan diet to see all the effects. If that were the case, though, you’d expect people to have lost even more weight by year two, and that’s not what we see.
None of these are smoking guns. At best, they are consistent with the idea that some of these contaminants are more prevalent in animal-based foods. And we know that this can’t be about the animal products themselves, because hunter-gatherers and our ancestors in 1890 ate lots of meat and didn’t experience modern levels of obesity.
Environmental contaminants tend to build up in animals through the plants they eat, so any contaminants in the environment will bioaccumulate, and concentrations will be higher in animals than in groundwater or in plants. Compounds in a farmer’s fields will end up in the corn or alfalfa fed to their cows, and the cows will end up getting an even larger dose, which will be passed on to the person who eats the resulting cheeseburger. So the fact that meat consumption is linked to obesity doesn’t necessarily implicate antibiotics. It could be something else in the meat.
Evidence strongly suggests that the obesity epidemic is the result of environmental contaminants.
However, it’s not entirely clear exactly which contaminants are responsible.
If we’re lucky, a few compounds are entirely responsible for the increase in obesity over the past forty years, and we can ban those chemicals.
If we’re not lucky, the obesity epidemic is the result of dozens or even hundreds of different contaminants, each with a small effect, which when combined lead to extreme obesity. In this case we can try to ban or regulate them all, but it will be much more difficult to find ways to get all of them out of our food, water, and homes.
While more work will be needed to pin down exactly what contaminants are responsible, we can make some very educated guesses, because we already know what kind of contaminants we’re looking for.
4.1 Invention and Introduction
The big inflection point for the obesity epidemic was around 1980, so we should be looking for compounds that entered the environment slightly before then. Either they were discovered around 1960-1970 and were immediately introduced, or they were discovered some time before and went into widespread use just before 1980.
These contaminants may be synthetic, but they don’t have to be. These could also be naturally occurring compounds that are introduced into the environment through human activity.
We’re aware that correlation doesn’t imply causation. By itself, showing that the time course of some contaminant is related to the upward trend in obesity rates isn’t enough to show that the contaminant is responsible for the obesity epidemic. But it is suspicious. We should keep in mind that the link between smoking and lung cancer was largely established through correlational data.
If we don’t see a relationship between a contaminant and the obesity epidemic, it’s harder to make the case for that particular contaminant. For example, some people have proposed that certain milk proteins might be behind the obesity epidemic. But if you look at dairy consumption by country, you immediately see that some of the leanest countries are near the top and many of the most obese countries are near the bottom. As a result, we don’t think this is a serious candidate and we don’t discuss it in this paper. A relationship is one of the first signs we should look for in a proposed explanation, even though, by itself, it’s not enough to be convincing.
4.2 Between-Group Differences
There are large differences in rates of obesity between counties, states, countries, and even professions.
Some of this is due to differences in factors like altitude or genetics. For example, in a group of about 43,500 patients from the San Francisco Bay Area, the rate of obesity in European-Americans was about 26%, but the rate of obesity among Asian-Americans was much lower, at 12%. This also differed quite a bit by country of origin. Among Asian-Americans, Filipino-Americans (24%) and Indian-Americans (17%) were the most obese, and Chinese-Americans (7%) and Vietnamese-Americans (6%) were the least obese.
This suggests that even if these environmental contaminants were just as widespread in China as they are in the United States, the rate of obesity would only be about 7%. In this light, it’s not surprising that the rate of obesity in China is currently around 6%. That isn’t a mystery that we need to explain; it’s about what we would expect based on what we know about the genetics of obesity.
Despite this, some of the difference in obesity rates between countries is probably due to differences in exposure to contaminants. This is something we should keep an eye out for. In contrast to the Chinese case, Indian-Americans are about 17% obese, less obese than European-Americans but much more obese than people living in India, who are only about 4% obese.
4.3 Dose Dependence
An obvious smoking gun would be if the amount of exposure to a contaminant were related to obesity. If people who are exposed to a high dose of the contaminant are fatter than those who are exposed to a low dose, that would be a strong indication that the contaminant is responsible.
We are all living in a fattening environment. A 2012 meta-analysis of 115 studies concluded that around 75% of individual differences in BMI are genetic. This is entirely compatible with the idea that contaminants are responsible for the difference between obesity rates in 1970 and 2020. It just means that, now that everyone has been exposed to more or less the same contaminants, 75% of the variation in the population is genetic.
That only leaves 25% of the variance to potentially be explained. If we assume that all of that variance is due to differences in dose, then some simple math (the correlation is the square root of the share of variance explained) tells us that the correlation with dose would be r = 0.50. This is a reasonably large correlation, but we also shouldn’t expect things to be so simple. If we expect that there are ten contaminants, each contributing equally, the correlation between dose and BMI for each would be about r = 0.16. If dose explains only 15% of the total variance rather than 25% (leaving a reasonable 10% as the result of noise), the correlation for each of the ten contaminants would be about r = 0.12.
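The arithmetic behind these figures can be sketched in a few lines. This is only an illustration, and it assumes that the contaminants act independently and each explains an equal share of the variance; the function name is ours.

```python
import math

def per_contaminant_r(variance_explained: float, n_contaminants: int = 1) -> float:
    """Correlation between dose and BMI for each contaminant.

    Assumes (for illustration only) that the contaminants are independent
    and split the explained variance equally, so each one explains
    variance_explained / n_contaminants, and r is the square root of that.
    """
    return math.sqrt(variance_explained / n_contaminants)

# One contaminant explaining all 25% of the non-genetic variance
print(round(per_contaminant_r(0.25), 2))      # 0.5
# Ten contaminants sharing that 25%
print(round(per_contaminant_r(0.25, 10), 2))  # 0.16
# Ten contaminants sharing 15%, with 10% left over as noise
print(round(per_contaminant_r(0.15, 10), 2))  # 0.12
```

If the contaminants are correlated with each other, or contribute unequally, the individual correlations would differ from these values.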
With a large enough sample size, this would certainly be detectable. But dose alone almost certainly can’t explain all the remaining variance. We should be aware that even if there is a strong dose-dependent effect, the effect size might appear statistically to be quite small.
At some point in the past, these contaminants weren’t in the environment at all. Now, they’re so widespread that almost everyone in the industrialized world is getting a dose. This means we’re working with a restricted range. No one, or almost no one, has a dose near zero. Most of us are probably getting similar doses — that’s part of what it means when we see that 75% of the variation in the current population is genetic.
When the range of a variable is restricted, the correlation always ends up looking smaller than it really is. One paper shows that, for a dataset with a true correlation of r = .82, with different range restrictions the apparent correlation can be .51, .47, or even as small as .18! In some cases, a restricted range can even make a positive correlation appear to be negative.
We know that in most samples, everyone will have similar levels of exposure to the contaminants. We’ll be working with a restricted range, and any correlation we see will be smaller than the true correlation. It’s not clear how restricted our range is, but the correlation we see may be much smaller than the true relationship. It might disappear altogether, or even appear slightly negative. And remember, if genetics is 75% of the variance, then the largest dose-dependent correlation we can expect to see is only .50, which is not all that large to begin with.
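To get a feel for how much a restricted range can shrink a correlation, here is a small simulation. It is a sketch under our own assumptions, not a reproduction of any published analysis: the data are synthetic, the true correlation of 0.82 is borrowed from the example above, and the "narrow band of doses" (0 to 0.5 standard deviations), the sample size, and the seed are arbitrary choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(true_r=0.82, n=20000, seed=0):
    """Draw (dose, outcome) pairs whose population correlation is true_r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        xs.append(x)
        ys.append(true_r * x + math.sqrt(1 - true_r ** 2) * noise)
    return xs, ys

xs, ys = simulate()
full_r = pearson_r(xs, ys)

# Restricted range: keep only people whose dose falls in a narrow band
# (the band limits are a hypothetical choice).
band = [(x, y) for x, y in zip(xs, ys) if 0.0 <= x <= 0.5]
restricted_r = pearson_r([x for x, _ in band], [y for _, y in band])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

The underlying relationship is identical in both cases. Only the spread of doses in the sample changes, and the correlation in the narrow band comes out far smaller than the correlation in the full sample.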
(This is a known issue in studying public health. E.g. Lindeberg: “Another difficulty is that the variation of dietary habits in the population being studied may be too small to allow for demonstration of a possible relationship with health. Salt consumption among a particular ethnic group may not show much variation, and the majority has an intake that is much higher than what was practically feasible during evolution. This problem can be compared to studying the importance of smoking for myocardial infarction without having access to non-smokers. (In the case of smoking, often no relation is apparent in epidemiological studies.)”)
One pattern you might expect to find is that dose-dependent effects only become apparent in samples with a wide range of dosages — that is, in cases where the range isn’t restricted. They might be especially pronounced in groups that have extremely high levels of exposure, such as people who work with these contaminants directly, because that increases the range of dosages.
There might also be diminishing returns. Let’s imagine that today, people on average get a dose of 100 units. Back in 1970, everyone got a dose of 0 units. Now, the first 20 units might be very fattening indeed. But in general, the human body can only get so fat. The first 20 units might make you gain 20lbs on average. But the next 20 units only make you gain 10lbs. And the next 20 units make you gain only 5lbs. Now that everyone is at 100 units, every additional unit of exposure leads to a nearly undetectable change in body mass.
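The toy numbers above can be turned into a small function. This is purely illustrative: the text only gives the first three steps (20, 10, and 5 lbs), so continuing the halving pattern past 60 units, and spreading each block's gain evenly over its 20 units, are our assumptions.

```python
def weight_gain_lbs(dose_units: float, block: float = 20.0,
                    first_block_gain: float = 20.0, decay: float = 0.5) -> float:
    """Total weight gain for a dose, with each block of units half as fattening as the last.

    Hypothetical model: the first `block` units add `first_block_gain` lbs,
    and every later block adds `decay` times the gain of the previous one.
    """
    total = 0.0
    gain = first_block_gain
    remaining = dose_units
    while remaining > 0:
        used = min(block, remaining)
        total += gain * used / block  # gain is spread evenly within a block
        remaining -= used
        gain *= decay
    return total

print(weight_gain_lbs(20))   # 20.0
print(weight_gain_lbs(60))   # 35.0
print(weight_gain_lbs(100))  # 38.75
# Effect of one extra unit for someone already at 100 units
print(weight_gain_lbs(101) - weight_gain_lbs(100))
```

Under these assumptions, going from 0 to 20 units adds 20 lbs, while going from 100 to 101 units adds only a small fraction of a pound, which would be very hard to detect in a sample where everyone is near 100.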
We don’t know that there is diminishing weight gain from greater doses of these contaminants, but diminishing returns are pretty common in pharmacology. If there are diminishing returns, we may be at or near the ceiling, in which case we might not be able to detect a dose-dependent effect. If this were the case, however, we might expect to see dose-dependent effects in samples with lower average doses; perhaps samples from the 1980s or 1990s, or samples from developing countries which don’t yet have doses in the same range as industrialized countries.
A further complication is that being exposed to contaminants doesn’t make you gain weight that very same day. Even on Olanzapine, which makes people gain an average of 13.7 kg after 48 weeks, you generally don’t see your first kilogram of weight gain until after 8 weeks. The dose in your system today will be less correlated with your weight than the dose you were on 6 months ago. The dose will be a lagging indicator, and this will also reduce any correlation.
This is further complicated by the fact that these compounds might have paradoxical reactions, which is what we call it when a drug sometimes has the opposite of its normal effect. This would cause some portion of people to actually lose weight, and would further reduce the apparent correlation between the contaminant and obesity.
Finally, there are a priori reasons for us not to expect there to be strong correlations in the existing literature. If there were a compound or contaminant that was correlated with BMI, even a relatively small correlation like r = 0.20, someone probably would have noticed. This means that either 1) the contaminants are compounds that we don’t usually measure, so no dataset exists where we can compare them to measures of obesity, or 2) the relevant contaminants are commonly measured, but for statistical reasons like the ones above, there isn’t an obvious correlation with obesity.
Evidence of a dose-dependent relationship would be a smoking gun in favor of a contaminant being one of the causes of the obesity epidemic. But the lack of a dose-dependent relationship isn’t evidence against the contaminant being involved.
4.4 Environmental Interactions
In the following sections, we do our best to identify which of the contaminants we’re putting into the environment could be the cause of the obesity epidemic, and we believe that we have found some likely candidates. Unfortunately, this search is complicated by the fact that chemistry and biology allow for bewildering interactions, and sometimes seem to be working against us.
There are two general ways this can happen. The first is that when chemical contaminants end up in the environment, they can be transformed into different compounds. This can occur as a result of interactions with minerals in the groundwater, from exposure to sunlight, from exposure to radioactivity, or from chemical interactions with other contaminants. Since contaminants can sit in soil and groundwater for decades, there’s a lot of time for these transformations to happen.
In her book Silent Spring, Rachel Carson describes how contaminants “pass mysteriously by underground streams until they emerge and, through the alchemy of air and sunlight, combine into new forms that kill vegetation, sicken cattle, and work unknown harm on those who drink from once pure wells.”
She provides a few illustrative examples. One case involved a manufacturing plant in Colorado. In 1943, the Rocky Mountain Arsenal of the Army Chemical Corps began to use the plant, located near Denver, to manufacture war materials. After eight years, the same manufacturing plant used to make these war materials was leased to a private oil company for the production of insecticides. Even before this point, however, there were already reports from miles away of mysterious sickness in livestock, crops dying and turning yellow, and even human illness, possibly related. A thorough investigation eventually revealed that the groundwater between the arsenal and the farms had become contaminated, but it had propagated so slowly that it took several years for the contamination to reach the farmland.
Analysis of the farms’ shallow wells revealed contamination with arsenic, chlorides, and other dangerous substances. This was enough to explain the majority of the reports of illness and crop damage. But most mysterious was the discovery of the weed killer 2,4-D in some of the wells:
Certainly its presence was enough to account for the damage to crops irrigated with this water. But the mystery lay in the fact that no 2,4-D had been manufactured at the arsenal at any stage of its operations. After long and careful study, the chemists at the plant concluded that the 2,4-D had been formed spontaneously in the open basins. It had been formed there from other substances discharged from the arsenal; in the presence of air, water, and sunlight, and quite without the intervention of human chemists, the holding ponds had become chemical laboratories for the production of a new chemical—a chemical fatally damaging to much of the plant life it touched. And so the story of the Colorado farms and their damaged crops assumes a significance that transcends its local importance. What other parallels may there be, not only in Colorado but wherever chemical pollution finds its way into public waters? In lakes and streams everywhere, in the presence of catalyzing air and sunlight, what dangerous substances may be born of parent chemicals labeled ‘harmless’?
While we can try to identify the contaminants that cause obesity, the disturbing fact is that the contaminants responsible may be compounds which we are unfamiliar with, because they weren’t created in a lab and have never been examined for safety. Again, Rachel Carson puts it better than we could:
Indeed one of the most alarming aspects of the chemical pollution of water is the fact that here—in river or lake or reservoir, or for that matter in the glass of water served at your dinner table—are mingled chemicals that no responsible chemist would think of combining in his laboratory. The possible interactions between these freely mixed chemicals are deeply disturbing to officials of the United States Public Health Service, who have expressed the fear that the production of harmful substances from comparatively innocuous chemicals may be taking place on quite a wide scale. The reactions may be between two or more chemicals, or between chemicals and the radioactive wastes that are being discharged into our rivers in ever-increasing volume. Under the impact of ionizing radiation some rearrangement of atoms could easily occur, changing the nature of the chemicals in a way that is not only unpredictable but beyond control.
To make matters worse, something quite similar can happen inside our bodies. As surprising and chaotic as the interactions between contaminants can be, their interactions with human biochemistry can be even more complicated.
“A human being,” writes Carson, “unlike a laboratory animal living under rigidly controlled conditions, is never exposed to one chemical alone. Between the major groups of insecticides, and between them and other chemicals, there are interactions that have serious potentials. Whether released into soil or water or a man’s blood, these unrelated chemicals do not remain segregated; there are mysterious and unseen changes by which one alters the power of another for harm.” She goes on to describe several such interactions in gory detail.
The organic phosphates, “those poisoners of the nerve-protective enzyme cholinesterase,” become much more dangerous if a person has previously been exposed to chlorinated hydrocarbons that injure the liver. Pairs of different organic phosphates themselves can also interact with each other, “in such a way as to increase their toxicity a hundredfold.” Organic phosphates also have the potential to interact with all sorts of other things in the environment, including prescription drugs, synthetic materials, and food additives.
Similarly, a person exposed to DDT is much worse off if they have already been exposed to another hydrocarbon that causes liver damage — “so widely used as solvents, paint removers, de-greasing agents, dry-cleaning fluids, and anesthetics.” As a result, a dose of DDT that is survivable for one person may be devastating to someone else. “The effect of a chemical of supposedly innocuous nature can be drastically changed by the action of another,” Carson tells us. “One of the best examples is a close relative of DDT called methoxychlor”:
Because it doesn’t accumulate in the body to any great extent when given alone, we are told that methoxychlor is a safe chemical. But this is not necessarily true. If the liver has been damaged by another agent, methoxychlor is stored in the body at 100 times its normal rate, and will then imitate the effects of DDT with long-lasting effects on the nervous system. Yet the liver damage that brings this about might be so slight as to pass unnoticed. It might have been the result of any of a number of commonplace situations—using another insecticide, using a cleaning fluid containing carbon tetrachloride, or taking one of the so-called tranquilizing drugs, a number (but not all) of which are chlorinated hydrocarbons and possess power to damage the liver.
Another good example is malathion, an insecticide that at the time was commonly used by gardeners. The name of this product sounds so evil that we’re surprised it passed the review of the corporate public relations people, but apparently the name comes from the smell and means “bad sulphur”, so there you go. Malathion is extremely deadly to insects but is “safe” for mammals, including humans. But malathion is only “safe” because the mammalian liver detoxifies it with an enzyme, rendering it harmless. “If, however, something destroys this enzyme or interferes with its action,” we are warned, “the person exposed to malathion receives the full force of the poison. Unfortunately for all of us, opportunities for this sort of thing to happen are legion.”
In particular, Carson relates the story of how a team from the FDA found that when malathion was administered at the same time as some of the other organic phosphates, “a massive poisoning results—up to 50 times as severe as would be predicted on the basis of adding together the toxicities of the two.” This led them to test combinations of many different organic phosphates, and they found that many pairs of these compounds are exceedingly dangerous in combination.
The reason for this appears to be “potentiation” of their combined action — when one of the compounds destroys the liver enzyme responsible for detoxifying the other. “The two need not be given simultaneously,” we are warned. “The hazard exists not only for the man who may spray this week with one insecticide and next week with another; it exists also for the consumer of sprayed products. The common salad bowl may easily present a combination of organic phosphate insecticides. Residues well within the legally permissible limits may interact. The full scope of the dangerous interaction of chemicals is as yet little known, but disturbing findings now come regularly from scientific laboratories.”
Carson relates several more examples in a similar vein. Malathion also appears to become much more dangerous when a person is exposed to certain plasticizing agents. Just like its combination with other organic phosphates, “this is because it inhibits the liver enzyme that normally would ‘draw the teeth’ of the poisonous insecticide.” Similarly, exposure to malathion seems to increase the effect of certain prescription drugs, including muscle relaxants and barbiturates.
In addition, we found that malathion can, under some conditions, transform into malaoxon, which is 61 times more deadly. One of these conditions is when malathion is exposed to chlorine, as it might be in some drinking water.
Our concern in this paper isn’t the toxicity of different insecticides, of course. The point is that the contaminants that cause obesity may not have a straightforward profile. It’s possible that two (or more) well-known and relatively safe contaminants combine in groundwater to form an unknown new contaminant that causes weight gain, and a host of other problems. It’s possible that a single contaminant becomes something else entirely when it is exposed to sunlight in fields, ponds, and rivers. It’s possible that there are two contaminants, neither of which causes obesity in isolation, but which in combination overwhelm the body.
4.5 Three Possible Contaminants
In the next parts, we propose some contaminants that we think might be responsible for the obesity epidemic. But we should make it clear upfront that the theory itself doesn’t hinge on these compounds. Even if it turns out that none of these compounds could possibly be responsible for the modern rise in obesity, we still think that the evidence is very strong that environmental contaminants are responsible.
We take a close look at three contaminants, and examine the evidence for each.
Runner up titles: Son of CICO, 2 CI 2 CO, Revenge of CICO, CICO 2: Judgement Day, CICO Returns, CICO! Here We Go Again, CICO’s Bogus Journey
Calories In Calories Out (CICO) gave us more trouble than any other section we worked on when writing A Chemical Hunger.
Part of the problem is that CICO means different things to different people, and covers a number of loosely related hypotheses. We found it hard to disentangle these when we were writing Part II. A number of times we circled back to our section on CICO and tried to reorganize it, or re-write it entirely, but we weren’t able to figure out a way to clarify the argument to our satisfaction.
But feedback on the posts has proved extremely helpful, and we think we can now do a better job explaining what we meant. Special thanks to commenters-on-the-blog Richard Meadows and Grat Ivar, to fellow bloggers Alvaro de Menard and Stephen Malina, and commenters on MetaFilter and Hacker News, for helping us clarify our thoughts on this.
One way to interpret CICO (or one sub-hypothesis) is that it claims there is a strictly linear relationship between calories eaten/burned and weight change. This is specified if we take “weight gain = calories in – calories out” literally. Essentially, this hypothesis says that overeating by the same amount should always lead to the same amount of weight gain.
This is clearly false. The overfeeding studies provide extremely strong evidence against this version of CICO, since people gain very different amounts when overfed by the same amount, the difference appears to be mostly genetic, and some people actually lose weight, even when overfed by moderate (1000 kcal/day) amounts. Many people still believe something like “for every extra 3500 calories you eat you always gain one pound”, but all available evidence comes down very strongly against that.
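For concreteness, here is what the literal linear rule predicts. This is a sketch of the hypothesis being argued against, not a model we endorse: the 3500 kcal per pound figure is the popular rule of thumb quoted above, and the 35-day duration is a hypothetical value chosen to make the arithmetic easy.

```python
KCAL_PER_POUND = 3500  # the popular rule of thumb, not a measured constant

def naive_predicted_gain_lbs(daily_surplus_kcal: float, days: int) -> float:
    """Weight gain predicted by taking 'calories in minus calories out' literally."""
    return daily_surplus_kcal * days / KCAL_PER_POUND

# Overfeeding by 1000 kcal/day for a hypothetical 35 days
print(naive_predicted_gain_lbs(1000, 35))  # 10.0
```

The rule gives every person the same number for the same surplus. The overfeeding studies find instead that people gain very different amounts, and some even lose weight, which is exactly what a single fixed prediction cannot accommodate.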
At low levels of overfeeding, people at normal weight often don’t gain any weight at all. We liked how sdenton4 on Hacker News put it: “It’s pretty obvious the body has other ways to dump excess calories than turning them into fat stores.”
In Part II, we quote Stephan Guyenet as saying, “This model [CICO] seems to exist mostly to make lean people feel smug, since it attributes their leanness entirely to wise voluntary decisions and a strong character. I think at this point, few people in the research world believe the CICO model.”
What does Guyenet mean here? A common interpretation tied up in CICO is that differences in willpower explain the difference between obese and lean people. The idea is that weight gain is easy and weight loss is hard for everyone. This interpretation says something like, everyone would be 300lbs if they didn’t use their willpower to eat healthy foods rather than cake — you have to control yourself. From this perspective, people who are obese lack willpower and people who are thin/fit are virtuous resisters of temptation.
The overfeeding studies also provide strong evidence against this hypothesis, since they find that it is hard for most people to gain weight and easy for them to go back exactly to the weight they were before the overfeeding. We think this leaves “willpower” explanations dead in the water. Most skinny people have no trouble staying that way.
While not everyone means this when they say “CICO”, many people do, and we wanted to address this aspect of CICO because it is a common misconception. There’s also a strong moral reason to argue against this aspect of the hypothesis, because the idea that obesity is the result of lapses or weakness in willpower has been used to justify many cruel and ineffective positions. People who believe that obesity is the result of laziness and weak willpower believe that people with no moral fiber can be recognized on sight. As a result, they do things like treat overweight and obese people with disrespect, make jokes about them, don’t hire them, don’t give them proper medical treatment, etc. They think that shaming and social stigma are effective interventions against obesity. Some think that overweight and obese people should feel ashamed of their weight. This is as horrible as thinking that cancer patients should feel ashamed and responsible for falling sick.
In addition, placing the blame on willpower and moral failing, and treating individual responsibility as the appropriate intervention, means abandoning the search for the causes of obesity. If you think obesity is largely the result of willpower, then there’s no mystery or need for a solution. But in fact obesity is not the result of willpower, and obesity remains very mysterious. Let’s quote that article from The Lancet again: “unlike other major causes of preventable death and disability, such as tobacco use, injuries, and infectious diseases, there are no exemplar populations in which the obesity epidemic has been reversed by public health measures.”
Calories Lead to Weight Gain
Another interpretation is something like “calories matter for weight gain”. Other things being equal, you generally gain weight when you eat more calories and you generally lose weight when you eat fewer calories.
We are not trying to argue against this at all! If you eat 400 kcal/day, you will lose weight (there are studies on this). If you eat 10,000 kcal/day, you will gain weight (there are studies on this one too). But the amount of calories you eat matters much less than most people think, and there isn’t a strong linear relationship between calories consumed and weight gained (see above).
You can lose weight by consistently eating at a calorie deficit, but people who do this either struggle to maintain their lower weight or gain it back. For two people who are the same height, one might weigh 150 lbs and the other weigh 200 lbs. If the 200 lb person loses 50 lbs, it will be hard for them to maintain a weight of 150 lbs, but easy for the person who weighed 150 lbs to begin with. Why is that? And why has the number of people struggling with their weight grown so dramatically since 1980?
There may be some people with exceptional willpower and incredible support systems who can lose lots of weight and keep it off, in the same way that stage magicians with incredible willpower and lots of training can learn to hold their breath for 10+ minutes underwater. The question is why they would need to use their incredible willpower in the first place! Maybe willpower can be a solution, but it’s not the cause: it doesn’t explain the differences between 1980 and today.
What about those Calorie Intake Numbers?
A bunch of readers zeroed in on this paragraph:
It’s true that people eat more calories today than they did in the 1960s and 70s, but the difference is quite small. Sources have a surprisingly hard time agreeing on just how much more we eat than our grandparents did, but all of them agree that it’s not much. Pew says calorie intake in the US increased from 2,025 calories per day in 1970 to about 2,481 calories per day in 2010. The USDA Economic Research Service estimates that calorie intake in the US increased from 2,016 calories per day in 1970 to about 2,390 calories per day in 2014. Neither of these are jaw-dropping increases.
People pointed out that this is about a 20% increase in calories, which doesn’t seem to match our description of the increase as “quite small”. So let’s take a second to unpack our reasoning.
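To check the “about 20%” figure before going further, here is a minimal sketch that computes the percent increase for each source, using only the numbers in the quoted paragraph. The helper name `percent_increase` is our own, chosen for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Daily calorie intake figures (kcal/day) from the quoted paragraph.
sources = {
    "Pew (1970 -> 2010)": (2025, 2481),
    "USDA ERS (1970 -> 2014)": (2016, 2390),
}

for name, (old, new) in sources.items():
    pct = percent_increase(old, new)
    print(f"{name}: +{new - old} kcal/day, {pct:.1f}% increase")
```

The two sources come out at roughly 22.5% and 18.6%, so “about 20%” is a fair summary of both.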
First of all, this was a smaller increase than we expected. We know that even a small daily change adds up over time, but we expected the change in daily calorie intake since 1980 to be much larger.
Partly this is because once you gain weight, you burn more calories (because it takes more energy to move and maintain physiological function) and you need to eat more calories to maintain your weight. Studies show that people with obesity eat and expend more calories than lean people. From this study, for example, consider this sentence: “TDEE was 2404±95 kcal per day in lean and 3244±48 kcal per day in Class III obese individuals.” From this perspective, the average daily consumption per Pew being 2,481 calories per day doesn’t seem like much — that’s about what lean people expend daily. Obese individuals generally burn 3000+ kcal/day, and while not every modern person is obese, it does make the increase from 2,025 calories per day in 1970 to about 2,481 calories per day in 2010 look relatively small.
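The comparison in the paragraph above can be laid out as simple arithmetic. This is only a sketch: it uses the mean values quoted above and ignores the ± terms, and it assumes the intake and expenditure figures are comparable even though they come from different sources and methods. The variable names are ours.

```python
# Mean total daily energy expenditure (kcal/day) from the quoted study.
tdee_lean = 2404
tdee_class3 = 3244

# Average daily intake (kcal/day) per Pew.
intake_1970 = 2025
intake_2010 = 2481

tdee_gap = tdee_class3 - tdee_lean        # extra energy burned at Class III obesity
intake_rise = intake_2010 - intake_1970   # rise in average intake, 1970 to 2010
above_lean = intake_2010 - tdee_lean      # 2010 intake relative to lean expenditure

print(f"TDEE gap, Class III obese vs lean: {tdee_gap} kcal/day")
print(f"Rise in average intake, 1970-2010: {intake_rise} kcal/day")
print(f"2010 average intake minus lean TDEE: {above_lean} kcal/day")
```

Under these assumptions, the rise in average intake (456 kcal/day) is a bit more than half of the expenditure gap between lean and Class III obese individuals (840 kcal/day), and the 2010 average intake sits within 100 kcal/day of what lean people expend.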
This is something of a chicken and the egg problem. We could weigh more because we eat more calories, but we could also eat more calories because we weigh more, and we need to eat more calories to continue functioning at that weight. You may say, “but if you eat less at a higher weight you will dip into fat stores to make up the difference”. Not if your set point is too high you won’t! If you eat less than you need and your body wants to defend your current weight, you will crave food and feel tired and stupid. Sound familiar?
In addition, there’s some evidence that back around 1900 people ate more than they did in 1960, without much more exercise, and they weren’t obese — though take this one with a grain of salt, since for the obvious reasons, we don’t have great calorie data from back then. This is mentioned in the paragraph immediately following the one quoted above:
If we go back further, the story actually becomes even more interesting. Based on estimates from nutrient availability data, Americans actually ate more calories in 1909 than they did in 1960.
For the sake of argument, let’s accept that an increase of 400 kcal/day is a meaningful amount. Let’s further assume that the 400 kcal/day increase is entirely responsible for the increase in the rate of obesity since 1980. Even if the increase is meaningful and driving the obesity epidemic, we have to ask, why are people eating more now than they used to?
Just because calories are causally involved doesn’t mean they’re the “first cause”, the cause we should be looking for. Imagine there were an insane dictator who went around force-feeding people. These people would gain weight (per the overfeeding studies) and in some sense the calories would cause the weight gain. But if we’re being reasonable, the insane dictator is the cause, and the calories are just the mediator.
We know people do eat a bit more today than in the 1970s. So the question is something like, in this case, what is the insane dictator? When we say “not CICO”, part of what we mean is “not willpower”.
Another point worth mentioning is that there seem to be different kinds of weight change.
One is the kind of weight change that happens around your set point. For most people, changing your weight by 5-10 pounds in either direction will be relatively easy. This may be possible through diet and exercise; the amount of weight change may even have a close linear relationship with the amount of calories you add or remove from your diet! In this limited sense, CICO may be a useful guideline.
This is very different from major changes in fat mass! All available evidence suggests that it is very, very hard to lose more than 10 or so pounds and keep it off, probably because it involves fighting your lipostat’s set point.
Another kind of weight change has to do with changes in muscle mass. Altering your body composition to have lower fat/higher muscle percentages can occur without changing your set point and result in significant visual differences. We suspect that weight gain and weight loss work differently in bodybuilding because that involves weight change driven by increases in muscle mass, not fat mass. We’re perfectly willing to believe that people can gain and lose muscle mass in a reliable way based in part on caloric consumption, but that’s not the focus of A Chemical Hunger.
Cutting fat gained when increasing muscle mass might also be relatively easy, because that’s more fat than you “naturally” had before the gains. More body fat should lead to more leptin, but your body had you at the leptin level it wanted you at before, so the leptin-based part of the lipostat will help you lose that fat mass. But changing fat mass alone seems pretty dang hard.
Only one theory can account for all of the available evidence: the obesity epidemic is caused by one or more environmental contaminants, compounds in our water, food, air, at our jobs and in our homes, that change how our bodies regulate weight.
These contaminants are the only cause of the obesity epidemic, and the worldwide increase in obesity rates since 1980 is entirely attributable to their effects. For any two people in a group, the difference between their weights is largely genetic, because everyone is exposed to similar levels of contamination. But the difference between the average weight in 1980 and the average weight today is the result of environmental contaminants.
3.1 Weight Gain in Response to Medication
We know that this is biologically plausible because there are many compounds that reliably cause people to gain weight, sometimes a lot of weight.
And whatever it is, there is more of it every year
It doesn’t affect people living nonindustrialized lives, regardless of diet
But it does affect lab animals, wild animals, and animals living in zoos
It has something to do with palatable human snackfoods, unrelated to nutritional value
It differs in its intensity by altitude for some reason
And it appears to have nothing to do with our diets
Environmental contamination by artificial, human-synthesized compounds fits this picture very well, and no other account does.
Mystery 1: The Obesity Epidemic
People were skinny before the modern era because these contaminants didn’t exist back then.
People’s diets were “worse” in the past — full of lard and bread — because diet doesn’t cause obesity. The ~1% of people who were obese in the past were people with one of the various medical conditions known to cause obesity, such as Prader-Willi Syndrome, hypothyroidism, or hypothalamic lesions.
Mystery 2: An Abrupt Shift
People rapidly started getting more and more obese starting around 1980 because the contaminants are the product or byproduct of some industrial process. We’re looking for compounds that were invented around 1960 or 1970, because it would probably take a few years for enough to get into the environment to start affecting us.
Alternatively, these might be compounds that had been invented much earlier, but only began to see widespread deployment around 1980. Either way, we’re looking for that abrupt shift.
Mystery 3: The Ongoing Crisis
The obesity epidemic keeps getting worse because these contaminants continue to be produced and continue to build up in the environment. Every year they accumulate and each of us gets a larger dose. This suggests that we are looking for compounds that don’t break down easily, or at least are being introduced into the environment faster than they break down.
Mystery 4: Hunter-Gatherers
Different groups of hunter-gatherers remain lean while eating very different diets because the human body can thrive on many kinds of food. Some of the diets are extremely high-fat. Some of them are extremely high-starch. Some of them are extremely high-sugar. Some eat an extremely varied diet, while others get almost half of their calories from a single food source. But they don’t become obese, because they’re eating fat right off the gemsbok or yams straight out of the ground, and living in grass huts.
It’s true that none of the Kitavans living on the island were at all overweight. But there were actually two overweight Kitavans: both were men who had grown up in Kitava, had since moved away for many years, and happened to be visiting at the time of the study. Lindeberg managed to examine one of them, a 44-year-old man named Yutala, who had left the island fifteen years earlier to become a businessman in Papua New Guinea. At the time of the study, Yutala was almost fifty pounds heavier than the average Kitavan man of his height, twelve pounds heavier than the next heaviest man, and had the highest blood pressure of any Kitavan Lindeberg examined.
When he moved away from the island, Yutala was exposed to a modern way of living. More importantly, he was exposed to the contaminants of an industrialized society. As a result, he became overweight.
Yutala isn’t an isolated case. In fact, this happens with some regularity. Lindeberg notes, “an epidemic of obesity and weight gain has occurred in former traditional populations that transitioned to a Western lifestyle,” and cites a total of 17 sources to support this claim, including examples from Sudanese communities, Native Americans, Pacific Islanders, South Australian Aborigines, and the people of Vanuatu. “When humans switch from an ancient to a Western lifestyle,” he says, “they experience increased waistlines, reduced insulin sensitivity, higher blood pressure and a host of related disorders and diseases.”
Mystery 5: Lab Animals and Wild Animals
Lab animals and wild animals are becoming more obese because they are exposed to the same environmental contaminants that we are. If they are living around humans, in or around our buildings, eating industrially-prepared foods, or the scraps of such foods, they are exposed to contaminants in the same way as the rest of us. Even if they’re not living in close proximity to humans, these compounds are probably in groundwater and drinking water.
Mystery 6: Palatable Human Food
Lab rats gain more weight from human foods than they do from rat chow with similar nutritional properties because obesity doesn’t come from fat or carbohydrate content, but from contaminants in the food, and human food has more contaminants than the rat chow does, likely from packaging and processing.
Doesn’t this mean that avoiding packaged and processed food should reverse obesity? We think the answer is “maybe”. There’s not a lot of research on “whole foods” diets, but the evidence that we do have is quite promising. People seem to lose a reasonable amount of weight and keep it off for up to 12 months when they’re eating largely unprocessed plant-based foods. But we should be skeptical of these results until there are more studies. While this diet lowered almost everyone’s BMI, when we look at individual results, most people remained obese or overweight after 12 months on this diet. Similar findings are characteristic of Paleolithic diets. Even when the studies are conducted by advocates of these diets, they produce very moderate benefits, in one case causing only 6.6 lbs (3 kg) more weight loss than a comparison diet across a 3-month period.
This wasn’t intended as a weight loss diet — in fact, 20 potatoes a day was the amount he calculated he would need to maintain his weight (2,200 calories). Despite this, Voigt lost 21 pounds over his sixty-day diet. He even had trouble eating enough — he just wasn’t very hungry. Why would this happen? Well, unprocessed potatoes are about as raw a food as you can find, and won’t pick up contaminants from industrial cooking and packaging. If Voigt was being exposed to contaminants through his everyday diet, then switching to potatoes largely cooked at home would lead him to getting a much lower dose of contaminants.
Most people are likely exposed to some of these contaminants in their diet, so eating a diet with fewer contaminants helps. But most people are also likely exposed at home, at work, and through their drinking water, so diet alone can only help so much.
Mystery 7: Altitude
Obesity is less common at high altitudes because of the watershed. Environmental contaminants build up as water flows downhill and are in much higher concentrations as you approach sea level.
For example, take a look at this map of by-state obesity levels from the CDC:
The Mississippi watershed is America’s largest drainage basin, covering 41% of the country. If you compare this map of state-level obesity to a map of the Mississippi watershed (below), you’ll see that every single state with obesity rates of >35% borders on a river from this watershed system. Also informative is that the three states at the mouth of the river, Mississippi, Arkansas, and Louisiana, are #1, #3, and #4 in the nation in terms of obesity rate (39.5%, 37.1%, and 36.8%, respectively).
Obesity rates are high everywhere in America, but we can see that they are higher in states where the groundwater has covered more distance, and had more time to accumulate contaminants (see continental watershed map below). States where groundwater comes from shorter river systems have a clear tendency towards lower (though still in the range of 25%-30%) rates of obesity.
If this is the case, we should also see similar patterns in other countries.
It’s hard to find good province-level maps of obesity for China, but most of them look something like this:
China has two major rivers, the Yangzi and the Yellow River. Comparing our map of obesity to a map of China’s rivers, we see that Shandong Province, with the highest rate of obesity, is at the mouth of the Yellow River. Shanghai, at the mouth of the Yangzi, is not quite as obese, but still more obese than the neighboring provinces. And in general we see that provinces at lower elevations are more obese.
There are always a few confusing outliers, of course. Why are Maine, North Dakota, and Alabama so obese? In China, why are Xinjiang and Heilongjiang provinces so obese? The answer is that watersheds play a role in the distribution of contaminants, but are not the whole story.
In some cases, though, the answer may come back to watersheds after all. For example, Xinjiang province’s main watershed is the Tarim Basin, an endorheic basin that captures water and has no outlet. Rain that falls in the Tarim Basin flows to Lop Nur and stays there. The water might evaporate, but any contaminants it carried will stay in the basin.
We see similar trends in data from Iran. In the map of Iran shown below, you can see that many of the most obese provinces are near the Caspian Sea, another endorheic basin. We weren’t able to find similar maps for Russia or for Kazakhstan, two other large countries bordering on the Caspian, but we would expect them to look similar.
There are obvious and often extreme differences in obesity between people at 0 ft of altitude and 500 ft of altitude, both in the US and in other countries. The changes in CO2 aren’t enough to make any difference, but water runoff could.
It’s important to note that altitude itself doesn’t affect obesity directly. Instead, altitude is a proxy for how high an area is in the watershed, which is itself a proxy for how badly the local water supply is contaminated. This is why Mississippi is more obese than low-lying areas of California. In California the water supply hasn’t traveled nearly as far in its path to the ocean, and has traveled past fewer farms, highways, cities, and factories.
Mystery 8: Diets Don’t Work
Finally, no diet will reliably help because obesity isn’t caused by a bad diet and can’t be cured by a good one. Hypothetically, if there were a person who was only exposed to these contaminants in their food, cutting out the contaminated food for long enough would cure them. This may be what happened with Chris Voigt when he cut out everything but potatoes.
Ultimately, the fact that diets don’t work very well for most people suggests that we pick up these contaminants from other sources than just our food. Probably they are also, to varying degrees, in our water, our workplaces, and our homes.
We aren’t the first researchers who are concerned about environmental contaminants and the role they might play in rising obesity rates. But the scope of this inquiry has traditionally been limited to how contaminants might contribute to the obesity epidemic.
One review focuses on food additives, both intentional (e.g. artificial sweeteners) and unintentional (pesticides), suggesting that “environmental contaminants are contributing to the global epidemic of obesity”, and suggests that the review will be “helpful in elucidating their role in the obesity epidemic”. Another review focused on endocrine disrupting chemicals, but closes by saying, “public health officials should think of the obesity epidemic as a function of a multifactorial complex of events, including environmental-endocrine disruptors.” Yet another mentions “nutrient quality, stress, fetal environment and pharmaceutical or chemical exposure as relevant contributing influences.”
Canaries in the Coal Mine documents the rise in obesity in wild and captive animals and suggests that “the aetiology of increasing body weight may involve several as-of-yet unidentified and/or poorly understood factors.” What factors could these be? They list a couple: viral pathogens, epigenetic factors, and at the very end of the paper, “the collection of endocrine-disrupting chemicals (endocrine-disruptors), widely present in the environment.”
A National Toxicology Program workshop from 2012 suggests that “exposures to environmental chemicals may be contributing factors to the epidemics of diabetes and obesity.” They suggest that there is a link between some forms of contamination and type 2 diabetes, but overall, they say that there is still not enough research to draw firm conclusions.
Despite this interest, all the claims have been quite mild, identifying environmental contaminants as possibly being one of many factors contributing in some small way to the obesity epidemic. In contrast, we propose that the obesity epidemic is entirely driven by environmental contaminants. The entire difference in obesity between 1980 and today is attributable to one or more contaminants that we are exposed to in our food, water, and living spaces.
Still, not everyone today is obese. There are two reasons for this. First of all, even though everyone is exposed, some people are exposed to more than other people. If you live in an environment with less exposure, for example at a higher altitude, on average you will be less obese.
Some people are also less affected by these contaminants than other people, even at the same dose, and this difference is largely genetic. But even these people probably still, on average, have much more body fat than their ancestors did. Hunter-gatherers have BMIs of around 22 or 23. Civil War veterans in the 1890s had average BMIs of about 23 as well. If your BMI is higher than 23, you’re probably fatter than you would be without the action of these contaminants.
Sometimes it is these factors in combination. If you have a genetic resistance and you’re exposed to low levels of these contaminants, you’ll be much less obese than average.
3.4 Further Evidence in Favor of Contaminants
The difference in obesity rates between countries, as well as the differences between states or provinces within a country, is also the result of differences in contamination. Some of it will be genetic, but some of it is because some places are more contaminated than others.
During the Cuban economic crisis known as the “Special Period”, obesity rates plummeted, from 14% obese to 7% obese. Normally this is attributed to the decrease in calorie consumption and the increase in exercise, as oil shortages led people to drive less and walk or bike more. But we know already that reducing consumption and increasing exercise have very modest effects on weight loss.
Food was restricted, but very few people were starving. While obesity dropped from 14% to 7%, the number of people listed as “underweight” only went from 8% to 10.3% (see above). And it’s not like they were eating all that healthy — “the primary sources of energy during the crisis were sugar cane and rice.”
Rarely mentioned but particularly notable is that food imports virtually ceased during this period. If we assume that Cuban obesity was partially a result of contaminants in their food imports, this explains the data perfectly. Alternately, we can note that fertilizer and pesticide use sharply declined in this period as well, as both were normally derived from oil, which the island was now seriously lacking. These are also potential contaminants.
One surprising fact is that the most obese countries in the world by BMI are all tiny island nations in the south or central Pacific — Nauru, Tonga, Samoa, Tuvalu, Palau, the Cook Islands, and others. Depending on the year and the source, the 10 most obese nations in the world are usually small Pacific islands. Obesity rates in these countries are not merely high, they are clear outliers. The most obese mainland nations are around 35-40% obese, but these small Pacific islands have obesity rates in the range of 45-60%. Certainly this requires some sort of explanation.
To begin with, there are some reasons to suspect that this is largely an artefact. These islands all have very small populations and are genetically homogeneous, so it’s possible that much of the difference is genetic. Polynesians also appear to be slightly stouter and more muscular than other groups, which may mean that BMI is an especially bad measure for this group and leads us to slightly overestimate how obese they are. With a better measure of obesity, their obesity rates might be more similar to the rates of other very obese countries, like Kuwait and the United States.
In addition, Polynesian countries import most of their food and eat a lot of highly processed, canned meat (famously spam), which may be more contaminated than average. It’s also notable that Nauru, the most obese (61%) country in the world, has been heavily strip-mined for phosphate. This is interesting because mining is a major source of environmental contamination. For comparison, West Virginia is an obesity outlier in the United States, and it too has a long history of strip mining. In any case, this is why Nauru imports so much food — with about 80% of the island strip-mined, they can’t grow anything there. Most of these islands are not so heavily mined, of course, but this might explain why Nauru is 61% obese and Samoa is “only” 47% obese.
Current theories of the obesity epidemic are inadequate. None of them hold up to closer scrutiny, and none can explain all of the mysteries mentioned in Part I. But these mysteries are real, puzzling data about the obesity epidemic.
You’re probably familiar with several theories of the obesity epidemic, but there is strong evidence against all of them. In this section, we focus on the case against a couple of the most popular theories.
2.1 Calories In, Calories Out
A popular theory of obesity is that it’s simply a question of calories in versus calories out (CICO). You eat a certain number of calories every day, and you expend some number of calories based on your metabolic needs and physical activity. If you eat more calories than you expend, you store the excess as fat and gain weight, and if you expend more than you eat, you burn fat and lose weight.
This perspective assumes that the body stores every extra calorie you eat as body fat, and that it doesn’t have any tools for using more or less energy as the need arises. But this isn’t the case. Your body has the ability to regulate things like its temperature, and it has similar tools to regulate body fatness. When we look closely, it turns out that “calories in, calories out” doesn’t match the actual facts of consumption and weight gain.
“This model seems to exist mostly to make lean people feel smug,” writes Stephan Guyenet, “since it attributes their leanness entirely to wise voluntary decisions and a strong character. I think at this point, few people in the research world believe the CICO model.”
It’s not that calories don’t matter at all. People who are on a starvation diet of 400 calories per day will lose weight, and as we will see in this section, people who eat hundreds of calories more than they need will usually gain weight. The problem is that this ignores how the body accounts for the calories coming in and going out. If you don’t eat enough, your body finds ways to burn fewer calories. If you eat too much, your body doesn’t store all of the excess as fat, and compensates by making you less hungry later on. Calories are involved in the math but it’s not as simple as “weight gain = calories in – calories out”.
[Edit: We’ve added an interlude clarifying this section on CICO in response to reader questions and objections. If you have objections to anything below, read the interlude here because we probably address it!]
2.1.1 Common Sense
First, we want to present some common-sense arguments for why diet and exercise alone don’t explain modern levels of obesity.
Everyone “knows” that diet and exercise are the solution to obesity. Despite this, rates of obesity continue to increase, even with all the medical advice pointing to diet and lifestyle interventions, and a $200 billion global industry devoted to helping people implement these interventions. It’s not that no one is listening. People are exercising more today than they were 10 or even 20 years ago. Contrary to stereotypes, more than 50% of Americans meet the HHS guidelines for aerobic exercise. But obesity is still on the rise.
It’s true that people eat more calories today than they did in the 1960s and 70s, but the difference is quite small. Sources have a surprisingly hard time agreeing on just how much more we eat than our grandparents did, but all of them agree that it’s not much. Pew says calorie intake in the US increased from 2,025 calories per day in 1970 to about 2,481 calories per day in 2010. The USDA Economic Research Service estimates that calorie intake in the US increased from 2,016 calories per day in 1970 to about 2,390 calories per day in 2014. Neither of these are jaw-dropping increases.
If we go back further, the story actually becomes even more interesting. Based on estimates from nutrient availability data, Americans actually ate more calories in 1909 than they did in 1960.
Finally, there are many medical conditions that cause obesity. Examples include Prader-Willi Syndrome, a genetic disorder characterized by intense hunger and resulting obesity; hypothyroidism, an endocrine disorder where people experience loss of appetite yet still gain 5-10 pounds; and lesions to the hypothalamus, which often lead to intense weight gain, sometimes accompanied by great hunger but many times not.
2.1.2 Scientific Evidence
In addition to these common-sense objections, decades of research suggest that diet and exercise are not to blame for rising rates of obesity.
Studies of controlled overfeeding — you take a group of people and get them to eat way more than they normally would — reliably find two things. First, a person at a healthy weight has to eat huge amounts of calories to gain even a couple pounds. Second, after the overfeeding stops, people go right back to the weight they were before the experiment.
On this olympian diet, the prisoners did gain considerable weight, on average 35.7 lbs (16.2 kg). But following the overfeeding section of the study, the prisoners all rapidly lost weight without any additional effort, and after 10 weeks, all of them returned to within a couple pounds of their original weight. One prisoner actually ended up about 5 lbs (2.3 kg) lighter than before the experiment began!
Inspired by this, in 1972, George Bray decided to conduct a similar experiment on himself. He was interested in conducting overfeeding studies, and reasoned that if he was going to inflict this on others, he should be willing to undergo the procedure himself. First he tried to double each of his meals, but found that he wasn’t able to gain any weight — he simply couldn’t fit two sandwiches in his stomach at every sitting.
He switched to energy-dense foods, especially milkshakes and ice cream, and started eating an estimated 10,000 calories per day. Soon he began to put on weight, and gained about 22 lbs (10 kg) over 10 weeks. He decided this was enough and returned to his normal diet. Six weeks later, he was back at his original weight, without any particular effort.
In both cases, you’ll notice that even when eating truly stupendous amounts of food, it actually takes more time to gain weight than it does to lose it. Many similar studies have been conducted and all of them find basically the same thing — check out this recent review article of 25 studies for more detail.
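For Bray’s self-experiment, where both the gain and the loss periods are given above, the asymmetry can be put in numbers. This is a rough sketch using the approximate figures from the text; it treats both gain and loss as steady over time, which is our simplifying assumption and not something the account reports.

```python
# Approximate figures from Bray's self-experiment as described above.
gained_lbs = 22    # weight gained on roughly 10,000 kcal/day
gain_weeks = 10    # duration of overfeeding
loss_weeks = 6     # time to return to original weight on a normal diet

# Assumes a constant rate over each period (a simplification).
gain_rate = gained_lbs / gain_weeks
loss_rate = gained_lbs / loss_weeks

print(f"Average gain while overfeeding: {gain_rate:.2f} lbs/week")
print(f"Average loss afterwards:        {loss_rate:.2f} lbs/week")
print(f"Loss was {loss_rate / gain_rate:.2f}x faster than gain")
```

On these numbers the weight came off at about 3.7 lbs per week after going on at about 2.2 lbs per week, even though the gain required an enormous calorie surplus and the loss required no particular effort.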
Overfeeding in controlled environments does make people gain weight. But they don’t gain enough weight to explain the obesity epidemic. If you eat 10,000 calories per day, you might be able to gain 20 or 30 pounds, but most Americans aren’t eating 10,000 calories per day.
We can compare these numbers to the increases in average calories per day we reviewed earlier. Sure, consumption in the US went from 2,025 calories per day in 1970 to 2,481 calories per day in 2010, a difference of 456 calories. But consider Poehlman et al. (1986), where researchers fed a group of 12 men 1,000 extra calories a day for 22 days. On average the men gained about 5 lbs (2.2 kg), but some of them actually lost weight instead.
And it’s not as though these participants are eating 1,000 extra calories of celery and carrots. In one study, the extra calories came from “sherbet, fruit juices, margarine, corn oil, and cookies”. But the content doesn’t seem to matter very much. Another study compared overfeeding with carbohydrates (mostly starch and sugar) and overfeeding with fat (mostly dairy fat like cream and butter). The two groups got their extra calories from different sources, but they were overfed by the same amount. After two weeks, both groups gained the same amount of fat, 3.3 lbs on average. A similar study overfed volunteers by 1,194 calories on either a high-carb or a high-fat diet for 21 days. Both groups gained only about 2 lbs of fat.
The fact that many of these are twin studies provides even more evidence against CICO. In groups of twins that are all overfed by the same amount, there is substantial variation between the different participants in general. Some people gain a lot of weight, others gain almost none. But each person gains (or loses!) about the same amount of weight as their twin. In some cases these correlations can be substantial, as high as r = 0.90. This strongly suggests that genetics plays a large role in determining how the body responds to overfeeding.
The story with exercise is the same as with overeating — it makes a difference, but not much. One randomized controlled trial assigned overweight men and women to different amounts of exercise. More exercise did lead to more body fat loss, but even in the group exercising the most — equivalent to 20 miles (32.0 km) of jogging every week for eight months — people only lost about 7 lbs.
You might think that hunter-gatherers have a more active lifestyle than we do, but this isn’t always true. The Kitavans examined in 1990 by Staffan Lindeberg were only slightly more active than westerners, had more food than they knew what to do with, and yet were never obese. “Many Westerners have a level of physical activity that is well within the range of the Kitava population,” he wrote. “Hence, physical activity does not seem to explain most of the differences in disease pattern between Kitava and the Western world.”
A recent meta-analysis of 36 studies compared the effects of interval training exercise with more traditional moderate-intensity continuous training. The authors call interval training “the magic bullet for fat loss” (this is literally in the title) and trumpet that it provides 28.5% greater reductions in total absolute fat mass than moderate exercise. But what they don’t tell you is that this is a difference between a loss of about 3 lbs and about 4 lbs, for an exercise program running 12 weeks long. Needless to say, this difference isn’t very impressive. Other meta-analyses find similar results: “neither short-term HIIT/SIT nor MICT produced clinically meaningful reductions in body fat.”
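To make the gap between the relative and absolute figures concrete, here is a minimal sketch of the arithmetic. It assumes the rounded baseline of about 3 lbs quoted above; the function name is ours and purely illustrative.

```python
def extra_loss_lbs(baseline_loss_lbs: float, relative_gain: float) -> float:
    """Absolute extra fat loss implied by a relative improvement."""
    return baseline_loss_lbs * relative_gain

# Figures taken from the text: about 3 lbs lost with moderate exercise
# over a 12-week program, and a reported 28.5% greater reduction with
# interval training. The 3 lbs baseline is a rounded value.
baseline = 3.0          # lbs lost with moderate-intensity training
relative_gain = 0.285   # 28.5% greater reduction
weeks = 12

extra = extra_loss_lbs(baseline, relative_gain)
print(f"Extra fat lost with interval training: {extra:.3f} lbs")
print(f"That is {extra / weeks:.3f} lbs per week of training")
```

Under these assumptions the headline 28.5% works out to less than one extra pound over the whole program.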
Maybe diet and exercise together are worth more than the sum of their parts? Sadly this doesn’t seem to be the case either. If anything, when combined they are worth less than the sum of their parts. One meta-analysis comparing interventions based on diet, exercise, and diet plus exercise found that people lost about 23.5 lbs (10.7 kg) on diets, 6.4 lbs (2.9 kg) on an exercise regime, and 24.2 lbs (11.0 kg) on diet plus exercise. After a year, diet plus exercise was down to 18.9 lbs (8.6 kg). Other meta-analyses are more tempered, for example, finding a loss of about 3.6 lbs (1.6 kg) after two years of diet plus exercise interventions. Again this is more weight loss than zero, but it clearly rules out diet plus exercise as an explanation for the obesity epidemic. People in 1950 were a lot leaner than they are now, but it’s not because they ate less and exercised more.
2.2 Good Calories and Bad Calories
Ok, calories themselves may not be the villain here. But maybe it’s not that we’re eating more than we used to; maybe it’s that we’re eating differently. Maybe one particular macronutrient or source of calories is to blame.
2.2.1 Dietary Fat
Dietary fat seems like a possible culprit. After all, fat makes you fat, right? Turns out it’s not so simple.
Plenty of cultures eat extremely high-fat diets and remain very lean indeed. You’ll remember that the Maasai diet is about 3000 calories per day, and 66% of that is from fat. But the Maasai don’t suffer from obesity. In fact, Kalahari Bushmen love fat and apparently wax poetic about it.
We even see differences within a specific kind of animal. The same high-fat diet will make one species of hamster (Syrian hamsters) obese and leave another species of hamster (golden hamsters) merely chubby. If the findings can’t generalize between different species of hamsters, we shouldn’t expect them to generalize to humans.
In any case, it’s hard to square a fat-based explanation for the obesity epidemic with the fact that fat consumption hasn’t increased in step with the rise of obesity and the fact that low-fat diets don’t lead to much weight loss.
Ok, maybe fat doesn’t make you fat. How about carbohydrates? All this bread can’t be good for us.
This theory is dead on the starting line, though, because as obesity has gone up, consumption of carbohydrates has gone down (see figure).
This is enough to make it clear that carbohydrate consumption isn’t driving the obesity epidemic, but we can take a slightly closer look anyways, just to be sure.
Eating lots of carbs can actually make you lose weight. High-carbohydrate diets cause weight loss, even when not restricting calories. A study from 2003 examined low-fat diets in 16 overweight people. Naturally, this low-fat diet was high in carbohydrates. When patients started the low-fat diet and were told to eat as much as they wanted, they actually ate 291 calories less per day.
But their carbohydrate intake increased, from 253 grams per day to 318 grams per day. On this diet they lost 8 lbs (3.8 kg) on average over a 12-week period. In the DIETFITS randomized controlled trial, 609 people fed a whole-food, high-carbohydrate diet lost 12 pounds (5.3 kg) over one year, not significantly different from the 13 pounds (6.0 kg) of weight lost on a whole-food low-carbohydrate diet. The high-carbohydrate diet also supplied about 1.5 times as much sugar as the low-carbohydrate diet.
The residents of Kitava, mentioned earlier, have a diet of starchy roots and tubers. Almost 70% of their calories come from carbohydrates, but they don’t suffer from obesity, diabetes, or heart disease.
(Lindeberg also says: “The long primate history of fruit eating, the high activity of human salivary amylase for efficient starch digestion, and some other features of human mouth physiology … suggest that humans are well prepared for a high carbohydrate intake from non-grain food sources. … in contrast to most other animals including non-human primates, humans have an exceptional capacity to produce salivary amylase in order to begin hydrolysis of starch in the mouth.”)
In general, cultures with very high intakes of carbohydrate tend to be lean. Most agricultural societies around the world have a diet that is high in carbohydrates and low in fat. Agricultural societies are different from industrialized ones in many ways, of course. But even in those agricultural cultures with abundant food, people are typically lean, with low rates of diabetes and cardiovascular disease.
In fact, people who move from Japan to the US and begin eating less white rice become much heavier. This suggests that the difference isn’t simply genetic. These immigrants do end up eating a diet much higher in fat, but as we saw in the previous section, fat can’t be responsible for this change.
Nor is it likely to be some other carbohydrate staple. Wheat consumption, for example, has been falling for a century. People in the US ate almost twice as much wheat (primarily in the form of bread) in the 1880s as they do today. If wheat were responsible, people would have been massively obese during Reconstruction and entirely lean today. Obviously that is not what we observe.
If the historical data isn’t enough for you, there are entire reviews devoted to the health impacts of wheat, pretty conclusively showing that it isn’t a cause of obesity.
Everyone knows that added sugar is the real villain, right? Wrong again.
Sugar consumption has been declining for 20 years in the US, while obesity and diabetes rates have increased. The sugar data in the figure below includes all added sugars such as honey, table sugar, and high-fructose corn syrup, but doesn’t include sugars naturally occurring in fruits and vegetables.
We see something similar in what has been called The Australian Paradox, where obesity in Australia nearly tripled between 1980 and 2003, while sugar consumption dropped 23%.
We see that public health efforts to reduce sugar consumption have worked. In fact, they’ve worked very well. But they don’t seem to have made any difference to the obesity epidemic.
Tightly-controlled metabolic ward studies also show that the sugar content of a diet doesn’t matter much. One study of 17 men compared a 25 percent sugar, high-carbohydrate diet to a 2 percent sugar, very-low-carbohydrate (ketogenic) diet of equal calories. After four weeks, they found that the high-carbohydrate diet caused slightly more body fat loss than the very-low-carbohydrate (ketogenic) diet, despite the fact that the two diets differed more than tenfold in sugar content. We see similar results in mice and in rats: “Animals fed a low-fat, high-sucrose (LH) diet were actually leaner than animals fed a high-complex-carbohydrate diet.”
We can further cite the fact that many cultures, such as the Hadza of Tanzania, the Mbuti of the Congo, and the Kuna of Panama, eat diets relatively high in sugar (sometimes as high as 80% of calories), and yet none of these cultures have noticeable rates of obesity, diabetes, cardiovascular disease, etc.
2.3 Diet in General
Over the past 40 years, there hasn’t been much of a change in where people get their calories from. Americans get about 50% of their calories from carbohydrates, 30% from fat, and 20% from protein, and they have for years. At the same time obesity continues to go up and up. Comparing these two trends, it’s hard to imagine that macronutrients have anything to do with the obesity epidemic.
All diets work. The problem is that none of them work very well. Stick to just about any diet for a couple weeks and you will probably lose about 10 pounds. This is ok, but it isn’t much comfort for someone who is 40 lbs overweight. And it isn’t commensurate with the size of the obesity epidemic.
There are too many diets to review in full, of course, but we see the same pattern in every diet that has been extensively studied. Let’s look at just a few.
2.3.1 Ketogenic Diet
We’ve already mentioned a few ketogenic diets, and as we’ve seen, they don’t work much better than other diets do.
There is one meta-analysis of ketogenic diet studies, comparing very-low-carbohydrate ketogenic diets to low-fat diets in overweight and obese adults. Across thirteen randomized controlled trials, ketogenic diets only caused 2 pounds (0.9 kg) more weight loss than the traditional low-fat diets after 12 months.
2.3.2 Low-Glycemic Diet
Study after study finds that low-glycemic diets don’t work for weight loss.
One study from 2007 randomly assigned 203 women to either a high-glycemic or low-glycemic diet. The difference in glycemic index was considerable, with the high-glycemic diet having an index twice as high as the low-glycemic diet. The groups consumed the same amount of calories and reported similar levels of hunger.
Despite this, there was no difference between the groups. After two months the LGI group had lost 1.6 lbs (0.72 kg) and the HGI group had lost 0.7 lbs (0.31 kg), but this difference wasn’t sustained. After 18 months on the diet, the LGI group had lost 0.9 lbs (0.41 kg) and the HGI group had lost 0.6 lbs (0.26 kg), and this difference was statistically indistinguishable (p = .93). Large differences in glycemic index have no meaningful long-term (or even short-term) effect on calorie consumption or body weight.
Another 18-month randomized trial compared a low-glycemic load (40% carbohydrate and 35% fat) vs low-fat (55% carbohydrate and 20% fat) diet in 73 obese young adults in the Boston, Massachusetts area. In both diets, participants were largely eating whole foods; vegetables, beans, and fruit were major components of both diets. In both diets, people were allowed to eat as much as they wanted.
Both groups reported similar levels of hunger and consumed similar amounts of calories. The two diets were rated equally easy to stick to and equally tasty. Both groups lost about 4-5 lbs after 6 months. But both groups started to gain weight back soon after. In fact, the trajectory of weight loss is so identical, we simply have to show you the graph:
Note the p-value of 0.99, which indicates that the two trajectories are about as statistically indistinguishable as is mathematically possible.
We find this in study after study. Meta-analysis also finds that low-glycemic diets don’t do any better than other diets when it comes to weight loss. When the reviewers pick out the studies that show the best performance for low-glycemic diets, they still find a difference of only 4 lbs (1.8 kg). If that’s a success, we have to wonder what failure would look like.
2.3.3 Future Dietary Explanations
Eating fewer calories will lead most people to lose a couple pounds, and it doesn’t really matter what calories they restrict. Cutting back on fat works about as well as cutting back on carbs. In both cases, a couple pounds isn’t enough to explain the obesity epidemic.
Over the past 50 years, medical science has looked at diet from practically every angle. But none of these diet-based explanations have gone anywhere. People are still getting fatter. They got fatter over the last decade. And they got fatter over the decade before that. And the one before that. Every country in the world is growing more obese. And the trend has never once been reversed.
You could certainly cook up another diet-based explanation. But there’s no reason to expect that this explanation would do any better than any of the others.
It’s time to start looking for explanations outside the world of calories, macronutrients, and exercise. At this point, we should assume that the obesity epidemic isn’t caused by our diet.
There is one theory of obesity which is almost entirely satisfying, based around the body’s ability to regulate its adiposity.
A house has a thermostat. The owner of the house sets the temperature to 72 degrees F. The thermostat detects the temperature of the house and takes action to drive the temperature to the set point of 72°F. If the house is too cold, the thermostat will turn on the furnace. If the house is too warm, the thermostat will turn on the air conditioning.
The human body has a lipostat (from the Greek lipos, meaning fat). Evolution and environmental factors set body fatness to some range — perhaps a BMI of around 23. The lipostat detects how much fat is stored and takes action to drive body fatness to the set point of a BMI of 23. If your body is too thin, the lipostat will drive you to eat more, exercise less, sleep more, and store more of what you eat as fat. If your body is too fat, the lipostat will turn on the air conditioning. Just kidding, the lipostat will drive you to eat less, move and fidget more, and store less of the food you eat as fat.
According to this theory, people become obese because something has gone wrong with the lipostat. If the owner of a house sets the thermostat to 120°F, the house will quickly become too hot, and it will stay that way until the set point is changed or the furnace explodes. Something similar is happening in obesity. The set point has been moved from a healthy and natural level of adiposity (BMI of about 23) to an unusually high level (BMI 30+), and all the regulatory systems of the body are working in concert to push adiposity to that level and keep it there.
The lipostat model is supported by more than a hundred years of evidence. By the 1970s, Dr. Michel Cabanac and collaborators were publishing papers in the journal Nature on what they called the “ponderostat” (pondero = weight). This was later revised to the adipostat (adipo = fat), and eventually, as we call it here, the lipostat.
Modern neuroscience and medical review articles (those are three separate links) overwhelmingly support this homeostatic explanation. In animals and humans, brain damage to the implicated areas leads to overeating and eventual obesity. These systems are well-understood enough that by targeting certain neurons you can cure or cause obesity in mice. While we don’t approve of destroying neurons in human brains with hyperspecific chemical techniques, the few weight-loss drugs approved by the FDA largely act on the brain (hopefully without destroying any neurons).
The lipostat explains why diet and exercise work a little, why they don’t work well enough to reverse obesity, and why even people who lose weight on diets generally end up gaining that weight right back.
In a house where the thermostat has been set to 120°F, there are a lot of things we can do to lower the temperature. We can open all the doors and windows. We can open the icebox. We can order mountains of dry ice off of the internet. All of these things will lower the temperature of the house a little, but even with these measures, the house will still be hotter than the healthy temperature of 72°F. The furnace will work double-time to push the temperature back up to 120°F, if it’s not redlining already. And as soon as you relax any of your heat-dissipation measures, the temperature will go right back up to where it was before.
(We can also go down into the basement and hit the furnace with crowbars until it doesn’t work very well anymore. This is a pretty extreme solution and also, incidentally, why gastric bypass surgery works so great.)
When people intentionally overeat, as in the overfeeding studies we reviewed, they temporarily gain a little weight, but when they stop overeating, they quickly return to their original weight. When people intentionally undereat, as they do on a diet, they temporarily lose a little weight, but when they stop undereating they quickly return to their original weight. In fact, they usually return to near their original weight even if they keep undereating. The lipostat has a target weight and, when not actively opposed, it will push your body weight to that weight and do its best to keep it there.
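The set-point behavior described here can be sketched as a toy feedback loop. This is only an illustration of the thermostat analogy, not a physiological model: the update rule, the `gain` and `push` parameters, and the set point of 30 are all assumptions we made up for the sketch.

```python
def simulate_lipostat(start_bmi, set_point, gain=0.1, push=0.0, steps=200):
    """Toy feedback loop (hypothetical parameters throughout).

    Each step, the lipostat corrects a fraction `gain` of the gap between
    the current BMI and the set point. `push` is a constant outside
    pressure per step: negative for a diet, positive for overfeeding.
    Returns the BMI after `steps` steps.
    """
    bmi = start_bmi
    for _ in range(steps):
        bmi += gain * (set_point - bmi) + push
    return bmi

SET_POINT = 30.0  # assumed elevated set point, in BMI units

# Dieting pushes BMI down a little, but the lipostat pushes back.
on_diet = simulate_lipostat(SET_POINT, SET_POINT, push=-0.1)
# Once the diet stops, BMI drifts back to the set point.
after_diet = simulate_lipostat(on_diet, SET_POINT)

# Overfeeding is the mirror image.
overfed = simulate_lipostat(SET_POINT, SET_POINT, push=0.1)
after_overfeeding = simulate_lipostat(overfed, SET_POINT)

print(f"on diet: {on_diet:.2f}, after diet: {after_diet:.2f}")
print(f"overfed: {overfed:.2f}, after overfeeding: {after_overfeeding:.2f}")
```

In this sketch a sustained push only shifts the resting point by `push / gain`, and removing the push returns the system to the set point. Moving the set point itself is the only thing that changes where the system settles.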
There are many signals that the brain uses to measure how much fat the body is carrying. One of the most important is the hormone leptin, which is naturally produced by fat cells. Part of the action of the lipostat is making sure that leptin levels are kept within a desired range, which helps keep us at a desired weight.
. . . leptin-deficient children are nearly always hungry, and they almost always want to eat, even shortly after meals. Their appetite is so exaggerated that it’s almost impossible to put them on a diet: if their food is restricted, they find some way to eat, including retrieving stale morsels from the trash can and gnawing on fish sticks directly from the freezer. This is the desperation of starvation [. . . ] they become distressed if they’re out of sight of food, even briefly. If they don’t get food, they become combative, crying and demanding something to eat.
The lipostat account is extremely convincing. The only weakness in the theory is that it’s not clear what could cause the lipostat to be set to the wrong point. In leptin-deficient children, their body simply can’t detect that they are obese. But most people produce leptin just fine. What is it that throws this system so totally out of balance?
While the lipostat perspective does in a sense explain why people become obese (their lipostat is out of alignment), it’s not really a theory of the obesity epidemic, since it doesn’t explain why our lipostats began getting more and more out of balance around 1980.
Even advocates of the theory are perfectly willing to admit this. In The Hungry Brain, Stephan Guyenet writes:
Many researchers have tried to narrow down the mechanisms by which [diet] causes changes in the hypothalamus and obesity, and they have come up with a number of hypotheses with varying amounts of evidence to support them. Some researchers believe the low fiber content of the diet precipitates inflammation and obesity by its adverse effects on bacterial populations in the gut (the gut microbiota). Others propose that saturated fat is behind the effect, and unsaturated fats like olive oil are less fattening. Still others believe the harmful effects of overeating itself, including the inflammation caused by excess fat and sugar in the bloodstream and in cells, may affect the hypothalamus and gradually increase the set point. In the end, these mechanisms could all be working together to promote obesity. We don’t know all the details yet…
Guyenet favors a “food reward” explanation, where eating “highly rewarding food” causes a mild form of brain damage that turns up the set point of the lipostat. He’s even gone so far as to propose (as an April Fools’ joke) a collection of boring recipes called The Bland Food Cookbook.
You’ll notice that in all these theories, the factors that damage the lipostat are related to diet. But as we’ve just argued above, the persistent failure to find a solution in our diets strongly suggests that we should start looking elsewhere for the explanation.
2.6 What, Then?
We should start seriously considering other paradigms. If diet and exercise are out as explanations for the epidemic, what could possibly explain it? And what could possibly explain all of the other bizarre trends that we have observed?
The first mystery is the obesity epidemic itself. It’s hard for a modern person to appreciate just how thin we all were for most of human history. A century ago, the average man in the US weighed around 155 lbs. Today, he weighs about 195 lbs. About 1% of the population was obese back then. Now it’s about 36%.
Back in the 1890s, the federal government had a board of surgeons examine several thousand Union Army veterans who fought in the Civil War. This was several decades after the end of the war, so by this point the veterans were all in their 40s or older. This gives us a snapshot of what middle-aged white men looked like in the 1890s. When we look at their data, we find that they had an average BMI of about 23 (overweight is a BMI of 25 and obese is a BMI of 30 or more). Only about 3% of them were obese. In comparison, middle-aged white men in the year 2000 had an average BMI of around 28. About 24% were obese in early middle age, increasing to 41% by the time the men were in their 60s.
(Most experts consider measures like body fat percentage to be better measures of adiposity than BMI, and we agree. Unfortunately, nearly every source reports BMI, and most don’t report body fat percentage. Here, we use BMI so that we can compare different sources to one another.)
It’s not just that we’re a little fatter than our great-grandparents — the entire picture is different.
People in the 1800s did have diets that were very different from ours. But by conventional wisdom, their diets were worse, not better. They ate more bread and almost four times more butter than we do today. They also consumed more cream, milk, and lard. This seems closely related to observations like the French Paradox — the French eat a lot of fatty cheese and butter, so why aren’t they fatter and sicker?
Our great-grandparents (and the French) were able to maintain these weights effortlessly. They weren’t all on weird starvation diets or crazy fasting routines. And while they probably exercised more on average than we do, the minor difference in exercise isn’t enough to explain the enormous difference in weight. Many of them were farmers or laborers, of course, but plenty of people in 1900 had cushy desk jobs, and those people weren’t obese either.
Something seems to have changed. But surprisingly, we don’t seem to have any idea what that thing was.
Mystery 2: An Abrupt Shift
Another thing that many people are not aware of is just how abrupt this change was. Between 1890 and 1976, people got a little heavier. The average BMI went from about 23 to about 26. This corresponds with rates of obesity going from about 3% to about 10%. The rate of obesity in most developed countries was steady at around 10% until 1980, when it suddenly began to rise.
Today the rate of obesity in Italy, France, and Sweden is around 20%. In 1975, there was no country in the world that had an obesity rate higher than 15%.
This wasn’t a steady, gentle trend as food got better, or diets got worse. People had access to plenty of delicious, high-calorie foods back in 1965. Doritos were invented in 1966, Twinkies in 1930, Oreos in 1912, and Coca-Cola all the way back in 1886. So what changed in 1980?
Common wisdom today tells us that we get heavier as we get older. But historically, this wasn’t true. In the past, most people got slightly leaner as they got older. Those Civil War veterans we mentioned above had an average BMI of 23.2 in their 40s and 22.9 in their 60s. In their 40s, 3.7% were obese, compared to 2.9% in their 60s. We see the same pattern in data from 1976-1980: people in their 60s had slightly lower BMIs and were slightly less likely to be obese than people in their 40s (see the table below). It isn’t until the 1980s that we start to see this trend reverse. Something fundamental about the nature of obesity has changed.
Mystery 3: The Ongoing Crisis
Things don’t seem to be getting any better. A couple decades ago, rising obesity rates were a frequent topic of discussion, debate, and concern. But recently the topic has received much less attention; from the lack of press and popular coverage, you might reasonably assume that if we aren’t winning the fight against obesity, we’ve gotten at least to a stalemate.
Rates of obesity are also increasing worldwide. As The Lancet notes, “unlike other major causes of preventable death and disability, such as tobacco use, injuries, and infectious diseases, there are no exemplar populations in which the obesity epidemic has been reversed by public health measures.”
All of this is, to say the least, very mysterious.
Of course, variety isn’t everything. You would also expect that people need to eat the right diet. A balanced diet, with the right mix of macronutrients. But again, this doesn’t seem to be the case. Hunter-gatherer societies around the world have incredibly different diets, some of them very extreme, and almost never suffer from obesity.
Historically, different cultures had wildly different diets — some hunter-gatherers ate diets very high in sugar, some very high in fat, some very high in starch, etc. Some had diets that were extremely varied, while others survived largely off of just two or three foods. Yet all of these different groups remained lean. This is strong evidence against the idea that a high-fat, high-sugar, high-starch, low-variety, high-variety, etc. diet could cause obesity.
A Tanzanian hunter-gatherer society called the Hadza get about 15 percent of their calories from honey. Combined with all the sugar they get from eating fruit, they end up eating about the same amount of sugar as Americans do. Despite this, the Hadza do not exhibit obesity. Another group, the Mbuti of the Congo, eat almost nothing but honey during the rainy season, when honey can provide up to 80% of the calories in their diet. These are all unrefined sugars, of course, but the Kuna of Panama, though mostly hunter-gatherers, also obtain white sugar and some sugar-containing foods from trade. Their diet is 65 percent carbohydrate and 17% sugar, which is more sugar than the average American currently consumes. Despite this the Kuna are lean, with average BMIs around 22-23.
Kitava is a Melanesian island largely isolated from the outside world. In 1990, Staffan Lindeberg went to the island to study the diet, lifestyle, and health of its people. He found a diet based on starchy tubers and roots like yam, sweet potato, and taro, supplemented by fruit, vegetables, seafood, and coconut. Food was abundant and easy to come by, and the Kitavans ate as much as they wanted. “It is obvious from our investigations,” wrote Lindeberg, “that lack of food is an unknown concept, and that the surplus of fruits and vegetables regularly rots or is eaten by dogs.”
About 70% of the calories in the Kitavan diet came from carbohydrates. For comparison, the modern American diet is about 50% carbohydrates. Despite this, none of the Kitavans were obese. Instead they were in excellent health. Below, you’ll see a photo of a Kitavan man being examined by Lindeberg.
Kitavans didn’t even seem to gain weight in middle age. In fact, BMI was found to decrease with age. Many lived into their 80s or 90s, and Lindeberg even observed one man who he estimated to be 100 years old. None of the elderly Kitavans showed signs of dementia or memory loss. The Kitavans also had no incidence of diabetes, heart attacks, stroke, or cardiovascular disease, and were unfamiliar with the symptoms of these diseases. “The only cases of sudden death they could recall,” he reports, “were accidents such as drowning or falling from a coconut tree.”
Mystery 5: Lab Animals and Wild Animals
Humans aren’t the only ones who are growing more obese — lab animals and even wild animals are becoming more obese as well. Primates and rodents living in research colonies, feral rodents living in our cities, and domestic pets like dogs and cats are all steadily getting fatter and fatter. This can’t be attributed to changes in what they eat, because lab animals live in contained environments with highly controlled diets. They’re being fed the same foods as always, but for some reason, they’re getting fatter.
This seems to be true everywhere you look. Our pets may eat scraps from the table, but why would zoo animals, being fed by professionals, also be getting fatter? Even horses are becoming more obese. This is all very strange, and none of it fits with the normal explanations for the obesity epidemic.
It used to be that if researchers needed obese rats for a study, they would just add fat to normal rodent chow. But it turns out that it takes a long time for rats to become obese on this diet. A breakthrough occurred one day when a graduate student happened to put a rat onto a bench where another student had left a half-finished bowl of Froot Loops. Rats are usually cautious around new foods, but in this case the rat wandered over and began scarfing down the brightly-colored cereal. The graduate student was inspired to try putting the rats on a diet of “palatable supermarket food”; not only Froot Loops, but foods like Doritos, pork rinds, and wedding cake. Today, researchers call these “cafeteria diets”.
Sure enough, on this diet the rats gained weight at unprecedented speed. All this despite the fact that the high-fat and cafeteria diets have similar nutritional profiles, including very similar fat/kcal percentages, around 45%. In both diets, rats were allowed to eat as much as they wanted. When you give a rat a high-fat diet, it eats the right amount and then stops eating, and maintains a healthy weight. But when you give a rat the cafeteria diet, it just keeps eating, and quickly becomes overweight. Something is making them eat more. “Palatable human food is the most effective way to cause a normal rat to spontaneously overeat and become obese,” says neuroscientist Stephan Guyenet in The Hungry Brain, “and its fattening effect cannot be attributed solely to its fat or sugar content.”
We see a similar pattern of results in humans. With access to lots of calorie-dense, tasty foods, people reliably overeat and rapidly gain weight. But again, it’s not just the contents. For some reason, eating more fat or sugar by itself isn’t as fattening as the cafeteria diet. Why is “palatable human food” so much worse for your waistline than its fat and sugar alone would suggest?
One paper, Hypobaric Hypoxia Causes Body Weight Reduction in Obese Subjects from Lippl et al. (2012), claims to show a reduction in weight at high altitude and suggests that this weight loss is attributable to differences in oxygen levels. However, there are a number of problems with this paper and its conclusions. To begin with, there isn’t a control group, so this isn’t an experiment. Without an appropriate control, it’s hard to infer a causal relationship. What they actually show is that people brought to 2,650 meters lost a small amount of weight and had lower blood oxygen saturation, but this is unsurprising. Obviously if you bring people to 2,650 meters they will have lower blood oxygen, and there’s no evidence linking that to the reported weight loss. They don’t even report a correlation between blood oxygen saturation and weight loss, even though that would be the relevant test given the data they have. Presumably they don’t report it because it’s not significant. In addition there are major issues with multiple comparisons, which make their few significant findings hard to interpret (for more detail, see our full analysis of the paper).
Mystery 8: Diets Don’t Work
People spend a lot of time arguing over how to diet, and about which diet is best for weight loss. I’m sure people have come to blows over whether you lose more weight on keto or on the Mediterranean diet, but meta-analysis consistently finds that there is little difference between different diets.
Some people do lose weight on diets. Some of them even lose a lot of weight. But the best research finds that diets just don’t work very well in general, and that no one diet seems to be better than any other. For example, a 2013 review of 4 meta-analyses said:
Numerous randomized trials comparing diets differing in macronutrient compositions (eg, low-carbohydrate, low-fat, Mediterranean) have demonstrated differences in weight loss and metabolic risk factors that are small (ie, a mean difference of <1 kg) and inconsistent.
Most diets lead to weight loss of around 5-20 lbs, with minimal differences between them. Now, 20 lbs isn’t nothing, but it’s also not much compared to the overall size of the obesity epidemic. And even if someone does lose 20 lbs, in general they will gain most of it back within a year.
A better title would be, Were Polish Aristocrats in the 1890s really that Obese?, because the chapter makes a number of striking claims about rates of overweight and obesity in Poland around the turn of the century, especially among women, and especially especially among the upper classes.
Budnik & Henneberg draw on data from historical sources to estimate height and body mass for men and women in different classes. The data all come from people in Poland in the period 1887-1914, most of whom were from Warsaw. From height and body mass estimates they can estimate average BMI for each of these groups. (For a quick refresher on BMI, a value under 18.5 is underweight, over 25 is overweight, and over 30 is obese.)
They found that BMIs were rather high; somewhat high for every class but quite high for the middle class and nobility. Peasants and working class people had average BMIs of about 23, while the middle class and nobles had average BMIs of just over 25.
This immediately suggests that more than half of the nobles and middle class were overweight or obese. The authors also estimate the standard deviation for each group, which they use to estimate the percentage of each group that is overweight and obese. The relevant figure for obesity is this:
As you can see, the figure suggests that rates of obesity were rather high. Many groups had rates of obesity around 10%, while about 20% of middle- and upper-class women were obese.
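To make the step from a group mean and standard deviation to a share of people over a BMI cutoff concrete, here is a minimal sketch. It assumes BMI within a group is roughly normally distributed, which may or may not match what Budnik & Henneberg actually did, and the standard deviations of 3 and 5 are illustrative guesses, not values from the chapter.

```python
import math

def share_above(cutoff, mean, sd):
    """Fraction of a normal(mean, sd) population that lies above cutoff."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# A mean BMI of about 25 is roughly what is reported for the middle class and nobility.
# The SD values are hypothetical, chosen only to show how much the answer depends on them.
for sd in (3.0, 5.0):
    overweight = share_above(25.0, 25.0, sd)
    obese = share_above(30.0, 25.0, sd)
    print(f"SD={sd}: {overweight:.0%} above 25, {obese:.1%} above 30")
```

With the mean held fixed, the larger assumed spread puts roughly three times as many people over 30, so the obesity estimate depends heavily on how the standard deviation was obtained.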
This is pretty striking. One in five Polish landladies and countesses were obese? Are you sure?
To begin with, it contradicts several other sources on what baseline human weight would be during this period. The first is a sample of Union Army veterans examined by the federal government between 1890 and 1900. The Civil War was several decades before, so these men were in their 40s, 50s, and 60s. This is in almost the exact same period, and this sample of veterans was Caucasian, just like the Polish sample, but the rate of obesity in this group was only about 3%.
Of course, the army veterans were all men, and not a random sample of the population. But we have data from hunter-gatherers of both genders that also suggests the baseline obesity rate should be very low. As just one example, the hunter-gatherers on Kitava live in what might be called a tropical paradise. They have more food than they could ever eat, including potatoes, yams, fruits, seafood, and coconuts, and don’t exercise much more than the average westerner. Their rate of obesity is 0%. It seems weird that Polish peasants, also eating lots of potatoes, and engaged in backbreaking labor, would be so much more obese than these hunter-gatherers.
On the other hand, if this is true, it would be huge for our understanding of the history of obesity, so we want to check it out.
Because this seems so weird, we decided to do a few basic sanity checks. For clarity, we refer to the Polish data as reported in the chapter by Budnik & Henneberg as the Warsaw data, since most (though not all) of these data come from Warsaw.
The first sanity check is comparing the obesity rates in the Warsaw data to the obesity rates in modern Poland. Obesity rates have been rising since the 1890s, so people should be more obese now than they were back then.
The Warsaw data suggests that men at the time were somewhere between 0% and 12.9% obese (mean of categories = 7.3%) and women at the time were between 8.8% and 20.9% obese (mean of categories = 16.2%). In comparison, in data from Poland in 1975, 7% of men were obese and 13% of women were obese. This suggests that obesity rates were flat (or perhaps even fell) between 1900 and 1975, which seems counterintuitive, and kinda weird.
In data from Poland in 2016, 24% of men were obese and 22% of women were obese. This also seems weird. It took until 2016 for the average woman in Poland to be as obese as a middle-class Polish woman from 1900? This seems like a contradiction, and since the more recent data is probably more accurate, it may mean that the Warsaw data is incorrect.
There’s another sanity check we can make. Paintings and photographs from the time period in question provide a record of how heavy people were at the time. If the Warsaw data is correct, there should be lots of photographs and paintings of obese Poles from this era. We checked around to see if we could find any, focusing especially on trying to get images of Poles from Warsaw.
We found a few large group photographs and paintings, and some pictures of individuals, and no way are 20% of them obese.
We begin with Sokrates Starynkiewicz, who was president of Warsaw from 1875 to 1892. He looks like a very trim gentleman, and if we look at this photograph of his funeral from 1902, we see that most of the people involved look rather trim as well:
In addition, a photograph of a crowd from 1895:
And here’s a Warsaw street in 1905:
People in these photographs do not look very obese. But most of the people in these photographs are men, and the Warsaw data suggests that rates of obesity for women were more than twice as high.
We decided to look for more photographs of women from the period, and found this list from the Krakow Post of 100 Remarkable Women from Polish History, many of whom seem to have been decorated soldiers (note to self: do not mess with Polish women). We looked through all of the entries for individuals who were adults during the period 1887-1914. There are photographs and/or portraits for many of them, but none of them appear to be obese. Several of them were painters, but none of the subjects of their paintings appear obese either. (Unrelatedly, one of them dated Charlie Chaplin and also married a Count and a Prince.)
If rates of obesity were really 20% for middle and upper class women, then there should be photographic evidence, and we can’t find any. What we have found is evidence that Polish women are as beautiful as they are dangerous, which is to say, extremely.
If we’re skeptical of the Warsaw data, we have to wonder if there’s something that could explain this discrepancy. We can think of three possibilities.
The first is that we have a hard time imagining that whoever collected this data got all these 19th-century Poles to agree to be weighed totally naked. If they were wearing all of their clothes, or any of their clothes, that could explain the whole thing. (It might also explain the large gender and class effects.)
Clothing weighed a lot back then. Just as one example, a lady’s dolman could weigh anywhere between 6 and 12 pounds, and a skirt could weigh another 12 pounds by itself. We found another source that suggested a lady’s entire outfit in the 1880s (though not Poland specifically) would weigh about 25 lbs.
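A quick worked example shows how much clothing of this weight could move a BMI estimate. The height and unclothed body weight below are hypothetical, picked only for illustration; the 25 lb outfit is the figure from the source mentioned above.

```python
LB_TO_KG = 0.45359237

def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

height_m = 1.55                # hypothetical height
body_kg = 55.0                 # hypothetical unclothed weight
clothing_kg = 25 * LB_TO_KG    # an outfit of about 25 lbs

bmi_unclothed = bmi(body_kg, height_m)
bmi_clothed = bmi(body_kg + clothing_kg, height_m)
print(round(bmi_unclothed, 1), round(bmi_clothed, 1))
```

Under these assumptions the clothing alone adds several BMI points, enough to move someone from the normal range into the overweight range.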
As far as we can tell, there’s no mention of clothes, clothing, garments, shoes, etc. in the chapter, so it’s quite possible they didn’t account for clothing at all. All the original documents seem to be in Polish and we don’t speak Polish, so it’s possible the original authors don’t mention it either. (If you speak Polish and are interested in helping unravel this, let us know!)
Also, how did you even weigh someone in 1890s Poland? Did they carry around a bathroom scale? We found one source that claims the first “bathroom” scale was introduced in 1910, but they must have been using something in 1890.
Sir Francis Galton, who may have come up with the idea of weighing human beings, made some human body weight measurements in 1884 at London’s International Health Exhibition. He invited visitors to fill out a form, walk through his gallery, and have their measurements taken along a number of dimensions, including colour-sense, depth perception, sense of touch, breathing capacity, “swiftness of blow with fist”, strength of their hands, height, arm span, and weight. (Galton really wanted to measure the size of people’s heads as well, but wasn’t able to, because it would have required ladies to remove their bonnets.) In the end, they were given a souvenir including their measurements. To take people’s weights, Galton describes using “a simple commercial balance”.
Galton also specifically says, “Overcoats should be taken off, the weight required being that of ordinary indoor clothing.” This indicates he was weighing people in their everyday clothes (minus only overcoats), which suggests that the Polish data may also include clothing weight. “Stripping,” he elaborates, “was of course inadmissible.”
Also of interest may be Galton’s 1884 paper, The Weights of British Noblemen During the Last Three Generations, which we just discovered. “Messrs. Berry are the heads of an old-established firm of wine and coffee merchants,” he writes, “who keep two huge beam scales in their shop, one for their goods, and the other for the use and amusement of their customers. Upwards of 20,000 persons have been weighed in them since the middle of last century down to the present day, and the results are recorded in well-indexed ledgers. Some of those who had town houses have been weighed year after year during the Parliamentary season for the whole period of their adult lives.”
Naturally these British noblemen were not being weighed in a wine and coffee shop totally naked, and Galton confirms that the measurements should be, “accepted as weighings in ‘ordinary indoor clothing’.” This seems like further evidence that the Warsaw data likely included the weight of individuals’ clothes.
Another explanation has to do with measurements and conversions. Poland didn’t switch to the metric system until after these measurements were made (various sources say 1918, 1919, 1925, etc.), so some sort of conversion from outdated units has to be involved. This chapter does recognize that, and mentions that body mass was “often measured in Russian tsar pounds (1 kg = 2.442 pounds).”
We have a few concerns. First, if it was “often” measured in these units, what was it measured in the rest of the time?
Second, what is a “Russian tsar pound”? We can’t find any other references for this term, or for “tsar pound”, but we think it refers to the Russian funt (фунт). We’ve confirmed that the conversion rate for the Russian funt matches the rate given in the chapter (409.5 g, which comes out to a rate of 2.442 in the opposite direction), which indicates this is probably the unit that they meant.
But we’ve also found sources that say the funt used in Warsaw had a different weight, equivalent to 405.2 g. Another source gives the Polish funt as 405.5 g. In any case, the conversion rate they used may be wrong, and that could also account for some of the discrepancy.
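The conversion arithmetic is simple enough to check directly. The sketch below uses the two funt weights quoted above; the reading of 160 funt is a hypothetical value chosen only to show the size of the effect.

```python
def funt_per_kg(grams_per_funt):
    """How many funt make up one kilogram."""
    return 1000.0 / grams_per_funt

RUSSIAN_FUNT_G = 409.5   # matches the chapter's rate of 2.442
WARSAW_FUNT_G = 405.2    # alternative value given for the funt used in Warsaw

print(round(funt_per_kg(RUSSIAN_FUNT_G), 3))   # 2.442

reading_funt = 160       # hypothetical recorded weight
kg_if_russian = reading_funt * RUSSIAN_FUNT_G / 1000
kg_if_warsaw = reading_funt * WARSAW_FUNT_G / 1000
print(round(kg_if_russian, 2), round(kg_if_warsaw, 2))
```

The two funt values differ by about 1%, so on its own a mix-up between them would shift each converted weight by roughly that much.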
The height measurements might be further evidence of possible conversion issues. The authors remark on being surprised at how tall everyone was — “especially striking is the tallness of noble males” — and this could be the result of another conversion error. Or it could be another side effect of clothing, if they were measured with their shoes on, since men’s shoes at the time tended to have a small heel. (Galton measured height in shoes, then the height of the heel, and subtracted the one from the other, but we don’t know if the Polish anthropometers thought to do this.)
A third possibility is that the authors estimated the standard deviation of BMI incorrectly. To figure out how many people were obese, they needed not only the mean BMI of the groups, they needed an estimate of how much variation there was. They describe their procedure for this estimation very briefly, saying “standard deviations were often calculated from grouped data distributions.” (There’s that vague “often” again.)
What is this technique? We don’t know. To support this they cite Jasicki et al. (1962), which is the book Zarys antropologii (“Outline of Anthropology”). While we see evidence this book exists, we can’t find the original document, and if we could, we wouldn’t be able to read it since we don’t speak Polish. As a result, we’re concerned they may have overestimated how much variation there was in body weights at the time.
These three possibilities seem sufficient to explain the apparently high rates of obesity in the Warsaw data. We think the Warsaw data is probably wrong, and our best guess for obesity rates in the 1890s is still in the range of 3%, rather than 10-20%.
One of the mysterious aspects of obesity is that it is correlated with altitude. People tend to be leaner at high altitudes and fatter near sea level. Colorado is the highest-altitude US state and also the leanest, with an obesity rate of only 22%. In contrast, low-altitude Louisiana has an obesity rate of about 36%. This is pretty well documented in the literature, and isn’t just limited to the United States. We see the same thing in countries around the world, from Spain to Tibet.
A popular explanation for this phenomenon is the idea that hypoxia, or lack of oxygen, leads to weight loss. The story goes that because the atmosphere is thinner at higher altitudes, the body gets less oxygen, and this ends up making people leaner.
This study focused on twenty middle-aged obese German men (mean age 55.7, mean BMI 33.7), all of whom normally lived at a low altitude — 571 ± 29 meters above sea level. Participants were first given a medical exam in Munich, Germany (530 meters above sea level) to establish baseline values for all measures. A week later, all twenty of the obese German men, as well as (presumably) the researchers, traveled to “the air‐conditioned Environmental Research Station Schneefernerhaus (UFS, Zugspitze, Germany)”, a former hotel in the Bavarian Alps (2,650 meters above sea level). The hotel/research station “was effortlessly reached by cogwheel train and cable car during the afternoon of day 6.”
Patients stayed in the Schneefernerhaus research station for a week, where they “ate and drank without restriction, as they would have at home.” Exercise was “restricted to slow walks throughout the station: more vigorous activity was not permitted.” They note that there was slightly less activity at the research station than there was at low altitudes, “probably due to the limited walking space in the high‐altitude research station.” Sounds cozy.
During this week-long period at high altitude, the researchers continued collecting measurements of the participants’ health. After the week was through, everyone returned to Munich (530 meters above sea level). At this point the researchers waited four weeks (it’s not clear why) before conducting the final health examinations, at which point the study concluded. We’re not sure what to say about this study design, except that it’s clear the film adaptation should be directed by Wes Anderson.
While this design is amusing, the results are uninspiring.
To begin with, the weight loss was minimal. During the week they spent at 2,650 meters, patients lost an average of 3 pounds (1.5 kg). They were an average of 232 lbs (105.1 kg) to begin with, so this is only about 1% of their body weight. Going from 232 lbs (105.1 kg) to 229 lbs (103.6 kg) doesn’t seem clinically relevant, or even all that noticeable. The authors, surprisingly, agree: “the absolute amount of weight loss was so small.”
More importantly, we’re not convinced that this tiny weight loss result is real, because the paper suffers from serious multiple comparison problems. Also known as p-hacking or “questionable research practices”, multiple comparisons are a problem because they can make it very likely to get a false positive. If you run one statistical test, there’s a small chance you will get a false positive, but as you run more tests, false positives get more and more likely. If you run enough tests, you are virtually guaranteed to get a false positive, or many false positives. If you try running many different tests, or try running the same test many different ways, and only report the best one, it’s possible to make pure noise look like a strong finding.
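To put rough numbers on this, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes every test is independent, that there is no real effect anywhere, and that the usual 0.05 threshold is used; real tests on the same participants are correlated, so treat this as an approximation. The value 74 is the number of tests we count in this paper.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests comes out significant."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20, 74):
    print(n, round(prob_any_false_positive(n), 3))
```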
We see evidence of multiple comparisons in the paper. They collect a lot of measures and run a lot of tests. The authors report eight measures of obesity alone, as well as many other measures of health.
The week the patients spent at 2,650 meters — Day 7 to Day 14 — is clearly the interval of interest here, but they mostly report comparisons of Day 1 to the other days, and they tend to report all three pairs (D1 to D7, D1 to D14, and D1 to D42), which makes for three times the number of comparisons. It’s also confusing that there are no measures for D21, D28, and D35. Did they not collect data those days, or just not report it? We think they just didn’t collect data, but it’s not clear.
The authors also use a very unusual form of statistical analysis — for each test, first they conducted a nonparametric Friedmann procedure. Then, if that showed a significant rank difference, they did a Wilcoxon signed‐rank method test. It’s pretty strange to run one test conditional on another like this, especially for such a simple comparison. It’s also not clear what role the Friedmann procedure is playing in this analysis. Presumably they are referring to the Friedman test (we assume they don’t mean this procedure for biodiesel analysis) and this is a simple typo, but it’s not clear why they want to rank the means. In addition, the Wilcoxon signed‐rank test seems like a slightly strange choice. The more standard analysis here would be the humble paired t-test.
Even if this really were best practice, there’s no way to know that they didn’t start by running paired t-tests, throwing those results out when they found that they were only trending in the right direction. And in fact, we noticed that if we compare body weight at D7 to D14 using a paired t-test, we find a p-value of .0506, instead of the p < .001 they report when comparing D1 to D14 with a Wilcoxon test. We think that this is the more appropriate analysis, and as you can see, it’s not statistically significant.
Regardless, the whole analysis is called into question by the number of tests they ran. By our count they conducted at least 74 tests in this paper, which is a form of p-hacking and makes the results very hard to interpret. It’s also possible that they conducted even more tests that weren’t reported in the paper. This isn’t really their fault — p-hacking wasn’t described until 2011 (and the term itself wasn’t invented until a few years later), so like most people they were almost certainly unfamiliar with issues of multiple comparisons when they did their analysis. While we don’t accuse the authors of acting in bad faith, we do think this seriously undermines our ability to interpret their results. When we ran the single test that we think was most appropriate, we found that it was not significant.
And of course, the sample size was only 20 people, though perhaps there wasn’t room for many more people in the research station. On one hand this is pretty standard for intensive studies like this, but it reduces the statistical power.
The authors claim to show that hypoxia causes weight loss, but this is overstating their case. They report that people brought to 2,650 meters lost a small amount of weight and had lower blood oxygen saturation, but we think the former result is noise and the latter result is unsurprising. Obviously if you bring people to 2,650 meters they will have lower blood oxygen, and there’s no evidence linking that to the reported weight loss.
Even more concerning is the fact that there’s no control group, which means that this study isn’t even an experiment. Without a control group, there can be no random assignment, and with no random assignment, a study isn’t an experiment. As a result, the strong causal claim the authors draw from their results is pretty unsubstantiated.
There isn’t an obvious fix for this problem. A control group that stayed in Munich wouldn’t be appropriate, because oxygen is confounded with everything else about altitude. If there were a difference between the Munich group and the Schneefernerhaus group, there would be no way to tell if that was due to the amount of oxygen or any of the other thousand differences between the two locations. A better approach would be to bring a control group to the same altitude, and give that control group extra oxygen, though that might introduce its own confounds; for example, the supplemental-oxygen group would all be wearing masks and carrying canisters. I guess the best way to do this would be to bring both groups to the Alps, give both of them canisters and masks, but put real oxygen in the canisters for one group and placebo oxygen (nitrogen?) in the canisters for the other group.
We’re sympathetic to inferring causal relationships from correlational data, but the authors don’t report a correlation between blood oxygen saturation and weight loss, even though that would be the relevant test given the data that they have. Probably they don’t report it because it’s not significant. They do report, “We could not find a significant correlation between oxygen saturation or oxygen partial pressure, and either ghrelin or leptin.” These are tests that we might expect to be significant if hypoxia caused weight loss — which suggests that it does not.
Unfortunately, the authors report no evidence for their mechanism and probably don’t have an effect to explain in the first place. This is too bad — the study asks an interesting question, and the design looks good at first. It’s only on reflection that you see that there are serious problems.
Thanks to Nick Brown for reading a draft of this post.
One thing that Nick Brown noticed when he read the first draft of this post is that the oxygen saturation percentages reported for D7 and D14 seem to be dangerously low. We’ve all become more familiar with oxygen saturation measures because of COVID, so you may already know that a normal range is 95-100%. Guidelines generally suggest that levels below 90% are dangerous, and should be cause to seek medical attention, so it’s a little surprising that the average for these 20 men was in the mid-80s during their week at high altitude. We found this confusing so we looked into it, and it turns out that this is probably not an issue. Not only are lower oxygen saturation levels normal at higher altitudes, the levels can apparently be very low by sea-level standards without becoming dangerous. For example, in this study of residents of El Alto in Bolivia (an elevation of 4018 m), the mean oxygen saturation percentages were in the range of 85-88%. So while this is definitely striking, it’s probably not anything to worry about.
Briefly, Hall et al. (2019) is a metabolic ward study on the effects of “ultra-processed” foods on energy intake and weight gain. The participants were 20 adults, an average of 31.2 years old. They had a mean BMI of 27, so on average participants were slightly overweight, but not obese.
Participants were admitted to the metabolic ward and randomly assigned to one of two conditions. They either ate an ultra-processed diet for two weeks, immediately followed by an unprocessed diet for two weeks — or they ate an unprocessed diet for two weeks, immediately followed by an ultra-processed diet for two weeks. The study was ad libitum, so whether they were eating an unprocessed or an ultra-processed diet, participants were always allowed to eat as much as they wanted — in the words of the authors, “subjects were instructed to consume as much or as little as desired.”
The authors found that people ate more on the ultra-processed diet and gained a small amount of weight, compared to the unprocessed diet, where they ate less and lost a small amount of weight.
We’re not in the habit of re-analyzing published papers, but we decided to take a closer look at this study because a couple of things in the abstract struck us as surprising. Weight change is one main outcome of interest for this study, and several unusual things about this measure stand out immediately. First, the two groups report the same amount of change in body weight, the only difference being that one group gained weight and the other group lost it. In the ultra-processed diet group, people gained 0.9 ± 0.3 kg (p = 0.009), and in the unprocessed diet group, people lost 0.9 ± 0.3 kg (p = 0.007). (Those ± values are standard errors of the mean.) It’s pretty unlikely for the means of both groups to be identical, and it’s very unlikely that both the means and the standard errors would be identical.
It’s not impossible for these numbers to be the same (and in fact, they are not precisely equal in the raw data, though they are still pretty close), especially given that they’re rounded to one decimal place. But it is weird. We ran some simple simulations which suggest that this should only happen about 5% of the time — but this is assuming that the means and SDs of the two groups are both identical in the population, which itself is very unlikely.
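A simulation along these lines can be sketched as follows. This is a minimal illustration rather than a record of the exact simulation: it assumes two independent groups of 20 drawn from the same normal distribution, with a true mean of 0.9 kg and a true SD of 1.34 kg (chosen so that the standard error is about 0.3 with n = 20). The function name and all parameter values are our own assumptions for the example.

```python
import math
import random

def rounded_match_rate(n=20, true_mean=0.9, true_sd=1.34, n_sims=10000, seed=0):
    """Share of simulated studies where both groups report the same mean and
    the same standard error once each is rounded to one decimal place."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_sims):
        summaries = []
        for _group in range(2):
            sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
            mean = sum(sample) / n
            variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
            sem = math.sqrt(variance / n)
            summaries.append((round(mean, 1), round(sem, 1)))
        if summaries[0] == summaries[1]:
            matches += 1
    return matches / n_sims

print(rounded_match_rate())
```

Note that the real study is a crossover design, so the same people appear in both conditions; treating the two conditions as independent groups is a simplification.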
Another test of interest reported in the abstract also seemed odd. They report that weight changes were highly correlated with energy intake (r = 0.8, p < 0.0001). This correlation coefficient struck us as surprising, because it’s pretty huge. There are very few measures that are correlated with one another at 0.8 — these are the types of correlations we tend to see between identical twins, or repeated measurements of the same person. As an example, in identical twins, BMI is correlated at about r = 0.8, and height at about r = 0.9.
We know that these points are pretty ticky-tacky stuff. By themselves, they’re not much, but they bothered us. Something already seemed weird, and we hadn’t even gotten past the abstract.
To conduct this analysis, we teamed up with Nick Brown, with additional help from James Heathers. We focused on one particular dependent variable of this study, weight change, while Nick took a broader look at several elements of the paper.
Because we were most interested in weight change, we decided to begin by taking a close look at the file “deltabw”. In mathematics, delta usually means “change” or “the change in”, and “bw” here stands for “body weight”, so this title indicates that the file contains data for the change in participants’ body weights. On the OSF this is in the form of a SAS .sas7bdat file, but we converted it to a .csv file, which is a little easier to work with.
Here’s a screenshot of what the deltabw file looks like:
In this spreadsheet, each row tells us about the weight for one participant on one day of the 4-week-long study. These daily body weight measurements were performed at 6am each morning, so we have one row for every day.
Let’s also orient you to the columns. “StudyID” is the ID for each participant. Here we can see that in this screenshot we are looking just at participant ADL001, or participant 01 for short. The “Period” variable tells us whether the participant was eating an ultra-processed (PROC) or an unprocessed (UNPROC) diet on that day. Here we can see that participant 01 was part of the group who had an unprocessed diet for the first two weeks, before switching to the ultra-processed diet for the last two weeks. “Day” tells us which day in the 28-day study the measurement is from. Here we show only the first 20 days for participant 01.
“BW” is the main variable of interest, as it is the participant’s measured weight, in kilograms, for that day of the study. “DayInPeriod” tells us which day they are on for that particular diet. Each participant goes 14 days on one diet then begins day 1 on the other diet. “BaseBW” is just their weight for day 1 on that period. Participant 01 was 94.87 kg on day one of the unprocessed diet, so this column holds that value as long as they’re on that diet. “DeltaBW” is the difference between their weight on that day and the weight they were at the beginning of that period. For example, participant 01 weighed 94.87 kg on day one and 94.07 kg on day nine, so the DeltaBW value for day nine is -0.80.
Finally, “DeltaDaily” is a variable that we added, which is just a simple calculation of how much the participant’s weight changed each day. If someone weighed 82.85 kg yesterday and they weigh 82.95 kg today, the DeltaDaily would be 0.10, because they gained 0.10 kg in the last 24 hours.
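For readers who want to follow along, the two derived columns can be computed from a list of daily weights roughly like this. It is a sketch that handles one diet period at a time; the function name is ours, and how the first day of a period should be treated for DeltaDaily is an assumption (here it is simply left empty).

```python
def add_deltas(weights_kg):
    """Given daily weights for one diet period, return rows with DeltaBW
    (change since day 1 of the period) and DeltaDaily (change since yesterday)."""
    base = weights_kg[0]
    rows = []
    for i, bw in enumerate(weights_kg):
        delta_daily = None if i == 0 else round(bw - weights_kg[i - 1], 2)
        rows.append({
            "DayInPeriod": i + 1,
            "BW": bw,
            "BaseBW": base,
            "DeltaBW": round(bw - base, 2),
            "DeltaDaily": delta_daily,
        })
    return rows

# The two-day example from the text: 82.85 kg yesterday, 82.95 kg today.
rows = add_deltas([82.85, 82.95])
print(rows[1]["DeltaDaily"])
```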
To begin with, we were able to replicate the authors’ main findings. When we don’t round to one decimal place, we see that participants on the ultra-processed diet gained an average of 0.9380 (± 0.3219) kg, and participants on the unprocessed diet lost an average of 0.9085 (± 0.3006) kg. That’s only a difference of 0.0295 kg in absolute values in the means, and 0.0213 kg for the standard errors, which we still find quite surprising. Note that this is different from the concern about standard errors raised by Drs. Mackerras and Blizzard. Many of the standard errors in this paper come from GLM analysis, which assumes homogeneity of variances and often leads to identical standard errors. But these are independently calculated standard errors of the mean for each condition, so it is still somewhat surprising that they are so similar (though not identical).
On average these participants gained and lost impressive, but not shocking amounts of weight. A few of the participants, however, saw weight loss that was very concerning. One woman lost 4.3 kg in 14 days which, to quote Nick Brown, “is what I would expect if she had dysentery” (evocative though perhaps a little excessive). In fact, according to the data, she lost 2.39 kg in the first five days alone. We also notice that this patient was only 67.12 kg (about 148 lbs) to begin with, so such a huge loss is proportionally even more concerning. This is the most extreme case, of course, but not the only case of such intense weight change over such a short period.
The article tells us that participants were weighed on a Welch Allyn Scale-Tronix 5702 scale, which has a resolution of 0.1 lb or 100 grams (0.1 kg). This means it should only display data to one decimal place. Here’s the manufacturer’s specification sheet for that model. But participant weights in the file deltabw are all reported to two decimal places; that is, with a precision of 0.01 kg, as you can clearly see from the screenshot above. Of the 560 weight readings in the data file, only 55 end in zero. It is not clear how this is possible, since the scale apparently doesn’t display this much precision.
To confirm this, we wrote to Welch Allyn’s customer support department, who confirmed that the model 5702 has 0.1 kg resolution.
We also considered the possibility that the researchers measured people’s weight in pounds and then converted to kilograms, in order to use the scale’s better precision of 0.1 pounds (45.4 grams) rather than 100 grams. However, in this case, one would expect to see that all of the changes in weight were multiples of (approximately) 0.045 kg, which is not what we observe.
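A closely related check can be run on the recorded weights themselves rather than on the changes. The sketch below asks whether a weight in kilograms, rounded to two decimal places, could have come from any reading in steps of 0.1 lb. The function name is ours, and it assumes the conversion was done with the standard factor and then rounded to 0.01 kg.

```python
KG_PER_LB = 0.45359237

def could_be_tenth_pound_reading(weight_kg):
    """True if this two-decimal kg value matches some reading in 0.1 lb steps."""
    nearest_step = round(weight_kg / KG_PER_LB * 10)
    # Check the nearest 0.1 lb step and its neighbors to be safe.
    for step in (nearest_step - 1, nearest_step, nearest_step + 1):
        if round(step * 0.1 * KG_PER_LB, 2) == round(weight_kg, 2):
            return True
    return False

# Two weights quoted in this post.
for kg in (94.87, 67.12):
    print(kg, could_be_tenth_pound_reading(kg))
```

Because 0.1 lb steps are about 45 grams apart, only around a fifth of all two-decimal kilogram values can be produced this way, so a file full of weights that fail this check is hard to square with a pounds-to-kilograms conversion.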
As we look closer at the numbers, things get even more confusing.
As we noted, Hall et al. report participant weight to two decimal places in kilograms for every participant on every day. Kilograms to two decimal places should be pretty sensitive, but there are many cases where the exact same weight appears two or even three times in a row. For example, participant 21 is listed as having a weight of exactly 59.32 kg on days 12, 13, and 14, participant 13 is listed as having a weight of exactly 96.43 kg on days 10, 11, and 12, and participant 06 is listed as having a weight of exactly 49.54 kg on days 23, 24, and 25.
Having the same weight for two or even three days in a row may not seem that strange, but it is very remarkable when the measurement is in kilograms precise to two decimal places. After all, 0.01 kg (10 grams) is not very much weight at all. A standard egg weighs about 0.05 kg (50 grams). A shot of liquor is a little less, usually a bit more than 0.03 kg (30 grams). A tablespoon of water is about 0.015 kg (15 grams). This suggests that people’s weights are varying by less than the weight of a tablespoon of water over the course of entire days, and sometimes over multiple days. This uncanny precision seems even more unusual when we note that body weight measurements were taken at 6 am every morning “after the first void”, which suggests that participants’ bodily functions were precise to 0.01 kg on certain days as well.
The case of participant 06 is particularly confusing, as 49.54 kg is exactly one kilogram less, to two decimal places, than the baseline for this participant’s weight when they started, 50.54 kg. Furthermore, in the “unprocessed” period, participant 06 only ever seems to lose or gain weight in full increments of 0.10 kilograms.
We see similar patterns in the data from other participants. Let’s take a look at the DeltaDaily variable. As a reminder, this variable is just the difference between a person’s weight on one day and the day before. These are nothing more than daily changes in weight.
Because these numbers are calculated from the difference between two weight measurements, both of which are reported to two decimal places, these numbers should have two decimal places of precision as well. But surprisingly, we see that many of these weight changes are in full increments of 0.10 kg.
Take a look at the histograms below. The top histogram is the distribution of weight changes by day. For example, a person might gain 0.10 kg between days 15 and 16, and that would be one of the observations in this histogram.
You’ll see that these data have an extremely unnatural hair-comb pattern of spikes, with only a few observations in between. This is because the vast majority (~71%) of the weight changes are in exact multiples of 0.10, despite the fact that weights and weight changes are reported to two decimal places. That is to say, participants’ weights usually changed in increments like 0.20 kg, -0.10 kg, or 0.40 kg, and almost never in increments like -0.03 kg, 0.12 kg, or 0.28 kg.
For comparison, on the bottom is a sample from a simulated normal distribution with identical n, mean, and standard deviation. You’ll see that there is no hair-comb pattern for these data.
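As a rough illustration of this comparison, the sketch below computes the share of daily changes that are exact multiples of 0.10 kg, and applies it to a simulated normal sample rounded to two decimals. The mean and standard deviation used here are placeholders we made up for the example, not the values from the study, and the function names are ours.

```python
import random

def share_multiples_of_tenth(changes_kg):
    """Fraction of changes (kg) that are exact multiples of 0.10 at two-decimal precision."""
    hundredths = [round(c * 100) for c in changes_kg]  # work in whole units of 0.01 kg
    return sum(1 for h in hundredths if h % 10 == 0) / len(hundredths)

def simulated_normal_changes(n=540, mean=0.0, sd=0.5, seed=0):
    """Normal draws rounded to 0.01 kg; mean and sd are placeholder assumptions."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd), 2) for _ in range(n)]

# The example increments from the text: three are multiples of 0.10, three are not.
print(share_multiples_of_tenth([0.20, -0.10, 0.40, -0.03, 0.12, 0.28]))  # 0.5

# For smooth data rounded to two decimals, roughly one value in ten should qualify.
print(share_multiples_of_tenth(simulated_normal_changes()))
```

Working in whole hundredths of a kilogram avoids floating-point surprises when testing for multiples of 0.10.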
As we mentioned earlier, there are several cases where a participant stays at the exact same weight for two or three days in a row. The distribution we see here is the cause. As you can see, the most common daily change is exactly zero. Now, it’s certainly possible to imagine why some values might end up being zero in a study like this. There might be a technical incident with the scale, a clerical error, or a mistake when recording handwritten data on the computer. A lazy lab assistant might lose their notes, resulting in the previous day’s value being used as the reasonable best estimate. But since a change of exactly zero is the modal response, a full 9% of all measurements, it’s hard to imagine that these are all omissions or technical errors.
In addition, there’s something very strange going on with the trailing digits:
On the top here we have the distribution of digits in the 0.1 place. For example, a measurement of 0.29 kg would appear as a 2 here. This follows about the distribution we would expect, though there are a few more 1’s and fewer 0’s than usual.
The bottom histogram is where things get weird. Here we have the distribution of digits in the 0.01 place. For example, a measurement of 0.29 kg would appear as a 9 here. As you can see, 382/540 of these observations have a 0 in their 0.01’s place — this is the same as that figure of 71% of measured changes being in full increments of 0.10 kg that we mentioned earlier.
The rest of the distribution is also very strange. When the trailing digit is not a zero, it is almost certainly a 1 or a 9, possibly a 2 or an 8, and almost never anything else. Of 540 observed weight changes, only 3 have a trailing digit of 5.
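For readers who want to reproduce this kind of digit tally, here is a minimal sketch of how the digits can be pulled out of each change. We take the absolute value of each change before extracting digits, which is an assumption about how negative changes should be handled; the function name `decimal_digits` and the example values are ours.

```python
from collections import Counter

def decimal_digits(change_kg):
    """Return (digit in the 0.1 place, digit in the 0.01 place) of a change in kg."""
    hundredths = abs(round(change_kg * 100))  # e.g. 0.29 kg -> 29
    return (hundredths // 10) % 10, hundredths % 10

print(decimal_digits(0.29))  # (2, 9)

# Tally the trailing (0.01 place) digits for a handful of illustrative changes.
example_changes = [0.29, 0.20, -0.10, 0.40, -0.03, 0.12, 0.28]
trailing = Counter(decimal_digits(c)[1] for c in example_changes)
print(trailing[0])  # 3
```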
We can see that this is not what we would expect from (simulated) normally distributed data:
It’s also not what we would expect to see if they were measuring to one decimal place most of the time (~70%), but to two decimal places on occasion (~30%). As we’ve already mentioned, this doesn’t make sense from a methodological standpoint, because all daily weights are to two decimal places. But even if it somehow were a measurement accuracy issue, we would expect an equal distribution across all the other digits besides zero, like this:
This is certainly not what we see in the reported data. The fact that 1 and 9 are the most likely trailing digits after 0, and that 2 and 8 are the most likely after that, is especially strange.
When we first started looking into this paper, we approached Retraction Watch, who said they considered it a potential story. After completing the analyses above, we shared an early version of this post with Retraction Watch, and with our permission they approached the authors for comment. The authors were kind enough to offer feedback on what we had found, and when we examined their explanation, we found that it clarified a number of our points of confusion.
The first thing they shared with us was this erratum from October 2020, which we hadn’t seen before. The erratum reports that they noticed an error in the documented diet order of one participant. This is an important note but doesn’t affect the analyses we present here, which have very little to do with diet conditions.
Kevin Hall, the first author on this paper, also shared a clarification on how body weights were calculated:
I think I just discovered the likely explanation about the distribution of high-precision digits in the body weight measurements that are the main subject of one of the blogs. It’s kind of illustrative of how difficult it is to fully report experimental methods! It turns out that the body weight measurements were recorded to the 0.1 kg according to the scale precision. However, we subtracted the weight of the subject’s pajamas that were measured using a more precise balance at a single time point. We repeated subtracting the mass of the pajamas on all occasions when the subject wore those pajamas. See the example excerpted below from the original form from one subject who wore the same pajamas (PJs) for three days and then switched to a new set. Obviously, the repeating high precision digits are due to the constant PJs! 😉
This matches what is reported in the paper, where they state, “Subjects wore hospital-issued top and bottom pajamas which were pre-weighed and deducted from scale weight.”
Kevin also included the following image, which shows part of how the data was recorded for one participant:
If we understand this correctly, the first time a participant wore a set of pajamas, the pajamas were weighed to three decimals of precision. Then, that measurement was subtracted from the participant’s weight on the scale (“Patient Weight”) on every consecutive morning, to calculate the participant’s body weight. For an unclear reason, this was recorded to two decimals of precision, rather than the one decimal of precision given by the scale, or the three decimals of precision given by the PJ weights. When the participant switched to a new set of pajamas, the new set was weighed to three decimals of precision, and that number was used to calculate participant body weight until they switched to yet another new set of pajamas, etc.
We assume that the measurement for the pajamas is given in kilograms, even though they write “g” and “gm” (“qm”?) in the column. I wish my undergraduate lab TAs were as forgiving as the editors at Cell Metabolism.
This method does account for the fact that participant body weights were reported to two decimal places of precision, despite the fact that the scale only measures weight to one decimal place of precision. Even so, there were a couple of things that we still found confusing.
The variable that interests us the most is the DeltaDaily variable. We can easily calculate that variable for the provided example, like so:
We can see that whenever a participant doesn’t change their pajamas on consecutive days, there’s a trailing zero. In this way, the pajamas can account for the fact that 71% of the time, the trailing digits in the DeltaDaily variable were zeros.
We also see that whenever the trailing digit is not zero, that lets us identify when a participant has changed their pajamas. Note of course that about ten percent of the time, a change in pajamas will also lead to a trailing digit of zero. So every trailing digit that isn’t zero is a pajama change, though a small number of the zeros will also be “hidden” pajama changes.
In any case, we can use this to make inferences about how often participants change their pajamas, which we find rather confusing. Participants often change their pajamas every day for multiple days in a row, or go long stretches without apparently changing their pajamas at all, and sometimes these are the same participants. It’s possible that these long stretches without any apparent change of pajamas are the result of the “hidden” changes we mentioned, because about 10% of the time changes would happen without the trailing digit changing, but it’s still surprising.
For example, participant 05 changes their pajamas on day 2, day 5, and day 10, and then apparently doesn’t change their pajamas again until day 28, going more than two weeks without a change in PJs. Participant 20, in contrast, changes pajamas at least 16 times over 28 days, including every day for the last four days of the study. The record for this, however, has to go to participant 03, who at one point appears to have switched pajamas every day for at least seven days in a row. Participant 03 then goes eight days in a row without changing pajamas before switching pajamas every day for three days in a row.
Participant 08 (the participant from the image above) seems to change their pajamas only twice during the entire 28-day study, once on day 4 and again on day 28. Certainly this is possible, but it doesn’t look like the pajama-wearing habits we would expect. It’s true that some people probably want to change their pajamas more than others, but this doesn’t seem like it can be entirely attributed to personality, as some people don’t change pajamas at all for a long time, and then start to change them nearly every day, or vice-versa.
We were also unclear on whether the pajamas adjustment could account for the most confusing pattern we saw in the data for this article, the distribution of digits in the .01 place for the DeltaDaily variable:
The pajamas method can explain why there are so many zeros — any day a participant didn’t change their pajamas, there would be a zero, and it’s conceivable that participants only changed their pajamas on 30% of the days they were in the study.
We weren’t sure if the pajamas method could explain the distribution of the other digits. For the trailing digits that aren’t zero, 42% of them are 1’s, 27% of them are 9’s, 9% of them are 2’s, 8% of them are 8’s, and the remaining digits account for only about 3% each. This seems very strange.
You’ll recall that the DeltaDaily values record the changes in participant weights between consecutive days. Because the scale is only precise to 0.1 kg, the digit in the 0.01 place records information about the difference between two different pairs of pajamas. For illustration, in the example Kevin Hall provided, the participant switched between a pair of pajamas weighing 0.418 kg and a pair weighing 0.376 kg. These are different by 0.042 kg, so when they rounded it to two digits, the difference we see in the DeltaDaily has a trailing digit of 4.
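The arithmetic can be sketched as follows. The pajama weights of 0.418 kg and 0.376 kg come from the example above, but the scale readings are hypothetical numbers we made up, and rounding halves upward is our assumption, since we don't know how the original values were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

def body_weight(scale_kg, pj_kg):
    """Scale reading minus pajama weight, recorded to two decimals (rounding mode assumed)."""
    diff = Decimal(scale_kg) - Decimal(pj_kg)
    return diff.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# (scale reading in kg, pajama weight in kg) for four mornings.
# The scale readings are hypothetical; the pajamas change on the last morning.
mornings = [("80.1", "0.418"), ("80.3", "0.418"), ("80.2", "0.418"), ("80.4", "0.376")]

weights = [body_weight(s, p) for s, p in mornings]
delta_daily = [b - a for a, b in zip(weights, weights[1:])]
print([str(d) for d in delta_daily])  # ['0.20', '-0.10', '0.24']
```

While the pajamas stay the same, their weight cancels out and the change ends in zero. On the morning of the switch, the 0.042 kg difference between the two sets shows up as a trailing 4.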
We wanted to know if the pajama adjustment could explain why the difference (for the digit in the 0.01’s place) between the weights of two pairs of pajamas is 14x more likely to be a 1 than a 6, or 9x more likely to be a 9 than a 3.
Verbal arguments quickly got very confusing, so we decided to run some simulations. We simulated 20 participants, for 28 days each, just like the actual study. On day one, simulated participants were assigned a starting weight, which was a random integer between 40 and 100. Every day, their weight changed by an amount between -1.5 and 1.5 by increments of 0.1 (-1.5, -1.4, -1.3 … 1.4, 1.5), with each increment having an equal chance of occurring.
The important part of the simulation was the pajamas, of course. Participants were assigned a pajama weight on day 1, and each day they had a 35% chance of changing pajamas and being assigned a new pajama weight. The real question was how to generate a reasonable distribution of pajama weights. We didn’t have much to go off of, just the two values in the image that Kevin Hall shared with us. But we decided to give it a shot with just that information. Weights of 418 g and 376 g have a mean of just under 400 g and a standard deviation of about 30 g, so we decided to sample our pajama weights from a normal distribution with those parameters.
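A simulation along these lines can be sketched as below. This is a reconstruction from the description above, not the exact code behind our figures. In particular, treating the simulated weight as the scale reading, rounding pajama weights to whole grams, rounding halves upward, and taking the absolute value of each change before reading its trailing digit are all assumptions, and the function and parameter names are ours.

```python
import random
from collections import Counter

def simulate_trailing_digits(pj_sd_g=30.0, pj_mean_g=400.0, n_participants=20,
                             n_days=28, p_change=0.35, seed=0):
    """Count digits in the 0.01 kg place of simulated daily weight changes."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_participants):
        scale_g = rng.randint(40, 100) * 1000          # scale reading, whole kg to start
        pj_g = round(rng.gauss(pj_mean_g, pj_sd_g))    # pajamas weighed to the gram
        prev_cg = None
        for day in range(n_days):
            if day > 0:
                scale_g += rng.randint(-15, 15) * 100  # change in steps of 0.1 kg
                if rng.random() < p_change:            # new pajamas, new weight
                    pj_g = round(rng.gauss(pj_mean_g, pj_sd_g))
            # Body weight in units of 0.01 kg, rounding halves upward.
            body_cg = (scale_g - pj_g + 5) // 10
            if prev_cg is not None:
                counts[abs(body_cg - prev_cg) % 10] += 1
            prev_cg = body_cg
    return counts

for sd in (30.0, 15.0):
    counts = simulate_trailing_digits(pj_sd_g=sd)
    print(sd, [counts[d] for d in range(10)])
```

With 20 participants and 28 days each, this yields 540 daily changes per run, the same number as in the real data. Keeping all weights as integers (grams, then hundredths of a kilogram) means the trailing digits are not disturbed by floating-point error.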
When we ran this simulation, we found a distribution of digits in the 0.01 place that didn’t show the same saddle-shaped distribution as in the data from the paper:
We decided to run some additional simulations, just to be sure. To our surprise, when the SD of the pajamas is smaller, in the range of 10-20 g, you can sometimes get saddle-shaped distributions just like the ones we saw in data from the paper. Here’s an example of what the digits can look like when the SD of the pajamas is 15 g:
It’s hard for us to say whether a standard deviation of 15 g or of 30 g is more realistic for hospital pajamas, but it’s clear that under certain circumstances, pajama adjustments can create this kind of distribution (we propose calling it the “pajama distribution”).
While we find this distribution surprising, we conclude that it is possible given what we know about these data and how the weights were calculated.
When we took a close look at these data, we originally found a number of patterns that we were unable to explain. Having communicated with the authors, we now think that while there are some strange choices in their analysis, most of these patterns can be explained when we take into account the fact that pajama weights were deducted from scale weights, and the two weights had different levels of precision.
While these patterns can be explained by the pajama adjustment described by Kevin Hall, there are some important lessons here. The first, as Kevin notes in his comment, is that it can be very difficult to fully record one’s methods. It would have been better to include the full history of this variable in the data files, including the pajama weights, instead of recording the weights and performing the relevant comparisons by hand.
The second is a lesson about combining data of different levels of precision. The hair-comb pattern that we observed in the distribution of DeltaDaily scores was truly bizarre, and was reason for serious concern. It turns out that this kind of distribution can occur when a measure with one decimal of precision is combined with another measure with three decimals of precision, with the result being rounded to two decimals of precision. In the future, researchers should avoid combining data in this way so that they don’t create such artifacts. While it may not affect their conclusions, it is strange for the authors to claim that someone’s weight changed by (for example) 1.27 kg, when they have no way to measure the change to that level of precision.
There are some more minor points that this explanation does not address, however. We still find it surprising how consistent the weight change was in this study, and how extreme some of the weight changes were. We also remain somewhat confused by how often participants changed (or didn’t change) their pajamas.
This post continues in Part Two over at Nick Brown’s blog, where he covers several other aspects of the study design and data.
Thanks again to Nick Brown for comparing notes with us on this analysis, to James Heathers for helpful comments, and to a couple of early readers who asked to remain anonymous. Special thanks to Kevin Hall and the other authors of the original paper, who have been extremely forthcoming and polite in their correspondence. We look forward to ongoing public discussion of these analyses, as we believe the open exchange of ideas can benefit the scientific community.