The Cybernetics of Alternative Turkey

November 26, 2025 · slimemoldtimemold · cybernetics, food, health, mad science, nutrition, science, turkey, vegan, vegetarian

When the Tofurky research division is working on new alternative protein products, they tend to worry about taste. They tend to worry about appearance. And they tend to worry about texture.

If they’re making an alternative (i.e. no-animals-were-harmed) turk’y slice, they want to make it look, smell, and taste like the real thing, and they care about proper distribution of fat globules within the alt-slice.

But here’s a hot take, might even be true: people don’t mainly eat food for the appearance. After all, they would still eat most foods in the dark. They don’t mainly eat foods for the texture, the taste, or even for the distribution of fat globules. People eat food for the nutrition.

This is why people don’t eat bowls of sawdust mixed with artificial strawberry flavoring, even though we have invented perfectly good artificial strawberry flavoring. You could eat flavors straight up if you wanted to, but people don’t do that. You want ice cream, not cold dairy flavor #14, and you can tell the difference. This is a revealed preference: people don’t show up for the flavors.

A food has the same taste, smell, texture, retronasal olfaction, and general mouthfeel when you start eating it as when you finish. If you were eating for these features, you would never stop. But people do stop eating: just see how far you can get into a jar of frosting. The first bite may be heavenly, but you won’t get very deep. The gustatory features of the frosting (taste, smell, and so on) don’t change. You stop eating because you are satisfied.

Assuming you buy this argument, that the real motivation behind eating food is nutrition, then why do people care about flavor (and appearance, and texture, etc.) at all? We’re so glad you asked:

People can detect some nutrients as soon as they hit the mouth: the obvious one is salt. It’s easy to figure out if a food is high in sodium; you just taste it. As a result, it’s easy to get enough salt. You just eat foods that are obviously salty until you’ve gotten enough.

But other nutrients can’t be detected immediately. If they’re bound up deep within the food and need to be both digested and absorbed, it might take minutes, maybe hours, maybe even longer, before the body registers their presence. To get enough of these nutrients, you need to be able to recognize foods that contain these nutrients, even when you can’t detect them from chewing alone.

This is where food qualities come in. Taste and texture are signs you learn that help you predict what nutrients are coming down the pipeline. It’s just like how you learn that the thud of a candy bar at the bottom of a vending machine predicts incoming sugar, that the sight of a halal van predicts greasy food imminently going down your drunk gullet, or that the sight of a Lay’s bag means there is something salty inside, even though you can’t detect salt just from looking at it. In the same way, you learn that the taste of lentils means you will have more iron in your system soon, even if you can’t detect the iron from merely putting the lentils in your mouth.

To give context, this is coming from the model of psychology we described in our book, The Mind in the Wheel. In this model, motivation is the result of many different drives, each trying to maintain some kind of homeostasis, and the systems creating the drives are called governors. In eating behavior, different governors track different nutrients and try to make sure you maintain your levels, hit your micros, get enough of each.

There’s still a lot we don’t know about this, but to give one example we’re confident about, there’s probably one governor that makes sure you get enough sodium, which is why you add salt to your food. There’s also at least one governor that keeps track of your fat intake, at least one governor clamoring for sugar, probably a governor for potassium. Who knows.
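To make the governor idea a little more concrete, here is a toy sketch in Python. It is our own illustration under simplifying assumptions, not the model from the book: each hypothetical governor has a set point and a current level, its error is the shortfall, and the eater picks whichever food its learned cues *predict* will shrink total error the most. The nutrient amounts and the `foods` predictions are made up.

```python
from dataclasses import dataclass


@dataclass
class Governor:
    """A toy homeostatic governor that tracks one nutrient (illustrative only)."""
    nutrient: str
    set_point: float
    level: float

    def error(self) -> float:
        # In this sketch only a shortfall counts as error; surplus is ignored.
        return max(0.0, self.set_point - self.level)


def error_if_eaten(governors, predicted):
    """Total governor error if the nutrients a food's cues predict actually arrived."""
    return sum(
        max(0.0, g.set_point - (g.level + predicted.get(g.nutrient, 0.0)))
        for g in governors
    )


def choose_food(governors, foods):
    """Pick the food whose *predicted* nutrients would shrink total error the most.

    `foods` maps a food name to the nutrients its taste and texture predict.
    The governors never see the nutrients at this point, only the predictions.
    """
    return min(foods, key=lambda name: error_if_eaten(governors, foods[name]))


# Hypothetical numbers, chosen only to keep the example readable.
governors = [
    Governor("sodium", set_point=10.0, level=9.5),
    Governor("selenium", set_point=10.0, level=4.0),
]
foods = {
    "pretzels": {"sodium": 3.0},
    "turkey": {"selenium": 5.0, "sodium": 1.0},
}
print(choose_food(governors, foods))  # turkey
```

The sodium governor is nearly satisfied, so the selenium shortfall dominates and the food predicted to carry selenium wins, even though pretzels are the saltier option.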

Governors only care about hitting their goals. Taste and texture are just the signs they use to navigate. And this is where the problem comes in.

Consider that, for all its flaws, turkey is really nutritious. Two slices, or 84 grams, of turkey contain 29% of the Daily Value (DV) for vitamin B12, 46% of the DV for selenium, 49% of the DV for vitamin B6, and 61% of the DV for niacin (vitamin B3).

Tofurky is not. As far as we can tell, it doesn’t contain any selenium or B vitamins. It’s not clear if it contains zinc or phosphorus either. Maybe this is wrong, but at the very least, it doesn’t appear that Tofurky is trying to nutrition-match. And that may be the key to why these products are still not very popular. If you try to compete with turkey on taste and texture, but people choose foods based on nutrition, you’re gonna have a problem.

This is just one anecdote, but: our favorite alternative protein is Morningstar Farms vegetarian sausage links. And guess what food product contains 25% DV of vitamin B6, 50% DV of niacin, and 130% DV of vitamin B12 per two links? Outstanding in its field.

In the Vegan War Room

We believe this has strategic implications. So please put on your five-star vegan general hat, as we lead you into your new imagined role as commander of the faithful.

General, as you may be aware, the main way our culture attempts to change behavior is by introducing conflict. We attempt to make people skinny by mocking them, which pits the shame governor against the hunger governors. We control children by keeping them inside at recess or making them stay after class, which pits the governors that make them act up in class against the governors that make them want to run around with their friends. Or we control them by saying, no dessert until you eat your Brussels sprouts.

This is an unfortunate holdover from the behaviorists, who once dominated the study of psychology. In behaviorism, you get more of what you reward, and less of what you punish. Naturally when they asked themselves “how to get less of a behavior?” the answer they came up with was “punish!” But this is a fundamentally incomplete picture of psychology. Reward and punishment don’t really exist — motivation is all about governors learning what will increase or decrease their errors. While you can decide to pit governors against each other, this approach has serious limitations. It just doesn’t work all that well.

First of all, conflict between governors is experienced as anxiety. So while you can change someone’s behavior by causing conflict, you’ll also make them seriously anxious. This is fine, we guess, if you hate them and want them to feel terrible all the time. But it’s more than a little antisocial.

Anyone who’s the target of punishment will see what is happening. They don’t want to feel anxious all the time, and they especially don’t want to feel anxious about doing what to them are normal, everyday things. If you try to change their behavior in this way, they will find you annoying and do their best to avoid you, so you can’t create so much conflict inside them. Imagine how much less effective this strategy is, compared to finding a method of persuasion that people don’t avoid, or that they might even actively seek out.

On top of this, conflict dies out without constant maintenance. In the short term you can convince people that they will be judged if they have premarital sex, but this lesson will quickly fade, especially if they see people getting busy without consequence. The only way to keep this in check is to run a constant humiliation campaign, where people are reminded that they will be shamed if they ever step out of line. This is expensive, neverending, and, for the obvious reasons, unpopular. Scolding can work in limited ways, but nobody likes a scold.

Many attempts to convince people to become vegan, or even to simply eat less meat, follow this strategy — they try to make people eat less meat by taking the governors that normally vote for meat-eating (several nutritional governors, and perhaps some other governors, like the one for status) and opposing them with some other drive.

You can tell people that they are bad people for eating meat, you can say that they will be judged, shamed, or ostracized. You can tell them that eating meat is bad for their health or bad for the environment. This might even be true. But just because it’s true doesn’t mean it’s motivating. This strategy won’t work all that well. It only causes conflict, because the drives that vote against eating meat will be strenuously opposed by the drives that have always been voting to eat meat to begin with.

But you don’t need to fight your drives. Better to provide a substitute.

No one takes a horse to their dentist appointments anymore. Cars are just vegan carriages; hence “horseless carriage”. We used to kill whales for oil. We don’t do that anymore, and it’s not because people became more compassionate. It’s because whale oil lamps got beat out by better alternatives, like electric lighting. People substitute one good for another when it is either strictly better at satisfying the same need(s), or better in some way — for example, not as good, but much cheaper, or much faster, or much more convenient.

Whale oil lamps burned bright, but with a disagreeable fishy smell. Imagine if, in the early days of alternative lighting, they had tried to give whale oil substitutes like kerosene or electric lights the same fishy smell, on the theory that this would make it easier to compete with whale oil. No! They just tried to address the need the whale oil was addressing, namely light, without trying to capture any of the incidental features of whale oil. They offered a superior product, or sometimes one that was inferior but cheaper, and that was enough to do the job. We don’t run whale ships off Nantucket anymore.

So if you want people to eat less meat, if you want more people to become vegan, you shouldn’t roll out alternative turkey, salami, or anything else. You should provide substitutes, competing superior products, that satisfy the same drives without any reference to the original product. Ta-daaaa.

No one eats yogurt because they have an innate disposition for yogurt. Instead, they eat it because yogurt fulfills some of their needs. If they could get those needs met through a different product, they probably would, especially if the alternative is faster / easier / cheaper.

For the sake of illustration, let’s say that turkey contains just three nutrients, vitamins X, Y, and Z.

If you make an alternative turkey that matches the real thing in taste and texture, but provides none of the same nutrients, then despite the superficial similarity, you’re not even competing in the same product category. It’s like selling cardboard boxes that look like cars but that can’t actually get you to work — however impressive they might look, they don’t meet the need. People will not be inclined to replace their real turkey with your alternative one, at least not without considerable outside motivation. You will be working uphill.

Making a really close match can actually be counterproductive. If an alternative food looks/tastes/smells very similar to an original food, but it doesn’t contain the same nutrition, this is basically the same as gaslighting your governors. And the better the taste match, the more confusing this is.

Think about it from the perspective of the selenium governor. You’re trying to encourage behaviors that keep you in the green zone on your selenium levels, mostly by predicting which foods will lead to more selenium later. But things have recently become really confusing. About half the time you taste turkey flavor and texture, you get more selenium a few hours later. The other half of the time, you encounter turkey flavor and texture, but the selenium never arrives.
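One standard way to model this kind of cue learning is the Rescorla-Wagner update rule. The sketch below is our own illustration, assuming the selenium governor learns a single number V for “how much selenium does turkey flavor predict”, with an outcome of 1 when selenium arrives and 0 when it doesn’t. The learning rate `alpha` and trial counts are arbitrary.

```python
def learn_cue_value(outcomes, alpha=0.1, v0=0.0):
    """Rescorla-Wagner-style learning of how well a cue predicts a nutrient.

    Each trial: V <- V + alpha * (outcome - V), where outcome is 1 if
    selenium showed up after the turkey flavor and 0 if it never did.
    """
    v = v0
    for outcome in outcomes:
        v += alpha * (outcome - v)
    return v


only_real_turkey = [1] * 200
# Deterministic stand-in for "about half the time it's alt-turkey".
half_alt_turkey = [1, 0] * 100

print(round(learn_cue_value(only_real_turkey), 3))  # 1.0
print(round(learn_cue_value(half_alt_turkey), 3))   # 0.474
```

With only real turkey, the flavor becomes a near-perfect predictor. Mix in alt-turkey half the time and the same flavor settles around one half: the cue has been degraded, which is the confusion described above.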

By eating alternative proteins that taste like the “real thing”, you end up seriously confusing your governors, with basically no benefit.

We recently tried one of these new vegan boxed eggs. It did have the appearance of scrambled eggs, and it curdled much like scrambled eggs. It even tasted somewhat like scrambled eggs. But the experience of eating it was overall terrible. Not the flavor — the deep sense that this was not truly filling, not a food product. Despite simulating the experience of eggs quite closely, we did not want it. Maybe because it was not truly nutritious.

If you make an alternative turkey that contains vitamins X, Y, and Z, you will at least be providing a real substitute. People will have a natural motivation to eat your alternative turkey. But if you do this, you’re still in direct competition with the original turkey. You’re in its niche, it is an away game for you and a home game for turkey. You have to convince the consumer’s mind that your alt-turkey is worth switching to, and that takes a lot of convincing. People prefer the familiar. Unless the new product is much better in some way, they won’t switch.

If you are trying to replicate turkey, you need to make a blob that matches real turkey on all the dimensions people might care about. A product exactly like that is hard to make at all, never mind making it cheap, available, and satisfying at the same time. This is why it’s an uphill battle: you’re trying to match turkey exactly.

Those of us who have never tasted turkey are in ignorance still; our subconscious has no idea that turkey slices would be a great source of vitamin X. We’re not tempted. But people who have tried turkey before have tasted the deli meat of knowledge, and there’s no losing that information once you have it. Vitamin X governor gets what vitamin X governor wants, so these people will always feel called to the best source of vitamin X they’re aware of. You’ll never convince the vitamin X governor that turkey is a bad source of vitamin X; you’ll get more mileage out of giving it a better way to get what it wants!

So instead of shaming, or offering mock meats, the winning strategy might be to just come up with new, original vegan foods that are very good sources of vitamins X, Y, and/or Z. Just make vitamin X drinks, vitamin Y candies, and vitamin Z spread. If you don’t try to mimic turkey, then you’re not in competition with turkey in any way. You don’t need to convince people that it’s better than turkey — you just need to convince them that it’s nutritious and delicious. Why try to copy turkey when you can beat it at its own game?

You don’t need alt-turkey to be all turkey things to all turkey people. As long as people get their needs covered in a way that satisfies, they’ll be happy.

It seems like it would be easier to make a good source of phosphorus than to make a good source of phosphorus PLUS make it resemble yogurt as much as possible. Alternative proteins that try to mimic existing foods will always be at a disadvantage in terms of quality, taste, and cost, simply because trying to do two things is harder than doing one thing really well. You’ll end up on the losing side of a lot of tradeoffs.

If we created new food products that contain all the nutrients that people currently get from meat, except tastier, cheaper, or even just more convenient, people would slowly add these foods to their diet. Over time, these foods would displace turkey and other meats as superior substitutes, just like electric lights replaced gas lamps, or like cell phones eclipsed the telegraph. Without even thinking about it, people will soon be eating much less meat than they did before. And if these new foods are good enough sources of the nutrients we need, then in a generation or two people may not be eating meat at all. After all, meat is a bit of a hassle to produce and to cook. Not like my darling selenium drink.

We see this already in some natural examples. Tofu is much more popular in countries like China, Korea, and Japan, where it is simply seen as a food, than it is in the US, where it is treated as a meat substitute. You don’t frame your substitute as being in the same category as your competitors unless you really have to. That’s just basic marketing.

We have a friend whose family is from Cuba. She tells a story about how her grandmother was bemused when avocado toast got really popular in the 2010s. When asked why she found this so strange, her grandmother explained that back in Cuba, the only reason you would put avocado on your toast was if you were so dirt poor you couldn’t afford butter. It was an extremely shameful thing to have to put avocado on your toast, because avocados grew on trees in the backyard and were basically free. If you were so very poor as to end up in this situation, you would at least try to hide it.

In Cuba, where avocado was seen as a substitute for butter, it was automatically seen as inferior. But when it appeared in 2010s America in the context of a totally new dish, it was wildly popular. And in terms of food replacement, avocado is a stealth vegan smash hit, way more successful than nearly any other plant-based product. It wasn’t framed that way, but in a practical sense, what did avocado displace? Mostly dairy- and egg-based spreads like butter, cream cheese, and mayonnaise. There may be no other food that has led to such an intense increase in the effective amount of veganism, even if the people switching away from these spreads didn’t see it that way. They just wanted avocado on the merits.

This product space is usually thought of as “alternative proteins”. Which is fine, protein is one thing that everyone needs. But a better perspective might be, “vegan ways to get where you’re going”. And just because some of these targets happen to be bundled together in old-fashioned flesh-and-blood meat, doesn’t mean they need to be bundled together in the same ways in the foods of the future.

How to DIY New Scientific Protocols

November 17, 2025 · slimemoldtimemold · design, DIY, internet science, mad science, research methods, science

Scientific research today relies on one main protocol — experiments with control groups and random assignment. In medical contexts, these are usually called randomized controlled trials, or RCTs.

The RCT is a powerful invention for detecting population-level differences across treatments or conditions. If there’s a treatment and you want to know if it’s more effective than control or placebo, if you want to get an answer that’s totally dead to rights, the RCT is hard to beat. But there are some problems with RCTs that tend to get swept under the rug.

Today we aim to unsweep.

First, RCTs are seen as essential to science, but in fact they are historically unusual. RCTs were first invented in 1948, so most of science happened before they were even around. Galileo didn’t use RCTs, and neither did Hooke, Lavoisier, Darwin, Kelvin, Maxwell, or Einstein. Newton didn’t use RCTs to come up with calculus or his laws of motion. He used observations and a mathematical model. So the idea that RCTs, or randomized experiments more generally, are essential to science is ahistorical and totally wrong.

If you were to ask doctors what findings they are most sure of, they would almost certainly include “smoking causes cancer” in their list. But we didn’t discover this connection by randomly assigning some people to smoke a pack a day and other people to abstain, over the course of several years. No. We used epidemiologic evidence to infer a causal relationship between the presumed cause and observed effect.

Second, the RCT is only one tool, and like all tools, it has specific limitations. It’s great for studying population-level differences, or treatments where everyone has a similar response. But where there is substantial heterogeneity of treatment effects, the RCT is a poor tool and often gives incoherent answers. And if that heterogeneity is the main question of interest, it’s borderline useless.

Put simply, if people respond to a treatment in very different ways, an RCT will give results that are confusing instead of clarifying. If some people have a strong positive response to treatment and some people have no response at all, the RCT will distill this into the conclusion that there is a mild positive response to treatment, even if no individual participant has a mild positive response!
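A minimal Python sketch of that arithmetic, with invented numbers and no measurement noise so the point is easy to see: half of the treatment arm improves by 10 points, the other half not at all, and the control arm doesn’t change.

```python
import statistics


def simulate_trial(n_per_arm=100, responder_fraction=0.5, responder_effect=10.0):
    """Toy two-arm trial with no noise (all numbers hypothetical).

    In the treatment arm, some people respond strongly and the rest
    don't respond at all. Everyone in the control arm has zero change.
    """
    n_responders = round(n_per_arm * responder_fraction)
    treatment = [responder_effect] * n_responders + [0.0] * (n_per_arm - n_responders)
    control = [0.0] * n_per_arm
    return treatment, control


treatment, control = simulate_trial()
avg_effect = statistics.mean(treatment) - statistics.mean(control)
print(avg_effect)              # 5.0, a "mild" average effect
print(sorted(set(treatment)))  # [0.0, 10.0], yet nobody changed by 5
```

The trial’s headline number describes no one in the trial.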

Also, RCTs are like, way inefficient. To test for a moderate effect size, you need several dozen or several hundred participants, and you can test only one hypothesis at a time. Each time you compare condition A to condition B, you find out which group does better. Maybe you want to see if a dose of 2 mg is better than a dose of 4 mg. But if there are a dozen factors that might make a difference, you need a dozen studies. If you want to test two hypotheses, you need two groups of several dozen or several hundred participants each; for three, you will need at least three groups, et cetera.
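Where do “several dozen or several hundred” come from? Here is a rough sketch using the textbook normal-approximation formula for a two-sided, two-sample comparison, assuming the conventional α = 0.05 and 80% power, with effect size expressed as Cohen’s d. It uses only Python’s standard library.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per arm (normal approximation).

    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, where d is the
    standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):  # conventionally "large", "moderate", "small"
    print(d, n_per_group(d))
# 0.8 25
# 0.5 63
# 0.2 393
```

These are per arm, so a two-arm trial needs double. Exact t-test power calculations come out slightly higher (about 64 per arm for d = 0.5), and every extra condition you want to compare adds another arm of that size.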

Third, RCTs don’t take advantage of modern cheap computation and search algorithms. For example, in the 1980s there was some interest in N=1 experiments for patients with rare cancers. This was difficult in the 1980s because of limited access to computers, even at research universities. But today you could run the same program on your cell phone a hundred times over. We’d be better off making use of these new insights and capabilities.

Recent Developments

Statistics is young, barely two hundred years at the outside. And the most familiar parts are some of the youngest. Correlation was invented in the 1880s and refined in the 1890s. It’s not even as old as trains.

Turns out it is kinda easy to make new tools. The RCT is important, but it isn’t rocket science. A new century requires new scientific protocols. The 21st century is an era where communication is prolific and computation is cheap, and we should harness this power.

Since the early days, science has been based on doing experiments and sharing results. Researchers collect data, develop theories, and discuss them with other likeminded weirdos, freaks, and nerds.

New technology has made it easier to do experiments and share results. And by “new technology”, we of course mean the internet. Just imagine trying to share results without email, make your data and materials public without the OSF or Google Drive or Dropbox, or collaborate on a manuscript by mailing a stack of papers across the country. Seriously, we used to live like that. Everyone did.

People do like the internet, and we also hear that they sometimes use it. Presumably a sensible, moderate amount. But just like the printing press, which was invented in 1440 but didn’t lead to the Protestant Reformation until 1517, the internet (and related tech like the computer and the pocket computer, or “cell phone”) has not yet been fully leveraged.

Let’s Put on Our Thinking Caps

This is all easy enough to say, but at some point you need to consider how to come up with totally new research methods.

We take three main angles, which are historical, analogical, and tinkering. Basically: Look at how people came up with new methods in the past. Look at successful ideas from other fields and try applying them to science. And look at the different ideas and see what happens when you expose them to nature.

We begin with close reads and analysis of the successful development of past protocols (for example, the scientific innovation around the cure for scurvy).

We develop new scientific protocols by analogy to successful protocols in other areas. For example, self-experiments are somewhat like debugging (programmers in the audience will be familiar with suspicion towards stories of “well, it worked on MY setup”). The riff trial was developed by analogy to evolution.

Finally, we deploy simple versions of these protocols as quickly as possible so that we can tinker with them and benefit from the imagination of nature. This is also somewhat by analogy to hacker development methods, and startup concepts like the minimum viable product. We try out new ideas as soon as they are ready, and all of our work is published for free online, so other people can see our ideas and tinker with them too.

Here are some protocols we’ve been dreaming about that show exceptional promise:

N=1

The idea of N = 1 experiments / self-experiments has been around for a while, and there are some famous case studies like Nobel laureate Barry Marshall’s self-administration of H. pylori to demonstrate its role in gastritis and stomach ulcers. But N = 1 protocols have yet to reach their full potential.

There’s a lot of room to improve this method, especially for individuals with chronic illnesses or conditions that bamboozle the doctors. N = 1 studies have particular considerations, like hidden variables. You can’t just slap on a traditional design; you need to think about things like latency and half-life. And many of the lessons of N = 1 generalize to N of small.
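As one illustration of why latency and half-life matter, here is a sketch of a day-by-day ABAB self-experiment plan (A = off, B = on, alternating). It assumes first-order elimination and uses the common pharmacology rule of thumb that about five half-lives is enough washout. The period length, half-life, and onset delay are hypothetical parameters, not recommendations.

```python
import math


def remaining_fraction(days, half_life_days):
    """Fraction of a substance still present after `days` (first-order decay)."""
    return 0.5 ** (days / half_life_days)


def abab_plan(period_days, half_life_days, onset_days, n_half_lives=5, periods="ABAB"):
    """Day-by-day ABAB plan, flagging which days are usable for analysis.

    The first `buffer` days of every period are excluded, so that the previous
    condition has washed out (about `n_half_lives` half-lives) and the new
    condition has had time to kick in (`onset_days`).
    """
    buffer = math.ceil(max(onset_days, n_half_lives * half_life_days))
    if buffer >= period_days:
        raise ValueError("periods are too short for this latency/half-life")
    return [(condition, day >= buffer) for condition in periods for day in range(period_days)]


plan = abab_plan(period_days=14, half_life_days=1.0, onset_days=3)
usable = [condition for condition, keep in plan if keep]
print(len(plan), usable.count("A"), usable.count("B"))  # 56 18 18
print(remaining_fraction(5, 1.0))  # 0.03125, about 3% left after 5 half-lives
```

Comparing A days to B days without these buffers would mix leftover effects into the “off” periods and not-yet-started effects into the “on” periods, which shrinks the apparent difference.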

Community Trial

The Community Trial is a protocol that blurs the line between participant and researcher. In these trials, an organizer makes a post providing guidelines and a template for people to share their data. Participants then collect their own data and send it to the organizer, who compiles and analyzes the results, sharing the anonymized data in a public repository.

Data collection is self-driven, so unlike a traditional RCT, participants can choose to measure additional variables, participate in the study for longer than requested, and generally take an active role in the study design.

Unlike most RCTs, community trials allow for rolling signups, and could be developed into a new class of studies that run continuously, with permanently open signups and an ever-growing database of results with a public dashboard for analysis.
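The organizer’s side of this can be very lightweight. Below is a hypothetical sketch of the compile-and-publish step: identifying IDs are replaced with salted hashes before anything goes public (strictly speaking, pseudonymization), extra fields that participants chose to record pass through untouched, and a summary reports enrollment, completion, and mean change among completers. The field names and toy records are invented for illustration. With rolling signups, you would simply rerun it as new submissions arrive.

```python
import hashlib
from statistics import mean


def pseudonymize(participant_id, salt):
    """Replace an identifying ID with a salted hash before anything is published."""
    return hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()[:12]


def compile_results(submissions, required_weeks, salt):
    """Organizer-side step of a community trial (sketch).

    Each submission is a dict with at least 'id', 'weeks_completed', and
    'weight_change_lbs' (negative = loss). Any extra fields a participant
    chose to record are passed through untouched.
    """
    public_rows = [{**row, "id": pseudonymize(row["id"], salt)} for row in submissions]
    completers = [r for r in public_rows if r["weeks_completed"] >= required_weeks]
    summary = {
        "enrolled": len(public_rows),
        "completed": len(completers),
        "mean_change_completers": (
            mean(r["weight_change_lbs"] for r in completers) if completers else None
        ),
    }
    return public_rows, summary


# Made-up records, for illustration only.
toy = [
    {"id": "alice@example.com", "weeks_completed": 4, "weight_change_lbs": -12.0},
    {"id": "bob@example.com", "weeks_completed": 2, "weight_change_lbs": -4.0},
    {"id": "carol@example.com", "weeks_completed": 4, "weight_change_lbs": -8.0,
     "sleep_hours": 7.5},
]
rows, summary = compile_results(toy, required_weeks=4, salt="keep-this-secret")
print(summary)  # {'enrolled': 3, 'completed': 2, 'mean_change_completers': -10.0}
```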

We first tested this with the Potato Diet Community Trial (announcement, results), where 209 people enrolled in a study of an all-potato diet, and the 64 people who completed 4 weeks lost an average of 10.6 lbs. Not bad.

Reddit Trials

There’s a possible extension of the community trial that you might call a “Reddit Trial”.

In this protocol, participants in an online community (like a subreddit) that all share a common interest, problem, or question (like a mystery chronic illness) come together and invent hypotheses, design studies, collect data, perform analysis, and share their results. As in a community trial, participants can take an active role in the research, measure additional variables, formulate new hypotheses as they go, etc.

People seem to think that a central authority makes things better, but we think for design and discovery that’s mostly wrong. You want the chaos of the marketplace, not the rigid stones of the cathedral. Every bug is shallow if one of your readers is an entomologist.

This could be more like a community trial, where one person, maybe even a person from outside the community, takes the lead. But it could also be very different from a community trial, if the design and leadership are heavily, even enormously, distributed. It’s even possible that rival factions within a community, splintering over design and analysis, would actually make this process better.

We already wrote a bit about similar ideas in Job Posting: Reddit Research Czar. And none other than Patrick Collison has come to a closely-related conclusion in a very long tweet, saying:

Observing some people close to me with chronic health conditions, it’s striking how useful Reddit frequently ends up being. I think a core reason is because trials aren’t run for a lot of things, and Reddit provides a kind of emergent intelligence that sits between that which any single physician can marshal and the full rigor of clinical trials.

… Reddit — in a pretty unstructured way — makes a limited kind of “compounding knowledge” possible. Best practices can be noticed and can imperfectly start to accumulate. For people with chronic health problems, this is a big deal, and I’ve heard lots of stories between “I found something that made my condition much more manageable” all the way to “I found a permanent cure in a weird comment buried deep in a thread”.

… Seeing this paper and the Reddit experience makes me wonder whether the approach could somehow be scaled: is there a kind of observational, self-reported clinical trial that could sit between Reddit and these manual approaches? Should there be a platform that covers all major chronic conditions, administers ongoing surveys, and tracks longitudinal outcomes?

We think the answer is: obviously yes. It’s just up to people to start running these studies and learning from experience. We’re also reminded of Recommendations vs. Guidelines from old Slate Star Codex.

Riff Trials

The Riff Trial takes a treatment or intervention which is already somewhat successful and recruits participants to self-assign to close variations on the original treatment. Each variation is then tested, and the results reported back to the organizers.

This uses the power of parallel search to quickly test possible boundary conditions, and discover variations that might improve upon the original. Since each variation is different, and future signups can make use of successful results, this can generate improvements based on the power of evolution.
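Here is a toy sketch of that search loop, under our own assumptions rather than the actual SMTM procedure: a treatment is a set of optional elements, each “riff” toggles one element on or off, participants test many riffs in parallel, and the best few variants seed the next round. The original is always retested, so results can’t get worse from round to round. The elements `a` through `d` and the scoring function are purely hypothetical stand-ins for real measured outcomes.

```python
import random


def riff(variant, options, rng):
    """One close variation: toggle a single optional element on or off."""
    return variant ^ {rng.choice(options)}


def riff_trial(base, options, outcome, rounds=4, riffs_per_round=50, keep=3, seed=0):
    """Parallel search with selection, loosely analogous to evolution.

    `outcome(variant)` stands in for the result participants report back
    (higher is better). The top `keep` variants seed the next round.
    """
    rng = random.Random(seed)

    def rank(variant):
        # Sort by outcome, with a deterministic tie-breaker.
        return (outcome(variant), tuple(sorted(variant)))

    parents = [frozenset(base)]
    for _ in range(rounds):
        tried = {riff(rng.choice(parents), options, rng) for _ in range(riffs_per_round)}
        tried.update(parents)  # keep retesting the current best variants too
        parents = sorted(tried, key=rank, reverse=True)[:keep]
    return parents[0]


def toy_outcome(variant):
    # Hypothetical: b and d help, c hurts, a makes no difference.
    return ("b" in variant) + ("d" in variant) - ("c" in variant)


best = riff_trial({"a"}, ["a", "b", "c", "d"], toy_outcome)
print(sorted(best), toy_outcome(best))
```

Because later riffs build on earlier winners, a combination nobody proposed at the start (here, adding both helpful elements) can emerge after a few rounds.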

We tested this protocol for the first time in the SMTM Potato Diet Riff Trial, with four rounds of results reported (Round 1, Round 2, Round 3, Retrospective).

This has already led to at least one discovery. While we originally thought that consuming dairy would stop the potato diet’s weight loss effects, multiple riff trials demonstrated that people keep losing weight just fine when they have milk, butter, even sour cream with their potatoes. Consuming dairy does not seem to be a boundary condition of the potato diet, as was originally suspected. This also seems to disprove the idea that the standard potato diet works because it is a mono-diet, boring, or low-fat. How can it work from being a mono-diet, boring, or low-fat if it still works when you add various dairy products, delicious dairy products, and high-fat dairy products?

There are hints of other discoveries in this riff trial too, like the fact that the diet kept working for one guy even when he added Skittles. But that remains to be seen.

“Bullet-Biting”

In most studies, people have a problem and want the effect to work. If it’s a weight loss study, they want to lose weight, and don’t want the weight loss to stop. So participants are hesitant to “bite the bullet” and try variations that might stop the effect.

This creates a strong bias against testing which parts of the intervention are actually doing the work, which elements are genuinely necessary or sufficient. It makes it much harder to identify the intervention’s real boundary conditions. So while you may end up with an intervention that works, you will have very little idea of why it works, and you won’t know if there’s a simpler version of the intervention that would work just as well, or maybe better.

We find this concerning, so we have been thinking about a new protocol where testing these boundaries is the centerpiece of the approach. For now we call it a “bullet-biting trial”, in the sense that it guides researchers and participants to bite the bullet (“decide to do something difficult or unpleasant in order to proceed”) of trying things that might kill the effect.

In this protocol, participants first test an intervention over a baseline period, to confirm that the standard intervention works for them.

Then, they are randomized into conditions, each condition being a variation that tests a theoretical or suspected boundary condition for the effect (e.g. “The intervention works, but it wouldn’t work if we did X/didn’t do Y.”).

For example, people might suspect that the potato diet works because it is low fat, low sugar, or low seed oils. In this protocol, participants would first do two weeks of a standard potato diet, to confirm that they are potato diet responders. No reason to study the effect in people who don’t respond! Then, anyone who lost some minimum amount of weight over the baseline period would be randomized into a high-fat, high-sugar, or high-seed-oil variant of the potato diet for at least two weeks more. If any of these really are boundary conditions, and stop the weight loss dead, well, we’d soon find out.
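The assignment step of that design might look something like the sketch below. It assumes a hypothetical minimum baseline loss of 2 lbs to count as a responder, and it uses a seeded shuffle followed by round-robin dealing so the arms differ in size by at most one. The participant IDs and baseline numbers are made up, and this is only one of several reasonable randomization schemes (block or stratified randomization would also work).

```python
import random


def bullet_biting_assign(baseline_losses, min_loss, variants, seed=0):
    """Phase 1 -> phase 2 assignment for a bullet-biting trial (sketch).

    baseline_losses: participant -> lbs lost on the standard protocol.
    Only responders (loss >= min_loss) continue. They are shuffled and dealt
    round-robin into the variant arms, so arm sizes differ by at most one.
    """
    responders = sorted(p for p, loss in baseline_losses.items() if loss >= min_loss)
    rng = random.Random(seed)
    rng.shuffle(responders)
    arms = {v: [] for v in variants}
    for i, person in enumerate(responders):
        arms[variants[i % len(variants)]].append(person)
    return arms


# Hypothetical baseline results after two weeks of the standard diet.
baseline = {"p1": 4.0, "p2": 0.5, "p3": 3.2, "p4": 6.1, "p5": 2.0, "p6": 1.1, "p7": 5.5}
arms = bullet_biting_assign(baseline, min_loss=2.0,
                            variants=["high-fat", "high-sugar", "high-seed-oil"])
for variant, people in arms.items():
    print(variant, people)
```

The analysis then asks, arm by arm, whether weight loss continues or stops once the suspected blocker is introduced.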

By randomly introducing potential blockers, you can learn more about how robust an intervention truly is. Maybe the intervention you’ve been treating so preciously actually works just fine when you’re very lax about it! More importantly, you can test theories of why the intervention works, since different theories will usually make strong predictions about conditions under which an intervention will stop working. And this design might help us better understand differences between individuals — it may reveal that certain variations are a boundary condition for some people, but not for others.

Corn Holes

October 22, 2025 · slimemoldtimemold · allergies, corn, internet science, mad science, nutrition, obesity, science

Extreme corn allergies aren’t common, but over the course of our lives we’ve happened to meet two people who have them. “Extreme” means they couldn’t eat corn, couldn’t eat corn products, and couldn’t eat any product containing corn derivatives. One of them was so allergic, she couldn’t even eat apples unless she picked them from the tree herself — apples in the store have been sprayed with wax, and some of those waxes contain corn byproducts.

Both of these people were also extremely lean; we mean rail thin. It’s easy to imagine alternative explanations for this — if you have to carefully avoid any food that has ever been within shouting distance of corn, it might be harder to get enough to eat. But there’s no rule saying you can’t grow fat on pork and rice, and it occurs to us that if corn were somehow in the causal chain that’s causing the obesity epidemic, this is exactly what you would see.

If corn were a direct cause of the obesity epidemic — maybe if it concentrates an obesogenic contaminant like lithium, maybe if obesity is caused by a pesticide massively applied to corn — then people with serious corn allergies should be almost universally thin, or should at least have an obesity rate much lower than the general population. Our sample size of two is far too small to draw this conclusion right now, but every sample of 100 or 10,000 passes through a sample size of 2 at some point.
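The sample-of-two point can be put in numbers. Here is a small Python sketch, assuming people with corn allergies are no different from anyone else and using the roughly 40% US obesity rate; "lean" here just means not obese, a much weaker bar than rail thin. The function names are illustrative.

```python
import math


def prob_all_lean(n: int, obesity_rate: float = 0.40) -> float:
    """Chance that n independent, randomly chosen people are all non-obese."""
    return (1 - obesity_rate) ** n


def smallest_convincing_n(alpha: float = 0.05, obesity_rate: float = 0.40) -> int:
    """Smallest n where 'all n are lean' happens by chance less than alpha of the time."""
    return math.ceil(math.log(alpha) / math.log(1 - obesity_rate))


print(prob_all_lean(2))          # chance that two random people are both lean
print(smallest_convincing_n())   # lean streak needed before chance looks unlikely
```

With a 40% base rate, two lean people turn up about 36% of the time by chance; it takes a streak of six before the odds drop under 5%.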

Easy enough to test. So, if you or someone you know has a serious corn allergy, are you really lean? We would love to know! Do you have access to the talk.kernelpanic.zero mailing list? Is there a secret r/cornwatchers subreddit? Can we send them a survey?

Corn aside, we can generalize this argument. The obesity rate in the US is about 40%. If people with an allergy to soy, fish, sesame, etc. are less than 40% obese, that implicates the food they’re allergic to. And if their obesity rate is < 5%, that’s a smoking gun.

You could also say, maybe people with food allergies have a lower overall rate of obesity, on account of their food allergies. This is probably true. Let’s say that the general rate of obesity in people with serious food allergies is 25%, instead of the 40% of the general population. But if people with serious avocado, kiwi, and banana allergies are 27%, 23%, and 24% obese, and people with serious tomato allergies are 2% obese, that’s kind of a signal.
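One way to ask whether a cohort like the tomato group stands out is a one-sided binomial tail: if that cohort's true obesity rate were the allergy-wide 25%, how likely is a count this low? A minimal Python sketch follows. The percentages are the hypothetical ones from the paragraph above; the cohort size of 100 each is an extra assumption, and the calculation treats people as independent, which ignores the overlapping-allergy complication.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


ALLERGY_BASELINE = 0.25  # hypothetical obesity rate among all serious food allergies

# Hypothetical cohorts: (number obese, cohort size). Sizes of 100 are assumed.
COHORTS = {
    "avocado": (27, 100),
    "kiwi": (23, 100),
    "banana": (24, 100),
    "tomato": (2, 100),
}

for food, (obese, n) in COHORTS.items():
    # One-sided p-value: chance of seeing this few obese people or fewer.
    p_low = binom_cdf(obese, n, ALLERGY_BASELINE)
    print(f"{food:8s} obese={obese / n:4.0%}  P(X <= {obese}) = {p_low:.2g}")
```

Under these made-up numbers the avocado, kiwi, and banana cohorts are unremarkable, while the tomato cohort's tail probability is vanishingly small.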

There are some complications, like the fact that people with one food allergy are more likely to have another food allergy. But let’s not worry about that until we have the data.

One of our most counterintuitive beliefs is that the obesity epidemic may not have much to do with what we eat. But if it does, there should be some signal in the allergy cohorts.

Lithium Yay

October 2, 2025 · slimemoldtimemold · internet science, lithium, mad science, obesity, science

Scott Alexander recently named five criticisms of A Chemical Hunger, our series on the obesity epidemic, and asked for our responses. These criticisms come by way of a LessWrong commenter named Natália (see her LessWrong posts).

We appreciate Scott taking the time to identify these as his top five points, because this gives us a concrete list to respond to. In short, we think these criticisms are generally confused and misunderstand our arguments.

Here they are:

1. Do you agree with the obesity increase being gradual over the course of the 20th century, rather than “an abrupt shift” as you describe in ACH?

If we’re talking about obesity rates, those increased abruptly around 1970. The increase was about 10 percentage points in the 60 years before the early 1960s and about 30 percentage points in the 60 years after the early 1960s. We’re all literally quoting the same numbers from the same sources (NHANES), so there shouldn’t be any disagreement about whether or not there was an abrupt shift in obesity rates, unless we’re just arguing semantics over what counts as “abrupt”. Notably, Natália agrees on this point. She added a changelog entry to the relevant post where she wrote, “discussion in the comments made me realize that the argument I was trying to make was too semantic in nature and exaggerated the differences in our perspectives.”

Some people think that other measures, like average BMI, might have been increasing more linearly; that the abrupt shift in obesity rates is an artifact of applying a fixed cutoff to a normal distribution whose mean was actually rising gradually; that these other measures are therefore a better indicator; and that this suggests there was no special change in the obesity epidemic around 1970. This would be an interesting wrinkle, but we’ve looked at various models and we don’t think they support this interpretation (see the appendix for details). There’s even some data on average BMI over time, which also seems to show a shift. We still think there’s evidence of a change in the rate of change.

That said, we think this is the wrong question to ask. We highlighted the abrupt shift in obesity rates because we think it’s interesting, and maybe surprising, but it doesn’t do a lot to help us distinguish between different hypotheses, so it’s not very important. Contamination can happen either gradually or abruptly, so unless we’re asking about a specific contaminant that was abruptly introduced in 1970, whether or not the shift was abrupt has little bearing on whether the contamination hypothesis is correct. If anything, a gradual increase starting around 1950 is more compatible with the lithium hypothesis, because there’s some reason to think that lithium exposure increased gradually:

*Graph showing world lithium production from 1900 to 2007, by deposit type and year. The layers of the graph are placed one above the other, forming a cumulative total. Reproduced from USGS.*

2. Do you agree that even medical lithium patients don’t have enough weight gain to cause the obesity epidemic? If so, why do you think that getting a tiny fraction of that much lithium would?

This is a great question. Let’s say that on average, people have gained 12 kilos since 1970, but that patients only gain an average of 6 kilos when they start taking medical lithium. This would be some evidence that lithium exposure isn’t responsible for the entire change in obesity since 1970. But it would be quite consistent with the idea that lithium caused some of the change in obesity since 1970, potentially as much as 50%.

We’re comfortable with the idea that lithium may be responsible for only part of the obesity epidemic. Natália even acknowledges this; she writes, “[SMTM] also think that other contaminants could be responsible, either alone or in combination” in footnote 1 of this post. Even if we assume the weight gained by medical lithium patients is an upper limit on the possible effect, it still seems consistent with lithium exposure being responsible for some reasonable percentage of the overall increase. If lithium caused “only” 50% of the weight gain since 1970, or even just 10%, that would still be a pretty big deal and we would still care about that.

That said, we do think there’s some reason to suspect that lithium might be responsible for more than 50%. If everyone is already exposed to lithium in their diet, then the amount of weight gained by medical lithium patients when they add a higher dose will underestimate the total effect. Extremely long-term trace exposure (and bolus doses, compounds other than lithium carbonate, etc.) might have different pharmacokinetics than medical lithium. And there’s at least one population (the Pima of the Gila River Valley) where long-term exposure to lithium in food and water was associated with striking rates of obesity and diabetes, suggesting that under some conditions, lithium levels found in food and water may be enough to cause serious weight gain.

3. Natalia lists several reasons to expect that trace lithium doses should have only trace effects – Gwern’s reanalysis showing few-to-no psych effects, some studies suggesting low doses have fewer side effects, and lack of any of the non-weight-gain side effects of lithium in trace users. What are your thoughts on this?

We think there are several reasons to expect effects from trace and subclinical doses, especially with extremely long-term exposure.

We’re only aware of one RCT of trace-level doses (Schrauzer & de Vroey, 1994), but this study found that taking 0.4 mg per day of lithium orally led to participants feeling happier, more friendly, more kind, less grouchy, etc., “without exception”, compared to placebo.

When we surveyed redditors who took subclinical doses of lithium as a nootropic (ballpark 1-10 mg/day), people commonly reported some non-weight-gain effects, like increased calm, brain fog, frequent urination, and decreased libido. And they rarely or never reported other effects, like eye pain, fainting, or severe trembling. This suggests that low doses of lithium are enough to cause some common effects of lithium, while not causing others.

Following chronic lifelong exposure to trace doses of lithium in their drinking water, and accumulation in some of their food, the Pima of the Gila River Valley ended up with high rates of obesity and diabetes. The Pima became obese and lethargic, but didn’t (as far as we know) suffer from hand tremors or nausea. Their example also supports the idea that lithium has some effects that kick in at psychiatric dose levels and others at groundwater levels, and that metabolic effects might be among the effects that can be caused by food and groundwater exposure alone.

These examples seem to address the concern of “some studies suggesting low doses have fewer side effects, and lack of any of the non-weight-gain side effects of lithium in trace users”. Lower doses do have fewer effects, and some effects do seem to go away as you lower the dose. But other effects seem to be fairly common, even at low doses, and others may manifest with long-term exposure. This question is especially hard to answer in just a few paragraphs, so take a look at the appendix for much more detail.

4. Do you agree that wild animals are not really becoming obese?

This is a misunderstanding about the use of the word “wild”. Our main source for animals becoming obese was Klimentidis et al. (2010), Canaries in the coal mine: a cross-species analysis of the plurality of obesity epidemics, which uses the terms “wild” and “feral” to refer to a sample of several thousand Norway rats.

Following this source, in Part I of A Chemical Hunger we also use the terms “wild” and “feral” to refer to these rats. We say, “Humans aren’t the only ones who are growing more obese — lab animals and even wild animals are becoming more obese as well. Primates and rodents living in research colonies, feral rodents living in our cities, and domestic pets like dogs and cats are all steadily getting fatter and fatter.” Our use of the term followed our source, and while it’s natural that people misunderstood the term to mean something broader, let’s clarify that we didn’t intend to imply we were making claims about mountain goats, sloths, or white-tailed deer.

But the broader question is definitely interesting, so let’s consider it now: have “truly wild” animals, living totally separately from humans, been getting obese as well? We think this is a point where reasonable people can disagree, because there isn’t much data about the weight of truly wild animals over time. There’s very little to go on. We can point to an example paper, Wolverton, Nagaoka, Densmore, & Fullerton (2008), where we find data that are consistent with the idea that some truly wild animals are getting heavier, so we think it’s possible. But we don’t claim it’s well-supported. The wildest animals we have good data on are probably those feral rats from above.

But we don’t make much of this either way, because it doesn’t seem like a crux. If pets, zoo animals, lab animals, feral animals, and/or truly wild animals are getting obese, that’s some evidence in favor of the contamination hypothesis. But the contamination hypothesis can still be true if some of those populations are not becoming obese.

5. Do you agree that water has higher lithium levels at high altitudes (the opposite of what would be needed for lithium to explain the altitude-obesity correlation)?

No. This claim is based on an analysis that contains several mistakes.

Natália conducted an analysis of this dataset from the USGS and elevation data from Open Elevation API, and found a positive correlation of 0.46 between altitude and log(lithium concentration) in U.S. domestic-supply wells. We replicated this analysis and can confirm that’s the correlation coefficient you get. But this analysis is mistaken, for two main reasons.

First of all, the statistical problem. Correlation tests estimate the population correlation by looking at the correlation in a random sample drawn from that population. But this sample isn’t random, and it’s not representative either. The data mostly come from Nebraska, certain parts of Texas, and the East Coast. Some states are not represented at all. Really, look at the map below; it’s so much Nebraska. Even if there is a correlation within this dataset, there’s no reason to expect it’s a meaningful estimate of the correlation in the U.S. as a whole.
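Here is a toy illustration of why a clustered sample is a problem. When most of the data come from a few regions, a pooled correlation can mostly reflect differences between those regions rather than any relationship within them. The Python sketch below uses made-up numbers, not the USGS data; the two regions, the altitudes, and the log-lithium values are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical wells: altitude (m) and log10 lithium (ug/L).
# Inside each region, lithium falls as altitude rises.
low_alt = [100, 200, 300]
low_li = [1.0, 0.9, 0.8]
high_alt = [1000, 1100, 1200]
high_li = [2.0, 1.9, 1.8]

within_low = pearson(low_alt, low_li)     # negative
within_high = pearson(high_alt, high_li)  # negative
pooled = pearson(low_alt + high_alt, low_li + high_li)  # positive

print(f"within low region:  {within_low:+.2f}")
print(f"within high region: {within_high:+.2f}")
print(f"pooled sample:      {pooled:+.2f}")
```

The pooled coefficient comes out strongly positive even though the relationship inside each region is negative, so which regions happen to be sampled, and how heavily, can set the sign of the answer.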

But even if this were a random sample, this analysis would still be mistaken, because it’s a sample from the wrong population. Natália’s analysis only covers domestic-supply wells. It excludes public-supply wells, and it entirely omits surface water sources.

This is a problem, because many people get their drinking water from public-supply wells, or from surface water. And it’s a problem because if there were a correlation between lithium levels and altitude, we’d expect to see it in surface water, not well water. Water drawn from wells has often been down there for thousands of years, while surface water is directly exposed to runoff, landfills, brine spills, power plants, and factory explosion byproducts. So we’d expect surface water to drive any correlation of obesity with altitude.

This is a pretty strange set of errors for Natália to make, given that we discussed this dataset in A Chemical Hunger and specifically warned about both of these issues.

We also want to call attention to a 6th point that Scott doesn’t mention. If we were to phrase it as one of his questions, it might go something like this:

6. You did a literature review of lithium concentrations in food and found that some foods contain more than 1 mg/kg of lithium, which implies that people might be getting subclinical doses from their daily diet. Natália disputes this and says that the best available data shows less than 0.5 mg/kg lithium in every single food. Do you agree?

The truth is that there’s a split in the literature. The studies Natália cites consistently find low levels of lithium in food and beverages, as do some other papers. But other sources find much higher levels. These sources contradict each other in a way that means they can’t all be right. And there are other major gaps in our knowledge; Natália correctly pointed out that there are few recent measurements of lithium in the American food supply.

We went back and took a closer look at the study methods. What we noticed is that the studies that found < 1 mg/kg lithium tended to use the same technique for chemical analysis — ICP-MS after microwave digestion in nitric acid (HNO3). The studies that found more than 1 mg/kg lithium in food used a variety of other techniques.

This made us suspect that the split in the literature was caused by the method of analysis. It seemed like maybe one technique gave really low estimates of lithium in food, while other techniques gave much higher readings. To test this, we ran a study where we took samples of several American foods and analyzed the same food samples using different methods.

This confirmed our hypothesis. Different analytical methods gave very different results.

When the foods were digested in HNO3, both ICP-MS and ICP-OES analysis mostly reported that concentrations of lithium were below the limit of detection. When foods were dry ashed instead, both ICP-MS and ICP-OES consistently found levels of lithium above the limit of detection, as high as 15.8 mg/kg lithium in eggs (which we replicated in a second study on just eggs).

This neatly explains the discrepancies in the literature. The lower results come from methods that yield very low estimates, often detecting no lithium at all, and the higher results come from other methods that give higher estimates. We think that the higher results are more accurate for several reasons (see our full reasoning in the original post), but the fastest way to make this case is that they show greater discrimination (they are better at distinguishing between samples). Even the lower estimates still support the idea that American foods sometimes contain more than 1 mg/kg, as they detected up to 1.2 mg/kg lithium in goji berries.
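One crude way to put a number on "discrimination" is to count how many readings fall below the limit of detection (LOD) and how many distinct values a method reports, since every below-LOD reading is indistinguishable from every other. This Python sketch is an illustration under that assumption, not necessarily how the comparison was actually scored, and the concentrations in it are placeholders rather than results from the study.

```python
from typing import Dict, List, Optional


def summarize(readings: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one method's readings (mg/kg); None means 'below LOD'."""
    below_lod = sum(r is None for r in readings)
    reported = {r for r in readings if r is not None}
    return {
        "fraction_below_lod": below_lod / len(readings),
        # Below-LOD samples all collapse into one bucket, so only
        # distinct reported values help tell samples apart.
        "distinct_values": len(reported),
    }


# Placeholder readings for the same five samples under two preparations.
hno3_digestion = [None, None, None, 1.0, None]
dry_ashing = [0.5, 2.0, 4.0, 1.0, 8.0]

print(summarize(hno3_digestion))
print(summarize(dry_ashing))
```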

For more detail on all these points, see the Appendix. But first:

Why didn’t we respond earlier?

We love scientific debate. That’s why we respond to questions on twitter and have a long history of responding to questions asked on Reddit, as we did here. Sometimes we debate people over email; sometimes we write long response posts and make them public.

We can’t respond to everything, and we sometimes decline to respond to arguments we don’t understand, or conversations that don’t seem like they will be productive. This is definitely a judgment call, but it’s one we’re comfortable making. As a model, consider also this tweet from Visakan Veerasamy:

Our first experiences with Natália were of her, and her husband Matthew Barnett, being aggressive towards us for no clear reason.

Many of these early exchanges appear to have been deleted, but some of them survive. One early example was when Matthew publicly challenged us to a bet. The bet seemed like it would create a perverse incentive for us, so we declined the challenge and did our best to explain why.

Other people agreed with our interpretation. Dominik Peters said, “They’re planning to do further research about whether the theory is right or wrong, iiuc. Not sure it helps epistemically if they have a $2k incentive to find a ‘yes’ rather than a ‘no’ answer.” We tried to be as clear as possible. But Matthew didn’t seem to understand.

We responded to their comments for a while and continued to find them difficult to deal with, so we decided to stop engaging. Their comments were civil, but they were repeatedly confrontational, and our attempts to continue the conversation or explain our reasoning felt like they went nowhere.

If we couldn’t have a productive disagreement, it seemed like the most polite thing to do would be to not respond. We figured that not responding was a respectful way to decline further discussion. But they kept issuing public challenges, sending us DMs, comments, emails, for weeks. If you’ve ever stopped responding to someone and they continue sending you messages on every possible platform, you know what we mean.

So when Natália published her LessWrong posts, you can imagine why we weren’t interested in responding.

When you do science on the internet, you can see right away there are two kinds of responses. Most people want to help you get to the truth, even if they don’t necessarily agree with you. We’ve corresponded with several people like that: JP Callaghan, ExFatLoss, Jeff Nobbs, etc.

But some people want something else: it’s hard to tell what that thing is, because they seem to respond to what they imagine you said, rather than what’s actually there. It feels like they must have some motive you don’t understand — maybe they want to dunk on you, censor you, or promote you towards whatever strange goal. This isn’t a very charitable read and people who do this almost certainly don’t think of themselves this way, but that’s what it feels like on the receiving end.

And whatever, that’s the price of doing business on the internet. But you start to recognize pretty quickly whether someone is trying to help you or not, and if they’re not trying to help you, there’s really no reason to engage with them.

That’s why there’s no obligation to answer all objections. If you don’t feel like the objection was made by someone trying to get closer to the truth, and/or if you don’t feel like you’re going to get closer to the truth by answering it, why bother?

We feel like this is part of a pattern, because Natália and Matthew have acted the same way towards other researchers. They made a similar collection of arguments against the work of our one-time collaborator, Alexey Guzey. His response was “skimmed the post, tbh it seems weak”.

It’s not really that they are too aggressive. ExFatLoss is really aggressive, and we still talk to him. It’s more that discussions with Natália and Matthew never seem to get anywhere. Here’s a third party describing how Natália repeatedly edits or deletes her comments, which makes it hard to hold a conversation:

Mod note: I count six deleted comments by you on this post. Of these, two had replies (and so were edited to just say “deleted”), one was deleted quickly after posting, and three were deleted after they’d been up for awhile. This is disruptive to the conversation. It’s particularly costly when the subject of the top-level post is about conversation dynamics themselves, which the deleted comments are instances (or counterexamples) of.

You do have the right to remove your post/comments from LessWrong. However, doing so frequently, or in the middle of active conversations, is impolite. If you predict that you’re likely to wind up deleting a comment, it would be better to not post it in the first place. LessWrong has a “retract” button which crosses out text (keeping it technically-readable but making it annoying to read so that people won’t); this is the polite and epistemically-virtuous way to handle comments that you no longer stand by.

We want to be collegial, but Natália hasn’t treated us like a colleague. She often jumps straight to accusations, or just states single facts, or cites single articles as if they are a complete argument. She uses phrases like “extremely cherry-picked evidence” and accuses us of “subtle sleight of hand”. She says that our arguments are “misleading”, suggesting that any points of disagreement are both intentional and intended to mislead, without stopping to consider whether we might have simply made a mistake, or whether she might be misunderstanding our point.

Some people do use cherry-picked evidence, and we respect the desire to call ‘em as one sees ‘em. But slapping a label on something is a missed opportunity to describe the situation and let readers decide for themselves. And the principle of charity is also important — it’s not productive to nitpick; you should consider the best, strongest possible interpretation of an argument. Before you jump directly to accusations of cherry-picking, you should consider whether or not there are alternative explanations. Maybe you misunderstood the original argument, or made some other kind of mistake.

Maybe this is apocryphal, but we’ve heard that in medieval debate, you weren’t allowed to start criticizing your opponent’s argument until you could re-state it to the point where they agreed, “yes, that’s my position.”

This is where Natália’s critiques really fail. We don’t recognize anything of our arguments in what she writes. It’s hard to respond when someone attacks a version of your argument that you didn’t make. We’re not really interested in responding to her in the future, but if she does want to offer a response, we’d like to see her at least start by re-stating what she thinks we believe. That way if she’s mistaken, it might be easier to clarify.

We believe in the principle of “focus your time and energy on what you want to see more of”. We don’t want more pointless internet arguments, more back-and-forths. We felt that our time was better spent elsewhere.

And this kind of disagreement does a disservice to the real issue, which is the science! We just don’t think the norms of who issued what kind of corrections when is all that interesting. We don’t want to spend our time fighting over procedure. We’d rather keep our eye on the ball, do more analysis, collect more data, and try to figure out the causes of obesity. That’s a conversation worth having.

Why Respond Now?

We didn’t respond to these arguments before, so why would we respond to them now? There are two main reasons.

First, Scott identified five points that he found interesting. When there were 101 points with no particular structure, it was hard to feel like it was possible to write a worthwhile response. No one wants to read a 101-item laundry list, and we sure as hell don’t want to write it.

But once Scott was kind enough to name his five points, we could focus on a small list of questions that a person of good judgment found concerning. That’s a discussion worth having, and tractable too.

Second, we have new data that can help resolve these disagreements. When you have the means to empirically test your disagreements, arguing is borderline unscientific. Debate is a waste of time, you should be running a study.

Instead of responding to criticisms with verbal arguments, we wanted to respond to them with data. We think this is good practice and we want to model it — we think everyone can agree that scientific debates on the internet would benefit if more people did empirical tests of their disagreements rather than forever dishing out verbal arguments and going in circles.

Now we have empirical results, so we can respond with the data. And we think it makes for a much more substantive response. Thank you for your patience. 🙂

Appendix

#1 Abrupt Shift

Do you agree with the obesity increase being gradual over the course of the 20th century, rather than “an abrupt shift” as you describe in ACH?

Much of this discussion is weird to us because, as far as we can tell, everyone is looking at the same data.

Natália wrote:

In the United States, the obesity rate among adults 20-74 years old was already 13.4% in 1960-1962 (a), 18-20 years before 1980. We don’t have nationally representative data for the obesity rate in the early 20th or late 19th centuries, but it might have been as low as ~1.5% or as high as 3%, indicating that the obesity rate in the US increased by a factor of >4x from ~1900 to ~1960.

We agree. Those numbers come from the same sources we used, like the NHANES and Helmchen & Henderson (2004). Natália quotes our sources back to us as if they contradicted what we said, which they don’t. It’s hard to know what to make of this kind of response.

Natália quotes us saying, “Between 1890 and 1976 … rates of obesity [went] from about 3% to about 10%.” She says, “the obesity rate in the early 20th or late 19th centuries …might have been as low as ~1.5% or as high as 3%”, and “the obesity rate among adults 20-74 years old was already 13.4% in 1960-1962.” Her numbers are also from about 3% to about 10%.

It’s hard to see how what we wrote “understates the meaningfulness and extent of the changes in average BMI and obesity rates that occurred before 1980.” Especially when Natália uses the same sources we used, and quotes the same numbers.

The important thing is that the obesity rate increased even more after 1960. See for example this graph we included in the original post:

Obesity rates went from something like 1.5%-3% around 1900 to something like 13.4% in the early 1960s. This is an increase of 10.4-11.9 percentage points over about 60 years. Then the obesity rate went from something like 13.4% in the early 1960s to something like 42.8% in 2017–2018. This is an increase of 29.4 percentage points over about 60 years. Based on these numbers, the obesity rate increased about two and a half to three times as much during 1960-2018 as it did from 1900-1960.

To us, this change looks both serious and abrupt. Per the CDC data, obesity rates for adults 20-74 years old went from 13.4% in 1960-1962 to 14.5% in 1971-1974, then to 15.0% in 1976-1980… then to 23.2% in 1988-1994, and then it keeps growing. That’s a change of 1.6 pp from 1960-1962 to 1976-1980, a span of 20 years, followed by a change of 8.2 pp from 1976-1980 to 1988-1994, a span of just 14 years. You can see the change in slope of both obesity and extreme obesity quite plainly on the figure. That seems like a serious change in the rate of change.
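The change in slope is easy to check from the quoted survey numbers. Here is a small Python sketch that uses the midpoint of each survey period as its date. That choice is an assumption; it gives spans of 17 and 13 years rather than the 20 and 14 above, but the contrast comes out about the same.

```python
# Obesity rates (%) for US adults aged 20-74, as quoted in the text.
WAVES = [
    ((1960, 1962), 13.4),
    ((1971, 1974), 14.5),
    ((1976, 1980), 15.0),
    ((1988, 1994), 23.2),
]


def midpoint(span):
    start, end = span
    return (start + end) / 2


def slopes(waves):
    """Percentage points per year between consecutive survey waves."""
    result = []
    for (span0, rate0), (span1, rate1) in zip(waves, waves[1:]):
        years = midpoint(span1) - midpoint(span0)
        result.append((span0, span1, (rate1 - rate0) / years))
    return result


for span0, span1, pp_per_year in slopes(WAVES):
    print(f"{span0} -> {span1}: {pp_per_year:.2f} pp/year")
```

The first two intervals come out at roughly 0.1 pp per year and the last at roughly 0.6 pp per year.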

Is percentage points the wrong way of thinking about it? Natália says that “the obesity rate in the US increased by a factor of >4x from ~1900 to ~1960” when describing that change from 1.5%-3% around 1900 to 13.4% in the early 1960s. In comparison, the change from 13.4% in the early 1960s to 42.8% in 2017–2018 would be about 3.2x. But intuitively, we think that a change from “for every 100 Americans you meet, about 3 are obese” to “for every 100 Americans you meet, about 10 are obese” is not as concerning as a change from “for every 100 Americans you meet, about 10 are obese” to “for every 100 Americans you meet, about 40 are obese”.
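The two framings can disagree about the same numbers. Here is a minimal Python sketch comparing the absolute (percentage-point) and relative (ratio) change for both 60-year periods, using the rates quoted above.

```python
def change(before: float, after: float):
    """Return (absolute change in percentage points, ratio after/before)."""
    return after - before, after / before


periods = {
    "1900 -> early 1960s (from 3%)": change(3.0, 13.4),
    "1900 -> early 1960s (from 1.5%)": change(1.5, 13.4),
    "early 1960s -> 2017-2018": change(13.4, 42.8),
}

for label, (pp, ratio) in periods.items():
    print(f"{label:34s} {pp:+5.1f} pp  x{ratio:.1f}")
```

In percentage points the later period is far larger (29.4 versus 10.4 to 11.9); as a ratio the earlier period is larger (about 4.5x to 8.9x versus about 3.2x).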

To our mind, the strongest version of this critique grants that the rate of change in obesity rates increased, but argues that it increased for a different reason than the one we propose. You could say, it’s true that the rate of change in obesity rates accelerated, but that might be an artifact of the distribution, while the rate of change in mean BMI was constant. And then you could make some argument about why rate of change in mean BMI is a better measure of the obesity epidemic than rate of change in obesity rates.

Having done some digging, we think this might be the argument Natália was trying to make in her original post. See in this comment thread, where Matthew Barnett, Natália’s husband, frames a version of this argument:

I think the relevant fact is that, based on the available data, it appears that average BMI increased relatively linearly and smoothly throughout the 20th century. Since BMI is approximately normally distributed (though skewed right), the seemingly sudden increase in the proportion of people obese is not surprising: it’s a simple consequence of the mathematics of normal distributions.

In other words, the smooth increase in mean BMI coupled with a normal distribution over BMI in the population at any particular point in time explains away the observation that there was an abrupt change centered around roughly 1980. It is not necessary to posit a separate, further variable that increased rapidly after 1980. The existing data most plausibly supports the simple interpretation that the environmental factors that underlie the obesity epidemic have changed relatively gradually over time, with no large breaks.
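The mechanism in this argument is easy to simulate. Below is a minimal Python sketch: BMI is assumed to be normally distributed with a fixed standard deviation, and the mean is assumed to rise at a perfectly steady rate. The starting mean, slope, and SD are made-up values chosen only to show the shape, not fitted to any survey.

```python
from statistics import NormalDist

OBESITY_CUTOFF = 30.0  # BMI threshold for obesity

# Hypothetical, illustrative parameters (not fitted to data).
START_YEAR = 1900
START_MEAN = 22.0
MEAN_SLOPE = 0.06  # BMI points per year, constant by assumption
BMI_SD = 4.0


def obesity_share(year: int) -> float:
    """Share of the population above the cutoff, given a linearly rising mean."""
    mean_bmi = START_MEAN + MEAN_SLOPE * (year - START_YEAR)
    return 1 - NormalDist(mean_bmi, BMI_SD).cdf(OBESITY_CUTOFF)


years = list(range(1900, 2021, 20))
shares = [obesity_share(y) for y in years]
gains = [b - a for a, b in zip(shares, shares[1:])]

for y, s in zip(years, shares):
    print(y, f"{s:.1%}")
```

Each 20-year gain in the obese share is larger than the last, even though the mean never speeds up; this holds as long as the mean stays below the cutoff. That is the pattern the argument relies on; whether it fully accounts for the observed numbers is the disputed part.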

We’ve been discussing this for a long time now. It’s one of the questions we fielded in the A Chemical Hunger Discussion Thread posted on r/slatestarcodex in 2021.

The OP of the Reddit thread, u/HoldMyGin/, said: “My biggest criticism is the assertion that obesity rates started spiking around 1980 … isn’t that what one would expect to see if you’re measuring the percent of a normal distribution above a certain threshold, and the mean of that distribution is slowly but consistently inching upward?” We responded with a series of simulations that showed that the rate of increase in obesity rates is faster than what we would expect if the mean of the distribution were slowly increasing. For more discussion of these models, definitely check out this great comment thread involving DirectedEvolution.
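The simulations from that thread aren't reproduced here, but a related check fits in a few lines. Under the same simplifying assumptions (normal BMI, constant SD), each observed obesity rate implies a particular mean BMI, so the rates quoted in this appendix can be turned into implied means and their slopes compared. Here is a Python sketch; the SD of 4.5 is a hypothetical value, and real BMI is right-skewed with a spread that need not stay constant, so treat this as a rough check rather than a model.

```python
from statistics import NormalDist

OBESITY_CUTOFF = 30.0
BMI_SD = 4.5  # hypothetical constant standard deviation


def implied_mean_bmi(obesity_rate: float, sd: float = BMI_SD) -> float:
    """Mean of a normal BMI distribution with `obesity_rate` of it above the cutoff."""
    z = NormalDist().inv_cdf(1 - obesity_rate)  # standard-normal quantile
    return OBESITY_CUTOFF - sd * z


# Survey midpoints and adult obesity rates quoted in this appendix.
RATES = {1961: 0.134, 1978: 0.150, 1991: 0.232}
means = {year: implied_mean_bmi(rate) for year, rate in RATES.items()}

slope_before = (means[1978] - means[1961]) / (1978 - 1961)
slope_after = (means[1991] - means[1978]) / (1991 - 1978)
print(f"implied mean BMI slope, 1961-1978: {slope_before:.3f} per year")
print(f"implied mean BMI slope, 1978-1991: {slope_after:.3f} per year")
```

If a steady rise in the mean were the whole story, the two implied slopes would be about equal. With these inputs the later slope comes out several times steeper. Because the SD only rescales both slopes, that ordering does not depend on the SD chosen, though it does depend on the normal, fixed-spread assumption.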

But all that said, we have some data about BMI, so why rely purely on models? Assuming that the data in this figure we adapted from Helmchen & Henderson (2003) are roughly correct, then mean BMI increased by about 0.04 points per year from 1890-1894 to 1976-1980 and by about 0.11 points per year afterwards.

See also u/KnotGodel’s analysis from the reddit comments, which finds:

“You can see from the chart that (in this model) mean BMI didn’t really change until 1978. After this point it increased by ~4 points.”

And even if it’s true that the acceleration in obesity rates is an artifact of the smooth increase in mean BMI over time, this wouldn’t change the fact that there was a relatively abrupt change in the rate of change of obesity rates around the 1970s. People might still be surprised that the rate of change increased so much: obesity rates went from 13.4% in 1960-1962 to 14.5% in 1971-1974, but then from 15.0% in 1976-1980 to 23.2% in 1988-1994. We know that we were.

Natália brings in another source we want to talk about, from John Komlos and Marek Brabec. This does contest the pattern, saying:

The common wisdom, based on period effects, is that obesity as a public health problem emerged suddenly in the 1980s. However, the disadvantage of cross-sectional surveys, upon which all analysis has been based, is that the subject’s current weight does not reveal when that weight was actually reached. That weight could have been reached at any time before measurement and maintained thereafter.

Essentially, if we look at someone in 1990 and he’s obese, we don’t know if he just became obese, or if he actually was obese in 1970.

We’re not sure this logic makes sense. Let’s imagine a population of 100 people. We’re looking at them in 1990 and we see that 23 of them are obese. Komlos and Brabec say, “these guys are obese now, but that weight could have been reached at any time before measurement and maintained thereafter. Therefore we can’t use this to estimate the trend.”

But we can look at the data from 1970 and see that only 15 people were obese. We can say that there were more obese people in the later snapshot than in the earlier one. Even if we can’t say whether obese individual #12 from 1990 was already obese in 1970, we don’t need to. The estimate of obesity rates at two points is independent of whether or not we can track any individual across the two points.
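A minimal sketch of this counting point, using the illustrative numbers above. Each snapshot is just a list of obese/not-obese flags, shuffled independently so that no individual can be followed from 1970 to 1990, and the change in prevalence comes out the same anyway. It only illustrates the counting argument; it does not model anything Komlos and Brabec actually did.

```python
import random

POPULATION = 100


def prevalence(snapshot: list[bool]) -> float:
    """Share of people in one cross-sectional snapshot who are obese."""
    return sum(snapshot) / len(snapshot)


# Illustrative counts from the text: 15 of 100 obese in 1970, 23 of 100 in 1990.
snapshot_1970 = [True] * 15 + [False] * (POPULATION - 15)
snapshot_1990 = [True] * 23 + [False] * (POPULATION - 23)

# Shuffle each snapshot independently, so position i in 1970 and position i
# in 1990 are no longer the same person: nothing links individuals across years.
rng = random.Random(0)
rng.shuffle(snapshot_1970)
rng.shuffle(snapshot_1990)

change = prevalence(snapshot_1990) - prevalence(snapshot_1970)
print(f"1970: {prevalence(snapshot_1970):.0%}, 1990: {prevalence(snapshot_1990):.0%}, "
      f"change: {change * 100:+.0f} points")
```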

We’re skeptical of this analysis for a few other reasons. Collecting data is already hard enough; adding in a fancy statistical model gives you more places where something can go wrong. And there’s a lot of interpolation. We don’t have BMI data from before 1959, so many parts of the model are estimates, not real data. In general we think it’s better to trust measurements over models, unless it’s very clear why the model is better.

In this case, the justification for the model doesn’t make any sense to us, so we don’t see why you would prefer it. Per the CDC, a higher percentage of people were obese in the late 80s/early 90s than in the 60s and 70s, and the increase went from 1.6 percentage points (pp) between the early 60s and the late 70s to 8.2 pp between the late 70s and the late 80s/early 90s.
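Those percentage-point changes cover intervals of different lengths, so it helps to convert them into per-year rates. A minimal sketch, assuming each survey period can be represented by its midpoint year (1961, 1978, and 1991), which is a simplification:

```python
# Adult obesity prevalence (%) from the CDC figures quoted above, keyed by
# survey period. Each period is represented by its midpoint year, an
# assumption made only for this back-of-the-envelope comparison.
surveys = {
    "1960-1962": (1961.0, 13.4),
    "1976-1980": (1978.0, 15.0),
    "1988-1994": (1991.0, 23.2),
}


def pp_per_year(earlier: tuple[float, float], later: tuple[float, float]) -> float:
    """Average change in prevalence, in percentage points per year."""
    (year0, pct0), (year1, pct1) = earlier, later
    return (pct1 - pct0) / (year1 - year0)


early = pp_per_year(surveys["1960-1962"], surveys["1976-1980"])
late = pp_per_year(surveys["1976-1980"], surveys["1988-1994"])
print(f"before: {early:.2f} pp/yr, after: {late:.2f} pp/yr ({late / early:.1f}x faster)")
```

With these assumptions the rate goes from roughly 0.09 pp per year to roughly 0.63 pp per year, about 6.7 times faster.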

But even if we accept these models, it doesn’t look like a contradiction. When you look at the figures (though remember these lines are model estimates, not data), we see:

That looks like a change in the rate of change to us. And the biggest change in rate of change seems to be for the cohort born around 1960, i.e. people turning 20 around 1980. There are some interesting implications here — that growth in obesity rates is mostly driven by the top few deciles, that the bottom decile hasn’t seen any change since cohort 1935, etc. — but it doesn’t contradict the idea of a change in the rate of change.

Natália agrees, saying, “it does look like there has been an acceleration at the later birth cohorts for the few highest BMI percentiles, but a minor acceleration is arguably not the same thing as ‘an abrupt shift.’”

It’s hard to tell what the argument is here. Are we disagreeing about what counts as a “minor acceleration” and what counts as an “abrupt shift”? Is this just semantics? There might be an argument about what is abrupt enough to be abrupt, and it’s fine if someone disagrees, but the numbers seem pretty distinct.

The good news is that Natália agrees again. She added a changelog to the relevant post, where she wrote:

The first version of this blog post argued that, contra the SMTM authors, there wasn’t an abrupt shift in obesity rates in the late 20th century. Further discussion in the comments made me realize that the argument I was trying to make was too semantic in nature and exaggerated the differences in our perspectives. I changed this about 8 hours after the post was published.

More importantly, we think this shows a misunderstanding of the role this observation plays in our work.

In Part I of the series, we introduced the idea of an abrupt shift as Mystery #2, to help drive the intuition that the obesity epidemic is more surprising than people expect, that there’s a mystery here to be solved.

We still think the change in the rate of change is surprising. If you came to our work with the expectation that obesity has been increasing at a constant rate since the invention of the croissant, you would be pretty far off the mark.

This particular mystery is interesting, but it’s orthogonal to the contamination hypothesis. Contamination can happen either gradually or abruptly, so whether or not the shift was abrupt has little bearing on whether the contamination hypothesis is plausible or correct.

There are some contaminants that are much more plausible candidates if there was an abrupt shift around 1970. If we were considering two possible causes for the obesity epidemic, one potential cause that appeared abruptly around the 1970s and another potential cause that appeared on the scene more gradually, the abruptness of the shift could help us distinguish between them.

But a slow and gradual shift is compatible with many possible contaminants, including lithium. If anything, a gradual increase starting around 1950 is more compatible with the lithium hypothesis, because there’s some reason to think that lithium exposure increased gradually:

#2 Medical Lithium Patients

Do you agree that even medical lithium patients don’t have enough weight gain to cause the obesity epidemic? If so, why do you think that getting a tiny fraction of that much lithium would?

As we understand it, the question here is this: The average American adult has gained something like 10-15 kg since the early 70s. But studies usually find that people on medical doses of lithium don’t become hyper-obese; they gain only a few kilos on average. How can chronic, subclinical doses of lithium account for a gain of 10+ kg if acute, clinical doses don’t seem to cause more than 6 kg of gain?

First point here: We’re comfortable with the idea that lithium might not be the only factor causing the obesity epidemic. Natália knows this; she says, “[SMTM] also think that other contaminants could be responsible, either alone or in combination” in footnote 1 of this post.

Natália’s conclusion is, “lithium seems to cause an average of zero to 6 kg of weight gain in the long term. And strikingly, the upper end of that range, although large, is only half the amount of weight the average American adult has gained since the early 70s.”

To us, this doesn’t do anything to diminish the importance of this hypothesis. If lithium caused “only” 50% of the weight gain since 1970, or even just 10%, that would still be a pretty big deal. We should try to reverse it, so that everyone can be 6 kg lighter.

That said, let’s make the case that lithium might be responsible for more than 50%.

Modern people do tend to gain less than 15 kilos on clinical doses of lithium. But if we are already exposed to lithium in our food and water, we would expect that additional lithium would only top up the existing effect. If everyone’s on lithium already, then adding a bit more wouldn’t have the same impact as starting from zero, so measuring only the added effect will underestimate the total effect.

Think about the dose-response curve. For the sake of illustration, let’s imagine it’s like this, where the x-axis is dose of lithium per day, and the y-axis is extra weight gained from lithium exposure:

In the ancestral environment, everyone got less than 0.1 mg of lithium per day, and they had no extra weight from lithium. If you suddenly put one of these people on a clinical dose of 100 mg/day, they would gain 40 lbs.

Now let’s imagine that in the modern environment, everyone is getting 10 mg/day from their food and water. This would mean that everyone has already gained 20 lbs from chronic exposure. If we then put everyone on a clinical dose of 100 mg/day, they would gain only 20 lbs.

A person in this world might look at this and conclude that lithium doesn’t cause enough weight gain to cause the obesity epidemic. After all, adding a huge medical dose only makes you gain half of the observed effect. But in fact, lithium is causing the entire 40 lbs. It’s just that the background dose of 10 mg/day caused the first 20 lbs, and the 100 mg/day clinical dose is only topping up the remainder of the dose-response curve.
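The thought experiment can be written down directly. This is a toy sketch: the three points on the curve are the made-up illustration values from above, and the log-linear interpolation between them is our assumption, not a claim about the real dose-response curve.

```python
import math

# Illustrative dose-response points from the thought experiment:
# (dose in mg/day, extra weight in lbs). Made-up numbers, not data.
CURVE = [(0.1, 0.0), (10.0, 20.0), (100.0, 40.0)]


def extra_weight(dose_mg: float) -> float:
    """Extra weight at a given daily dose.

    Interpolates linearly in log10(dose) between the points in CURVE and is
    flat outside them. The interpolation rule is an assumption for illustration.
    """
    if dose_mg <= CURVE[0][0]:
        return CURVE[0][1]
    if dose_mg >= CURVE[-1][0]:
        return CURVE[-1][1]
    for (d0, w0), (d1, w1) in zip(CURVE, CURVE[1:]):
        if d0 <= dose_mg <= d1:
            frac = (math.log10(dose_mg) - math.log10(d0)) / (math.log10(d1) - math.log10(d0))
            return w0 + frac * (w1 - w0)
    raise AssertionError("dose not bracketed by CURVE")


CLINICAL = 100.0
# Same curve, different baselines: what a clinical dose appears to add.
gain_from_ancestral = extra_weight(CLINICAL) - extra_weight(0.1)
gain_from_modern = extra_weight(CLINICAL) - extra_weight(10.0)
print(f"ancestral baseline: +{gain_from_ancestral:.0f} lbs, modern baseline: +{gain_from_modern:.0f} lbs")
```

Measured from the ancestral baseline, the clinical dose adds 40 lbs; measured from the modern baseline, it adds only 20, even though the curve itself is unchanged.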

In fact, it’s kind of impressive that a clinical dose of lithium can cause as much as 6 kg of additional weight gain in an already obese population. If you gave the same dose to a hunter-gatherer from 50,000 BC, he’d probably gain more.

In reality, everyone’s curve will be slightly different, the maximum effect will be slightly different, and so on. We discuss this at length in the introduction to our study, Subclinical Doses of Lithium Have Plenty of Effects. But the general logic still holds. If subclinical amounts of lithium are already causing weight gain, then adding more lithium on top will underestimate the total effect.

Scott also asks, “why do you think that getting a tiny fraction of that much lithium would [lead to weight gain?]”

One strong reason to suspect that trace or subclinical doses might lead to weight gain is the example of the Pima of the Gila River Valley in Arizona, who we’ve written about here and here.

The Pima were exposed to unusually high levels of lithium as the result of improperly sealed petroleum exploration boreholes that discharged salt brines to the surface. According to Sievers & Cannon (1973), the lithium level in the Pima’s drinking water was 100 ng/mL, back when the average lithium concentration in American municipal water was about 2 ng/mL. Note that 100 ng/mL is a trace dose, but it’s 50x the level most Americans were getting in their water at the time, and it’s still a relatively high level for drinking water today.

Sievers & Cannon also found that lithium concentrated in some of the Pima’s crops. In particular, wolfberries were found to contain an “extraordinary” concentration of 1,120 ppm lithium by dry weight. We did some back-of-the-envelope math and estimated that the Pima might have been getting around 15 mg of lithium per day from wolfberry jelly. This is also a subclinical dose, but it’s still in the milligram range, even if our estimate is off by an order of magnitude.
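The inputs to that back-of-the-envelope estimate aren't spelled out here, so rather than guess at them, here is the calculation in reverse: how many grams of dried wolfberry would carry a given amount of lithium at the reported concentration. A minimal sketch that ignores the water content of fresh berries and anything that happens during jelly-making:

```python
# 1,120 ppm by dry weight = 1,120 mg of lithium per kg of dried berries.
LITHIUM_PPM_DRY = 1_120
MG_LI_PER_G_DRY = LITHIUM_PPM_DRY / 1_000  # 1.12 mg of lithium per gram


def dry_grams_needed(daily_li_mg: float) -> float:
    """Grams of dried wolfberries that would contain `daily_li_mg` of lithium."""
    return daily_li_mg / MG_LI_PER_G_DRY


for estimate in (15.0, 1.5):  # the estimate, and the estimate divided by ten
    print(f"{estimate:g} mg/day of lithium ~ {dry_grams_needed(estimate):.1f} g of dried berries per day")
```

At 1.12 mg per gram, 15 mg of lithium corresponds to roughly 13 g of dried berries per day.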

The other notable thing about the Pima is that they were unusually obese, and had “the highest prevalence of diabetes ever recorded”, back before the general obesity rate had even broken 10%. We haven’t been able to find exact measurements of body weight, BMI, or obesity rate for the Pima in the 1970s, but all sources agree that they were unusually obese.

So, the Pima were exposed to chronic trace doses of lithium in their water and chronic subclinical doses in at least one of their common foods. The Pima were also unusually obese and had exceptionally high rates of diabetes. This doesn’t prove that the lithium exposure caused the obesity and diabetes, but it’s certainly consistent with that hypothesis, and it’s one reason to think that getting a tiny fraction of a clinical dose of lithium would lead to weight gain, especially with chronic exposure through food and water.

If lithium exposure was the cause, then that’s evidence that even trace amounts, when chronic, can cause more than 6 kg of weight gain, which supports the idea that lithium alone could explain more than 50% of the obesity epidemic.

You may suspect that this is us giving unfair weight to a piece of evidence that happens to closely fit a preferred hypothesis. Two reasons why you shouldn’t think that’s the case:

First of all, the Pima were brought to our attention as a counter-example, meant to challenge the lithium hypothesis. We were totally unaware of the Pima when we developed the lithium hypothesis, but during a discussion of these theories on the SSC subreddit, u/evocomp wrote,

The famous Pima Indians of Arizona had a tenfold increase in diabetes from 1937 to the 1950s, and then became the most obese population of the world at that time, long before 1980s. … What’s the chance that all these populations who lived under calorically-insecure evolutionary pressures are all independently highly sensitive and equally exposed to Lithium, PFAS, or whatever contaminants are in SPAM or white bread?

So the example was chosen to be adversarial, and u/evocomp was right to challenge us in this way. But when we looked into it, we not only found that the Pima were equally exposed to lithium, but that they were enormously exposed to lithium.

The rationalist citations here are Making Beliefs Pay Rent (in Anticipated Experiences) and Fake Causality. The core idea is that a good test of a theory is whether it makes accurate predictions about new, not-yet-seen data, not whether it can be made to fit old data retroactively. You develop a theory by fitting it to past data, which constrains the possibilities, but you can’t test it that way. You evaluate a theory by how accurately it predicts new, unseen evidence. This was an adversarial test with unseen evidence, and the lithium hypothesis scored almost perfectly on prediction. It’s a major reason we started preferring the lithium hypothesis over other contaminants!

Here’s a project we would love to see from a third party (Scott qualifies): Try to find other populations that were notably obese before the 1970s. We predict that if any such populations can be found, many of them will be found to have been exposed to high levels of lithium, or to factors associated with high levels of lithium, like drawing drinking water from deep wells, early fossil fuel prospecting, other mining, seismic or volcanic activity, other water quality issues, etc. We say “many” rather than “all” because we don’t think that lithium is the only thing that can cause obesity. It would still be consistent with the lithium hypothesis if there were some early populations that were made obese by something else.

Second, back in the 1970s, Sievers & Cannon wrote:

It is tempting to postulate that the lithium intake of Pimas may relate 1) to apparent tranquility and rarity of duodenal ulcer and 2) to relative physical inactivity and high rates of obesity and diabetes mellitus.

Sievers & Cannon also suspected that lithium exposure might be responsible for the high rates of obesity and diabetes in the Pima. This couldn’t possibly have been written with the goal of explaining the obesity epidemic, because the obesity epidemic didn’t exist in the early 1970s when the quote was written. Sievers & Cannon had no idea it was coming.

Whatever factors you think might have misled us into thinking that lithium causes high rates of obesity and diabetes, they couldn’t have misled Sievers & Cannon. They came to the same conclusion independently, about fifty years before we did.

Finally, we think chronic exposure to low doses of lithium may build up over time, to the point where chronic trace exposure can eventually lead to clinical levels in your brain. It might take 10 or 20 years for trace levels in your water to lead to clinical levels in your brain, but we all spend 10 or 20 years consuming trace amounts in our water, so that’s no problem.

In our discussion with JP Callaghan, at the time an MD/PhD student with expertise in protein statistical mechanics and kinetic modeling, he put together a three-compartment model (gut -> serum <-> tissue) and found that, for plausible values of the parameters, “lognormally distributed doses of lithium with sufficient variability should create transient excursions of serum lithium into the therapeutic range” and “in that third compartment [brain], you get nearly therapeutic levels of lithium in the third compartment for whole weeks (days ~35-40) after these spikes, especially if you get two spikes back to back.”
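For readers who want to play with the idea, here is a toy three-compartment model (gut -> serum <-> tissue) with lognormally distributed daily doses. It is not JP Callaghan's model: his equations and parameter values aren't reproduced here, so every rate constant below is a made-up placeholder, amounts are tracked in milligrams rather than concentrations, and the integration is a simple forward-Euler scheme.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical first-order rate constants, all per day (placeholders)."""
    ka: float = 10.0    # gut -> serum absorption
    k_st: float = 0.5   # serum -> tissue
    k_ts: float = 0.05  # tissue -> serum (slow, so tissue acts as a reservoir)
    ke: float = 0.7     # elimination from serum


def simulate(days: int, rates: Rates = Rates(), median_dose_mg: float = 1.0,
             sigma: float = 1.5, dt: float = 0.01, seed: int = 0):
    """Forward-Euler integration with one lognormal bolus into the gut each day.

    Returns (history, total_dosed, total_eliminated), where history holds
    (day, dose, gut, serum, tissue) amounts in mg at the end of each day.
    """
    rng = random.Random(seed)
    gut = serum = tissue = 0.0
    dosed = eliminated = 0.0
    steps_per_day = round(1 / dt)
    history = []
    for day in range(days):
        # lognormvariate(0, sigma) has median 1, so this sets the median dose.
        dose = median_dose_mg * rng.lognormvariate(0.0, sigma)
        gut += dose
        dosed += dose
        for _ in range(steps_per_day):
            absorbed = rates.ka * gut * dt
            to_tissue = rates.k_st * serum * dt
            to_serum = rates.k_ts * tissue * dt
            cleared = rates.ke * serum * dt
            gut -= absorbed
            serum += absorbed - to_tissue + to_serum - cleared
            tissue += to_tissue - to_serum
            eliminated += cleared
        history.append((day, dose, gut, serum, tissue))
    return history, dosed, eliminated


history, dosed, eliminated = simulate(days=120)
peak_day, *_, peak_tissue = max(history, key=lambda row: row[4])
print(f"dosed {dosed:.1f} mg; tissue peaked at {peak_tissue:.2f} mg on day {peak_day}")
```

Because the tissue compartment drains slowly (small `k_ts`), occasional large doses from the lognormal tail keep tissue lithium elevated long after the serum has cleared. Turning these amounts into concentrations you could compare with therapeutic ranges would require compartment volumes, which this sketch leaves out.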

There are limitations here, but they cut both ways. On the one hand, the parameters of both the system and the lognormal doses are plausible, but made up. On the other hand, it’s not clear if therapeutic ranges in the brain are needed to cause weight gain. Weight gain could start at brain levels well below the therapeutic.

The model is more of a sanity check, and it does support the idea that chronic exposure to trace or subclinical levels of lithium over a long enough time could lead to relatively high concentrations in the brain, thyroid, and/or bone. In addition, chronic effects may be different from acute effects. Take a look at our discussion with JP Callaghan to learn more.

#3 Trace Lithium and Trace Effects

Natália lists several reasons to expect that trace lithium doses should have only trace effects – Gwern’s reanalysis showing few-to-no psych effects, some studies suggesting low doses have fewer side effects, and lack of any of the non-weight-gain side effects of lithium in trace users. What are your thoughts on this?

Let’s start at the top. Natália writes, “Gwern has looked into this (a) and concluded that the evidence that such low doses of lithium cause psychiatric effects is actually fairly weak.”

This is a pretty rough gloss of what Gwern actually said. Gwern does say that the evidence is weak, but he doesn’t claim it’s nonexistent. Overall he takes the hypothesis seriously. His topline summary says:

Epidemiological research has correlated chronic lithium consumption through drinking water with a number of population-level variables … However, the evidence is weak.

But in the body of his article, he writes, “The criticisms of the trace lithium correlation seem weak to me”. So Gwern’s position is mixed: he thinks the evidence and the criticisms are both weak. He thinks we need to run more experiments, and we agree.

There is at least one existing RCT of trace-level effects: Schrauzer & de Vroey (1994). In this study, the researchers gave a group of former drug users (heroin, crystal meth, PCP, and cocaine) either 0.4 mg of lithium per day orally (a tiny trace dose) or a placebo. Even on such a tiny dose, everyone in the lithium group reported feeling happier, more friendly, more kind, less grouchy, etc., “without exception”.

Gwern doesn’t mention this paper in his review (though he does cite other Schrauzer papers), so we assume he hasn’t encountered it. It’s a small study, just 24 subjects, but it’s a start in the direction he recommends, and it provides a little experimental support for the correlational findings.

Gwern’s overall position seems to be one of cautious skepticism. On the one hand, there are lots of suggestive correlations. On the other, psychiatric doses are much higher than groundwater doses. He says, “one of the main problems with inferring that lithium causes these reductions [in various symptoms] is that it seems difficult to reconcile with how large the doses must be to treat mental illness”.

Gwern considers some ways to resolve this dilemma, and we want to focus on a few of them in particular. One option he considers is that:

…groundwater doses [may be] more effective than one would expect comparing to psychiatric doses of lithium carbonate (perhaps due to chronic lifelong exposure…)

This is one of the options we discussed with JP Callaghan. It seems plausible that with chronic lifelong exposure, lithium accumulates in the brain or thyroid, or possibly in the bones. If it does, that could lead to a reservoir. Gwern makes a similar point in the next paragraph, saying:

Ken Gillman … criticizes the correlations as generally invalid due to the smallness of the drinking water dose compared to the dietary doses of lithium; I disagree inasmuch as lithium doses are cumulative, Schrauzer 2002 reports an FDA estimate of daily American lithium consumption 1mg, points out that natural levels can reach as high as 0.34mg via drinking water

Gwern also considers this response:

…lithium may have multiple mechanisms one of which kicks in at psychiatric dose levels and the other at groundwater levels (somewhat supported by some psychiatric observations that depressives seem to benefit from lower doses but in different ways; negate #1 in a different way)

We agree this is plausible, and we found evidence for this argument in our study, Subclinical Doses of Lithium Have Plenty of Effects. We polled people on Reddit who took lithium as a nootropic, and asked them to tell us what lithium compound they took, how much they took per day, approximately how many days they tried the dose for, and what effects they experienced on each dose.

People reported many different effects of lithium at subclinical doses (ballpark 1-10 mg/day). Even in our limited dataset, our collaborator Troof found evidence for different effects kicking in at different doses, and sent us this figure:

Both of Gwern’s interpretations are supported by the example of the Pima.

Following chronic lifelong exposure to relatively high but still trace groundwater doses, the Pima ended up with very high rates of obesity and diabetes, despite getting what were small daily amounts compared to psychiatric doses of lithium carbonate.

Their example also supports the idea that lithium has some effects that kick in at psychiatric dose levels and others at groundwater levels. The Pima became obese and lethargic, but didn’t (as far as we know) suffer from hand tremors or nausea. We shouldn’t be at all surprised if a drug has some effects that kick in at one dose and other effects that kick in at other doses. See our arguments here for more detail.

Does that prove that the lithium in their food and water caused the high rates of obesity and diabetes? No, but it’s consistent with the hypothesis, and evidence in favor.

These examples also seem to address the concern of “some studies suggesting low doses have fewer side effects, and lack of any of the non-weight-gain side effects of lithium in trace users”.

The Pima were exposed to chronic trace amounts of lithium. They did have high rates of obesity and a few other possible symptoms. But they didn’t (as far as we know) experience other side effects like hand tremors, ringing in the ears, or “eyeballs bulge out of the eye sockets”. This doesn’t clarify whether or not the obesity was caused by the lithium, but it does clarify that chronic low doses of lithium don’t cause these non-weight-gain side effects.

And in our study, Subclinical Doses of Lithium Have Plenty of Effects, redditors who took subclinical doses of lithium did commonly report some non-weight-gain side effects, like increased calm, brain fog, frequent urination, and decreased libido, but rarely or never reported other side effects, like eye pain, fainting, or severe trembling.

In fact, the only three participants who reported tremors were all on clinical doses — 300 mg/day lithium carbonate, 600 mg/day lithium carbonate, and 50 mg/day listed as lithium orotate (we think this means 50 mg/day elemental). This suggests that tremors don’t kick in at subclinical doses. So from this example too, we see evidence that low doses of lithium cause some non-weight-gain side effects, but don’t cause many others.
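Dose comparisons like these depend on whether a figure refers to the lithium salt or to elemental lithium. A minimal sketch of the conversion, using standard atomic weights and assuming anhydrous lithium orotate (commercial orotate is often the monohydrate, which has slightly less lithium by weight):

```python
# Standard atomic weights in g/mol (rounded).
LI, C, H, N, O = 6.94, 12.011, 1.008, 14.007, 15.999

LI2CO3_MOLAR_MASS = 2 * LI + C + 3 * O                      # lithium carbonate, Li2CO3
LI_OROTATE_MOLAR_MASS = 5 * C + 3 * H + LI + 2 * N + 4 * O  # anhydrous C5H3LiN2O4


def elemental_lithium_mg(salt_mg: float, li_atoms: int, molar_mass: float) -> float:
    """Milligrams of elemental lithium contained in `salt_mg` of a lithium salt."""
    return salt_mg * li_atoms * LI / molar_mass


print(f"300 mg lithium carbonate: {elemental_lithium_mg(300, 2, LI2CO3_MOLAR_MASS):.1f} mg Li")
print(f"600 mg lithium carbonate: {elemental_lithium_mg(600, 2, LI2CO3_MOLAR_MASS):.1f} mg Li")
print(f"50 mg lithium orotate (salt weight): {elemental_lithium_mg(50, 1, LI_OROTATE_MOLAR_MASS):.1f} mg Li")
```

By this arithmetic the two carbonate doses correspond to roughly 56 and 113 mg of elemental lithium per day, far above the ballpark 1-10 mg/day subclinical range. If the 50 mg orotate figure referred to the salt rather than to elemental lithium, it would be only about 2 mg of lithium, which is why the interpretation noted above matters.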

We also think it’s possible (though not necessarily likely) that some non-weight-gain side effects of lithium exposure are widespread, and the change was just slow enough that people mostly didn’t notice. Consider:

A final thing to note here is that the EPA says they are concerned about lithium exposure, even at the trace levels found in drinking water. They write:

Although useful for treating mental health disorders, pharmaceutical use of lithium at all therapeutic dosages can cause adverse health effects—primarily impaired thyroid and kidney function. Presently lithium is not regulated in drinking water in the U.S. The USGS, in collaboration with the EPA, calculated a nonregulatory Health-Based Screening Level (HBSL) for drinking water of 10 micrograms per liter (µg/L) or parts per billion to provide context for evaluating lithium concentrations in groundwater. A second “drinking-water-only” lithium benchmark of 60 µg/L can be used when it is assumed that the only source of lithium exposure is from drinking water (other sources of lithium include eggs, dairy products, and beverages such as soft drinks and beer); this higher benchmark was exceeded in 9% of samples from public-supply wells and in 6% of samples from domestic-supply wells.

This strikes us as strange — 10 µg/L and 60 µg/L are higher than historical levels, but those are pretty trace amounts, even by our standards. In comparison, the Pima were exposed to about 100 µg/L. We don’t know why the USGS and EPA are concerned about these levels, or where those thresholds come from, but it’s notable that they are concerned.
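Since 1 ng/mL is the same as 1 µg/L, the water figures in this section can be put side by side and converted into a rough daily intake. A minimal sketch, assuming 2 liters of drinking water per day (actual intake varies a lot, and this ignores lithium from food):

```python
LITERS_PER_DAY = 2.0  # assumed drinking-water intake; real intake varies widely
BASELINE = "U.S. municipal average (Sievers & Cannon era)"

# Concentrations from the text, in µg/L (numerically equal to ng/mL).
levels_ug_per_l = {
    BASELINE: 2.0,
    "USGS/EPA Health-Based Screening Level": 10.0,
    "USGS/EPA drinking-water-only benchmark": 60.0,
    "Pima drinking water": 100.0,
}


def daily_mg_from_water(ug_per_l: float, liters: float = LITERS_PER_DAY) -> float:
    """Lithium intake in mg/day from drinking water alone."""
    return ug_per_l * liters / 1_000


for label, level in levels_ug_per_l.items():
    ratio = level / levels_ug_per_l[BASELINE]
    print(f"{label}: {level:g} µg/L, {ratio:g}x the old average, "
          f"~{daily_mg_from_water(level):.2f} mg/day")
```

At that assumed intake, the Pima's water alone works out to about 0.2 mg/day, the same order of magnitude as the 0.4 mg/day trace dose in Schrauzer & de Vroey (1994), while the 10 µg/L screening level corresponds to about 0.02 mg/day.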

If anyone can find out where they got these numbers, please let us know. The USGS people haven’t responded to our emails.

#4 Wild Animals and Obesity

Do you agree that wild animals are not really becoming obese?

This is a misunderstanding about the use of the word “wild”.

Our main source for animals becoming obese was Klimentidis et al. (2010), Canaries in the coal mine: a cross-species analysis of the plurality of obesity epidemics. This is a study of weight change in over 20,000 animals from 24 distinct populations and eight species, and the top-line finding was that “In all populations, the estimated coefficient for the trend of body weight over time was positive (i.e. increasing).”

This paper uses the terms “wild” and “feral” to refer to a sample of several thousand Norway rats. Following this source, in Part I of A Chemical Hunger we also use the terms “wild” and “feral” to refer to these rats. We say, “Humans aren’t the only ones who are growing more obese — lab animals and even wild animals are becoming more obese as well. Primates and rodents living in research colonies, feral rodents living in our cities, and domestic pets like dogs and cats are all steadily getting fatter and fatter.”

This word seems to have caused a lot of confusion. Many people got the impression that we were claiming that rhinos on the Serengeti were becoming more obese. What we meant was that the obesity epidemic isn’t limited to humans. That’s consistent with the examples we used. We summarized this paper as: “Primates and rodents living in research colonies, feral rodents living in our cities, and domestic pets like dogs and cats are all steadily getting fatter and fatter,” and that’s exactly what the study says. Natália appeals to a dictionary definition to claim that we’ve said something wrong here, but the paper we cited literally refers to these rats as “wild”!

We talked about this study the same way every time we brought it up, in our posts or in conversations on Twitter. Natália selectively quotes one part of this sentence to make it look like we’re misrepresenting the results, but she leaves out the fact that we always included the context. We wrote:

We’ve previously reviewed the evidence that pets, lab animals, and even wild animals have gotten more obese over the past several decades.

Natália cuts off the first part and only says: “even wild animals have gotten more obese over the past several decades”, distorting the focus. We are not sure what more we could have done to make our meaning clear.

But the broader question is definitely interesting, so let’s consider it now: have “truly wild” animals, living totally separately from humans, been getting obese as well?

We think this is a point where reasonable people can disagree. There isn’t much data about the weight of truly wild animals over time, let alone good data that can distinguish how fat they are independent of other possible changes in their weight (e.g. they’re getting larger but not fatter).

When there’s not much data, you look for the data there is and see what it can tell you. In this case we don’t expect the data will be well-controlled or that it will do a good job accounting for alternative explanations. We just want to look and see if truly wild animals are heavier now than they were in the past.

In our conversation with Divia Eden, we discussed Wolverton, Nagaoka, Densmore, & Fullerton (2008). We pulled out this figure, which shows a positive trend for does and a stronger positive trend for bucks:

And we clarify:

There are alternative explanations for these trends of course — less competition for food, etc. — but at the very least these do seem to be animals eating pretty wild diets, and they do seem to be gaining weight

Basically, we find data that are consistent with the idea that truly wild animals are getting heavier. And we point out that there are alternative explanations.

So it’s pretty strange that Natália’s response is to point out there are alternative explanations. For example, she says:

Predation decreases their population density, which increases the amount of energy available for each individual deer in their habitat.

That’s the same alternative explanation we considered in the tweet: “less competition for food”. We know she must have read this tweet because she cites the thread in her post. We don’t know why she doesn’t mention that we highlighted the same alternative explanation. She’s framing it as though we thought this study was a slam-dunk, when we only ever said it was suggestive.

Better studies that control for confounds would be ideal. But there are always alternative explanations. In the absence of controlled studies, we use the best available data and evaluate how consistent it is with the hypothesis.

Certainly if we had looked at the weights of white-tailed deer and found that they had been flat since 1970, or that they were decreasing, that would have been some evidence against the idea that truly wild animals are becoming obese, or at least inconsistent with it. So finding that weights are steadily increasing is some evidence in favor of the idea that truly wild animals are becoming obese, or at least it’s consistent with the idea.

Overall, this feels like an isolated demand for rigor, an “[attempt] to demand that an opposing argument be held to such strict invented-on-the-spot standards that nothing (including common-sense statements everyone agrees with) could possibly clear the bar”. To use Scott’s framing, “evidence consistent with a hypothesis doesn’t count if there are alternative explanations for that evidence” is a fake rule we never apply to anything else.

#5 Lithium at Altitude

Do you agree that water has higher lithium levels at high altitudes (the opposite of what would be needed for lithium to explain the altitude-obesity correlation)?

We believe Scott is referring to this argument from Natália:

Using publicly-available data from the USGS and the Open Elevation API, I found that across 1,027 domestic-supply wells (all wells whose coordinates were available), the correlation between altitude and log(lithium concentration) is 0.46. I also checked the correlation between altitude and topsoil log(lithium concentration) in the United States, with data I found here, and, again, it was positive (0.3). So lithium exposure is probably higher, rather than lower, in high-altitude areas in the United States (which, as a reminder, have lower obesity rates).

This criticism was pretty surprising to us, because we literally discussed it in the original series! In Interlude H (“Well Well Well”) we explored the same USGS dataset in depth and said:

One thing that you’ll notice is that the distribution of lithium in well water doesn’t match up all that well with the distribution of obesity. Colorado is the leanest state but has pretty high levels of lithium in its well water. Alabama is quite obese but levels of lithium in the well water there are relatively low. What gives?

…all of these measurements are of well water, but many areas get their drinking water from surface sources rather than from wells.

Let’s start with Colorado, since it’s the clearest example. As you can see from the map above, the average level of lithium in Colorado well water is higher than the national average. We have the raw data, so again we can tell you that the median level in Colorado wells is 17.8 ng/mL, the mean is 28.0 ng/mL, and the max is a rather high 217.0 ng/mL.

But this doesn’t matter, because almost none of the drinking water in Colorado comes from wells. Instead, most of the drinking water in Colorado comes from surface water, and most of that water comes directly from pure snowmelt.

We go on like this for a while.

Natália’s analysis only covers domestic-supply wells. These wells provide only part of our drinking water. It appears to exclude public-supply wells, and it entirely excludes surface water sources.

This is a problem, because we would expect the altitude-obesity correlation to mostly come from surface water contamination. Water from drilled wells has often been down there for thousands or hundreds of thousands of years, so lithium concentration in these aquifers is largely independent of human activity. But runoff from roads, landfills, brine spills, power plants, and factory explosions goes directly into surface water, and from there directly into people’s mouths. When we looked at the most obese communities in America, we found that many of them got their drinking water from surface water sources, often sources that have been exposed to lithium contamination from fossil fuels or from explosions at the local lithium grease plant.

It's also worth restating that our position is that altitude is a proxy for "height in watershed", which is itself a proxy for overall contamination. For example, West Virginia is relatively high elevation but also quite obese. In fact, it's currently the most obese state of them all, at 41.2% obese. Why bother computing these correlations? Doesn't West Virginia disprove the theory all on its own?

Not at all, because despite being high-altitude and high in its watershed, West Virginia is home to an enormous amount of environmental contamination, especially from fossil fuels, a leading cause of lithium contamination. When you look at the local WV coal power plants, you find that they are leaking lithium into the surrounding water supply, sometimes at levels above 100 ng/mL.

Even without these issues, this correlation can’t be a meaningful measure of the lithium-altitude question because the data aren’t at all representative. To extend correlation results to a population, the data should be a random (or otherwise representative) sample from that population. These data are not representative geographically or by population density. Here’s a map of the domestic-supply wells from this dataset (which Natália must have seen, because she has the same map in her post):

As you can see, the data mostly come from Nebraska, certain parts of Texas, and the East Coast. Some parts of the country are barely represented, and some states, like Tennessee, are not represented at all.

So even if there is a small correlation within this dataset, it’s not an estimate of the correlation between lithium and altitude in the country as a whole, not even just within domestic-supply wells. Without a representative sample, we can’t reasonably infer that the same relationship in general would hold across the U.S.

#6 Lithium in American Food

Scott didn’t mention this one, but it’s the point that sparked Natália’s criticisms in the first place, so we think it deserves special attention.

This whole story begins when we put out a literature review of lithium levels in food. We concluded that, “There’s certainly lithium in our food, sometimes quite a bit of lithium. It seems like most people get at least 1 mg a day from their food, and on many days, there’s a good chance you’ll get more.”

The opening argument of Natália’s original post disputes this conclusion. Her argument is largely based on evidence from Total Diet Studies (TDS), which find less than 0.5 mg/kg lithium in every single food.

Natália prefers the TDS numbers, which is fine. But she says that our “literature review pretty much only includes studies that are outliers in the literature”. And she says that our review “largely relies on old data from a single author from Germany”.

This is not true. We cite more than 20 papers in that literature review, some of which are review papers that include other papers we didn’t cite directly. Only two of the papers we cite include this German author, Manfred Anke, as one of the authors — Anke, Schäfer, & Arnhold (2003) and Anke, Arnhold, Schäfer, & Müller (2005). We also mention two papers from Anke from 1991 and 1995, but we weren’t able to find them at the time, so they aren’t among the papers we cite and we weren’t able to include their data in the review. Are sources from 2005, 2003, 1995, and 1991 “old data”? They’re certainly not as old as many of the other sources we cited, like this 1941 Nature publication or this 1929 Science publication, which Natália didn’t complain about.

Maybe this is more of a concern about the number of times we mention Anke, rather than the proportion of papers he contributed. We do quote Anke a lot, but this is because he reports a lot of measurements in those two papers. Anke reported measurements for almost every food group, and we wanted to pass those measurements on to the reader. Omitting these measurements from our review would be a serious oversight.

We’d prefer to have more sources, but for some foods we could only find one or two sources besides Anke. We even complain in the post about having to rely so much on his data, saying “the bad news is that, like pretty much everything else, levels in animal products are poorly-documented and we have to rely heavily on Manfred Anke again.” This is why we conclude by calling for more research.

The truth is that there’s a split in the literature. The TDS studies consistently find low levels of lithium in food and beverages, as do some other papers. But other sources find much higher levels (not an exhaustive list):

Bertrand (1943) "found that the green parts of lettuce contained 7.9 [mg/kg] of lithium"
Borovik-Romanova (1965) “reported the Li concentration in many plants from the Soviet Union to range from 0.15 to 5 [mg/kg] in dry material”, in particular listing the levels (mg/kg) in tomato, 0.4; rye, 0.17; oats, 0.55; wheat, 0.85; and rice, 9.8.
Hullin, Kapel, and Drinkall (1969) found more than 1 mg/kg in salt and lettuce, and up to 148 mg/kg in tobacco ash.
Duke (1970) found more than 1 mg/kg in some foods in the Chocó rain forest, in particular 3 mg/kg in breadfruit and 1.5 mg/kg in cacao.
Sievers & Cannon (1973) found up to 1,120 mg/kg lithium in wolfberries.
Magalhães et al. (1990) found up to 6.6 mg/kg in watercress at the local market.
Ammari et al. (2011) looked at lithium in plant leaves, including spinach, lettuce, etc. and found concentrations in leaves up to 4.6 mg/kg Fresh Weight.
Manfred Anke and his collaborators found more than 1 mg/kg in a wide variety of foods, in multiple studies across multiple years, up to 7.3 mg/kg on average for eggs.
Schrauzer (2002) reviewed a number of other sources finding average intakes across several locations from 0.348 to 1.560 mg a day.
Five Polish sources from 1995 that a reader sent us reported finding (as examples) 6.2 mg/kg in chard, 18 mg/kg in dandelions, up to 470.8 mg/kg in pasture plants in the Low Beskids in Poland, up to 25.6 mg/kg in dairy cow skeletal muscle, and more than 40 mg/kg in cabbage under certain conditions.

Some of these measurements are of dry weight, so the fresh food would presumably have less. But others are fresh weight and still find > 1 mg/kg.
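To compare a dry-weight figure against the 1 mg/kg and 1 mg/day thresholds, it has to be scaled down by the food's dry-matter fraction. The sketch below shows that arithmetic plus a daily-intake sum. The function names are hypothetical, and the moisture fraction is an assumption you would need to look up for each food; none of the example numbers come from the studies listed above.

```python
def dry_to_fresh(conc_dry_mg_per_kg, moisture_fraction):
    """Convert a dry-weight concentration to a fresh-weight concentration.

    moisture_fraction is the share of fresh mass that is water (0.0 to <1.0),
    so each kg of fresh food holds (1 - moisture_fraction) kg of dry matter.
    """
    if not 0.0 <= moisture_fraction < 1.0:
        raise ValueError("moisture_fraction must be in [0, 1)")
    return conc_dry_mg_per_kg * (1.0 - moisture_fraction)

def daily_intake_mg(servings):
    """Sum lithium intake (mg) from (fresh-weight mg/kg, grams eaten) pairs."""
    return sum(conc * grams / 1000.0 for conc, grams in servings)

# Hypothetical: 10 mg/kg dry weight, assuming the food is 90% water.
fresh = dry_to_fresh(10.0, 0.90)
# Hypothetical day: 300 g of that food plus 700 g of food at 0.5 mg/kg fresh weight.
total = daily_intake_mg([(fresh, 300.0), (0.5, 700.0)])
print(f"{fresh:.2f} mg/kg fresh, {total:.2f} mg/day")
```

A watery vegetable loses most of its dry-weight concentration in this conversion, which is why fresh-weight measurements above 1 mg/kg carry more weight in the argument than dry-weight ones.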

Hydroponic / plant-uptake studies, like Magalhães et al. (1990), Hawrylak-Nowak, Kalinowska, and Szymańska (2012), Kalinowska, Hawrylak-Nowak, and Szymańska (2013), Antonkiewicz et al. (2017), and Robinson et al. (2018), find that plants grown in lithium-rich water or soil accumulate lithium, and often end up containing more than 1 mg/kg. The lithium concentrations in these studies are mostly much higher than the amounts we think crops are usually exposed to, but they clearly support the idea that crops can accumulate lithium from their environment.

So, some sources find less than 1 mg/kg of lithium in food and beverages, others find more. The thing to do is to look at the totality of the evidence and try to figure out what’s going on. When results differ, it’s an opportunity to come up with hypotheses and do some testing to determine why.

We went back and took a closer look at the study methods. What we noticed is that the studies that found < 1 mg/kg lithium tended to use the same analysis technique — inductively coupled plasma mass spectrometry (ICP-MS) with microwave digestion with nitric acid (HNO3). The studies that found more than 1 mg/kg lithium in food mostly used a variety of other techniques. This made us suspect that the split in the literature is caused by the fact that different analytical methods give very different results, with some methods giving much higher and other methods giving much lower estimates.

To test this, we ran a study comparing a couple of different analytical approaches on a short list of diverse American foods. This confirmed our hypothesis. When the foods were digested in HNO3, both ICP-MS and ICP-OES analysis mostly reported that concentrations of lithium were below the limit of detection. And when foods were dry ashed first, both ICP-MS and ICP-OES consistently found levels of lithium above the limit of detection, reporting concentrations of several mg/kg for many of the foods we tested:

We think the higher numbers are more accurate — our full reasoning can be found in the original post. But even if you take the more conservative numbers as real, they still support the idea that foods sometimes contain more than 1 mg/kg, as these methods found up to 1.2 mg/kg lithium in goji berries.

Eggs had the highest levels of lithium in these results, up to 15.8 mg/kg lithium when ashed and analyzed with ICP-OES. So we followed up this project by running another pair of analytical studies taking a closer look at lithium levels just in American eggs.

The main finding of Study 1 is that lithium was detectable in nearly all eggs:

Study 2 looked at egg-to-egg variation, finding less variation in samples from 1-egg batches than 4-egg batches, and generally confirming the results of Study 1:

A few general points here.

Don’t verbally disagree, empirically disagree. We could go back and forth for months, arguing about who is cherrypicking whom, which set of studies are really the “outliers”, whether SMTM relied too much on data from a single author from Germany, or whether or not four papers from 1991, 1995, 2003, and 2005 count as “old data”.

Why not run new studies to try to get to the bottom of things instead? Natália correctly pointed out that there was no lithium data from food from the modern United States. That was a big gap in our understanding, so we tested foods from the modern United States. Now those data exist.

Internet scientists can do more than comb over other people’s work and fight about it. It’s much better to settle confusion with data than with words, much more productive to fight over study design than over definitions. Let’s do that instead.

Analytical chemistry is not easy! People seem to assume that you can put a food sample into a machine and get an objective measurement of how much lithium is in that food out the other side. We know this because we kind of assumed the same thing before we did this project. Chemistry is one of those sciences that we have pretty well solved, right?

Turns out, it’s much more complicated. Different analytical techniques give different answers. And those answers aren’t objective, they’re just estimates. You realize that none of the measurements in the literature are any more objective than yours. They all require interpretation, and any of them could be wrong.

At first we thought that the difference in findings was the result of different analytical techniques, so we were only going to compare ICP-MS to ICP-OES, with identical digestion. We happened to throw in different digestion techniques just in case. And it's a good thing that we did, because that ended up being the main finding. It would have been easy to miss.

These two analytical techniques disagree, and it’s possible that one or both are overestimating lithium concentrations. But it’s also possible that they’re both underestimating lithium concentrations. We found up to 15 mg/kg lithium in eggs, but if the techniques are systematically underestimating the true concentrations, then maybe eggs contain more. Maybe they contain a lot more.

In fact, we think it’s more likely that these techniques underestimate lithium than overestimate. Lithium is especially tricky to measure because it is a tiny and extremely light ion that reacts differently depending on what else is in the sample. These kinds of problems tend to make tests read too low, not too high. Sources often emphasize how easy it is to run into these problems, like this article by environmental testing firm WETLAB which describes several potential problems in lithium analysis: “some of the limitations for lithium analysis are that lithium is very light and can be excluded by heavier atoms. … When Li is in a matrix with a large number of heavier elements, it tends to be pushed around and selectively excluded due to its low mass. This provides challenges when using Mass Spectrometry.”

So if our tests found 15 milligrams per kilogram in eggs, the real number could be even higher. And if that’s true, then we may still be underestimating how much lithium is actually in the food we eat.

This isn’t the end of the story, of course. We only tested a small number of foods, and we didn’t test many samples of each. We think this confirms that Americans regularly consume foods containing more than 1 mg/kg of lithium, but it doesn’t give a great sense of which foods contain the most lithium, or how much lithium might be contained at the upper limits. We found eggs that contain 15 mg/kg after looking at only a small number of eggs, so there are probably eggs out there that contain more, maybe a lot more. We haven’t tested wheat or soy, so if those contain 10 or 50 or 100 mg/kg, we wouldn’t know.

We're currently fundraising to continue these studies, test more foods, and compare more analytical techniques so we can determine which technique(s) give the most accurate measurements. We think it would be good to know how much lithium is in the American food supply, which foods have the highest concentrations, and how to measure these things in general.

If you’ve read to this point of the post you must be genuinely interested in this work, so please contact us. If you’d prefer the analyses to come from a third party, we would also love to see independent teams investigate these same questions and we’re ready to help.

Some Thoughts

Something about this whole discussion still strikes us as very odd.

Maybe it has something to do with how we think about science. Either the lithium hypothesis is already true, or it is already false. Arguments can change minds, and can shape how people decide to spend their time and energy, but the hypothesis is already true or false. If it is true, then all observations in the future will bend towards it. Otherwise they won’t. Argument can’t change that.

Any given hypothesis, we can take it or leave it. The real goal is to cure obesity, or at least figure out where the obesity epidemic came from. We give the lithium hypothesis a lot of weight because we still find it to be well-supported by the evidence — it’s not perfect, but it has predicted things that no other theory would predict (like that the Pima would have high levels of lithium in their water in the early 1970s) and it accounts for evidence that other hypotheses have a hard time accounting for (like why auto mechanics have such high rates of obesity).

We’re not on the “side” of the lithium hypothesis, but we’re happy to make the case for it as long as we think that it’s a plausible hypothesis. And as long as we think it’s the most likely hypothesis, we’ll keep looking for evidence that will help us clarify, like the studies of lithium in American food that we mentioned above.

If the lithium hypothesis is not true, or only accounts for a minor fraction of the obesity epidemic, we want to find out as soon as possible, so we can investigate other theories instead. For what it’s worth, we do think there’s some chance that the obesity epidemic is caused by pesticides, or something related to cars and heavy machinery, maybe in the exhaust.

We don’t understand why people think we are partisan in favor of the lithium hypothesis, but it’s a real stumbling block for these conversations. Good relationships are fundamentally based on the assumption of good faith, which means giving the other person the benefit of the doubt and believing they have positive intentions, even when their actions are unclear or confusing.

It is hard for us to know how to respond to people who start with the assumption that we are partisan and have bad intentions, for the same reason it is hard to productively respond to that schoolyard taunt, “does your mom know that you’re gay” — it is strongly and negatively framed, and any response plays into that framing. When people come at us asking us to defend a position rather than discuss it as colleagues, it’s a missed opportunity for everyone to work together.

We’d like to ask you to treat us like people rather than like opponents. There is a real mystery to be solved here, and our best bet at solving it is everyone working together and extending each other as much curiosity and charity as possible.

We can and should have fierce disagreements over the facts, but as long as our shared goal is finding the truth, we can have these disagreements in collaboration and good humor.

Philosophical Transactions: Potato Serendipity (and FODMAP testing)

June 18, 2025 · slimemoldtimemold · case study, diet, food, health, internet science, nutrition, obesity, Philosophical Transactions, potato, science, weight-loss

In the beginning, scientific articles were just letters. Eventually Henry Oldenburg started pulling some of these letters together and printing them as the Philosophical Transactions of the Royal Society, the first scientific journal. In continuance of this hallowed tradition, here at SLIME MOLD TIME MOLD we occasionally publish our own correspondence as a new generation of philosophical transactions.

Today’s correspondence is from a husband and wife who wish to remain anonymous. This account has been lightly edited for clarity, but what appears below is otherwise the original report as we received it.

The potato diet has mostly been used for weight loss, but it’s also notable for involving mostly one food and being close to nutritionally complete, which means you can use it as an elimination diet to study things like food triggers. We’ve been interested in this idea for a long time, and we find this case study particularly compelling because it’s a rare example of someone doing just that!

Since around 2018, K had been suffering from stomach pain, bloating, gas, and chronic constipation. Chronic constipation worsened after two pregnancies, so K sought medical intervention again in Feb 2025. K was prescribed medication (Linzess) to treat the constipation, which initially improved symptoms but was unreliable and had unpleasant side effects. She had been on that medication for 1 month before starting the potato diet.

Family and friends were bewildered to hear our plan, warning us of muscle loss and blood sugar problems since potatoes are ‘bad’.

Her initial goal was to lose 5-10 pounds from a starting BMI of 23.4 and test out the claims we read online about the diet. K actually joked, “wouldn’t it be funny if this diet fixes my stomach problems?”

We started the diet on 21MAR2025. The first two and a half days were 100% potato for both of us. Morale was suffering by the afternoon of day 3, so we caved and had a potato-heavy dinner with our kids. Afterwards, we agreed to eat only potatoes until dinner so we could still have a normal family meal time. We did make sure potatoes featured heavily in the weekly meal plan.

Within a week, K noticed improved symptoms and regularity without any medication. Initially, she thought she might have a lactose intolerance, so she switched to lactose-free milk and quit the potato diet once we reached the end of our planned testing window.

Back on a regular diet (but still avoiding lactose), K’s symptoms came back worse, with constant stomach aches and bloating. K realized that she had unintentionally been on a low-FODMAP diet while on the potato diet and decided to do intolerance testing.

Her methodology for intolerance testing follows:

Ate a high-potato, low-FODMAP diet until minimal symptoms were present.
Used NHS FODMAP rechallenging protocol to isolate FODMAP groups (lactose, fructans from wheat, fructans from onions, fructans from garlic, fructans from fruit, fructose, galactooligosaccharides, sorbitol, mannitol, fructose + sorbitol) and identify foods to use for testing each group
Spent 3 days of rechallenging per group: day 1 – small portion, day 2 – med portion, day 3 – large portion of challenge food (ex: 1/4 cup milk, 1/2 cup milk, 1 cup milk)
Kept daily log of symptoms and severity
Allowed 3 days of ‘washout’ after rechallenging
Rechallenged next food group, but did not incorporate challenged foods into diet to avoid multiple FODMAP effects
If symptoms appeared after a food challenge, waited till symptoms subsided and repeated the rechallenge over another 3 days
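The rechallenge protocol above is essentially a day-by-day schedule, sketched below. `rechallenge_schedule` is a hypothetical helper, not part of the NHS protocol. It assumes a fixed 3-day washout and treats that washout as long enough for symptoms to subside before a repeat challenge, which K's protocol instead left open-ended ("waited till symptoms subsided").

```python
PORTIONS = ("small", "medium", "large")  # e.g. 1/4 cup, 1/2 cup, 1 cup of milk
WASHOUT_DAYS = 3

def rechallenge_schedule(groups, repeat_groups=frozenset()):
    """Return a list of (day, group, activity) tuples for a FODMAP rechallenge.

    Hypothetical helper: each group gets 3 challenge days with increasing
    portions, followed by 3 washout days. Groups in repeat_groups (ones that
    caused symptoms) get a second 3-day challenge plus washout; this sketch
    assumes the washout is long enough for symptoms to subside.
    """
    plan = []
    day = 1
    for group in groups:
        rounds = 2 if group in repeat_groups else 1
        for _round in range(rounds):
            for portion in PORTIONS:
                plan.append((day, group, f"challenge ({portion} portion)"))
                day += 1
            for _washout_day in range(WASHOUT_DAYS):
                plan.append((day, group, "washout"))
                day += 1
    return plan

plan = rechallenge_schedule(["lactose", "fructans (wheat)"],
                            repeat_groups={"fructans (wheat)"})
for day, group, activity in plan:
    print(day, group, activity)
```

Challenged foods are kept out of the ongoing diet between groups, so each group's block can be read on its own against the symptom log.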

Incorporating lots of potatoes allowed K to test out food groups while still eating a well-balanced diet. The culprit for K is fructans from wheat, which is why cutting out daily servings of wheat has made her symptoms disappear.

K is finishing FODMAP testing (still a couple more groups to go), but has had reliable relief from all symptoms without any meds. Potatoes are a regular addition to meals these days.

Below is the blank version of the log she used.

Philosophical Transactions: DECADENT Reader Reports Losing 50 Pounds Eating Buttery, Cheesy Potatoes

May 15, 2025 (updated May 16, 2025) · slimemoldtimemold · butter, case study, dairy, diet, food, health, internet science, nutrition, obesity, Philosophical Transactions, potato, science, weight-loss


This account has been lightly edited for clarity, but what appears below is otherwise the original report as we received it.

Hi Slimes,

I’ve recently wrapped up a year-long weight loss self-experiment. During this time I lost 50 lbs, most of it on a Potatoes + Dairy version of the potato diet.

This corroborates your recent case studies where Potatoes + Dairy caused just about as much weight loss as the standard potato diet. It certainly worked well for me. I found the diet really enjoyable, my meals were always delicious. I didn’t get tired of the potatoes, they remain one of my favorite foods. And there were a few other interesting findings as well, all described below.

I’m a longtime reader of the blog so this is me sending you my report, which you can publish if you like. Please list me as “Cole” (not my real name). I hope you find it helpful.

Background

First, my demographics. I’m a white male American in my early-mid 30s. I’m about 5 feet 11 inches tall, but I have a large frame. While you should feel free to calculate my BMI at any point, I don’t think it’s a very accurate measure of adiposity in my case.

My first baseline is from mid-2022, when I weighed about 220 lbs. I know this because I tried a version of the potato diet at the time and lost about 10 lbs over about 40 days. I wasn't seriously concerned with my weight at the time; I was mostly just curious about the potato diet and what it feels like "from the inside". But this turned out to be relevant later on because it let me know that I'm a potato diet responder.

In mid-2022 I was about to start a new job, one that involved a lot of hard work, stress, and late nights, and also a longer commute / a lot more driving than I am used to (I mention this because I'm sympathetic to the hypothesis that obesity is linked to motor vehicle exposure in some way).

I didn’t notice at first, but after starting this new job, I started to gain weight. Around April 2024, I realized that I weighed almost 250 lbs. This was heavier than I had ever been before, and also quite uncomfortable. For anyone who’s never gained 10+ lbs before, let me tell you, it makes everything in your life just a little more difficult, including things like sleeping, and that sucks.

But this crisis turned into an opportunity: I was about to change jobs again, this time to a job with much more reasonable hours and that required almost no driving. I wanted to lose the weight anyways, so I decided to take this opportunity to run a series of diet experiments and investigate some of the findings you’ve presented on the blog.

The Experiment

I began the study on May 12, 2024, with a starting weight of 247.6 lbs. Per previous potato diet experiments, I weighed myself in my underwear every morning for consistency.

To track my weight and my progress, I used a Google Sheet based on the one you shared from Krinn's self-experiment with drinking high doses of potassium. I found her columns tracking 7-day average, personal best, and "ratchet" to be pretty helpful. Would recommend for anyone else trying a weight loss self-experiment.
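For anyone who would rather script this than use a spreadsheet, here is a rough sketch of two of those columns: a trailing 7-day average and a running personal best (lowest weight so far). It is an approximation of the idea, not Krinn's actual sheet. The "ratchet" column is left out because its formula isn't described here, the function name is hypothetical, and the weights are made up.

```python
from statistics import fmean

def tracking_columns(weights):
    """Add a trailing 7-day average and a running personal best to daily weights.

    Days 1-6 average over however many readings exist so far. The "ratchet"
    column is not reproduced because its formula isn't given.
    """
    rows = []
    best = float("inf")
    for i, weight in enumerate(weights):
        window = weights[max(0, i - 6): i + 1]  # up to the last 7 readings
        best = min(best, weight)                # lowest weight seen so far
        rows.append({
            "day": i + 1,
            "weight": weight,
            "avg_7day": fmean(window),
            "personal_best": best,
        })
    return rows

# Hypothetical morning weights in lbs, not Cole's data.
daily = [250.0, 249.0, 251.0, 248.0, 247.0, 249.0, 246.0, 245.0]
rows = tracking_columns(daily)
for row in rows:
    print(row)
```

The 7-day average smooths out day-to-day water-weight swings, while the personal best only ever moves down, so together they show both the trend and the progress made so far.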

I didn't start any new exercise habit. As I mentioned, I did start a new job and was driving less, since I no longer had a weekly commute. So it's possible that some of the weight loss is from "lifestyle changes", but I don't think it could be much. According to my phone I've averaged about 7,000 steps per day the entire time, while gaining the weight and then while losing it.

The self-experiment can be broken into three main phases: the high-potassium brine phase, the Potatoes + Dairy phase, and a short run-out phase at the end.

Potassium

I had already lost some weight on the potato diet in the past, so from the perspective of pure science, starting with the potato diet didn’t seem very interesting. Instead, I figured I would investigate the hypothesis that high doses of potassium are part of the reason the potato diet causes weight loss.

For the first 147 days of the experiment, I tried different high-potassium brines, and lost about 12 lbs.

All brines started with a base of two 591 ml blue Gatorades, mixed in a liter bottle with whatever dry electrolytes or other ingredients I was trying. Potassium was always added as KCl in the form of Nu-Salt.

I tried a wide variety of different brine mixtures, using different amounts of KCl as well as NaCl, sodium bicarbonate (baking soda), magnesium malate, iodine (as Lugol’s 2% solution), and glycine powder. But I don’t think these mixtures are worth reporting individually, because I wasn’t able to seriously distinguish between them. Regardless of the mix, I mostly kept losing weight at a very slow pace.

My impression is that magnesium is important, and that brines with added sodium work better than brines without, but I’m the first to admit that the data isn’t strong enough to back this intuition up. The most I can say is that I seemed to lose weight in kind of a sine-wave pattern, which you can see on the graph. These ups and downs roughly lined up with the 14-day cycles where I tried different brine recipes (i.e. I tried most recipes for 2 weeks), but I might have imagined a pattern where in reality there were just natural fluctuations.

While I originally hoped to get around 10,000 mg a day of potassium from my brine, like Krinn did, this wasn’t possible. I found doses above 6,600 mg/day K hard to drink, so I settled at that dosage, reasoning that Krinn lost weight even at lower doses.
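Potassium doses are stated as elemental potassium, but the salt added is KCl, of which potassium is only about 52% by mass. The sketch below converts one to the other using standard atomic weights. It assumes the salt substitute is pure KCl; real products may contain other ingredients, so check the label. By this arithmetic, 6,600 mg K works out to roughly 12.6 g of KCl per day, and 10,000 mg K to roughly 19.1 g.

```python
# Standard atomic weights (g/mol), rounded.
K_MOLAR_MASS = 39.098
CL_MOLAR_MASS = 35.453
KCL_MOLAR_MASS = K_MOLAR_MASS + CL_MOLAR_MASS  # about 74.55 g/mol

def kcl_grams_for_potassium(potassium_mg):
    """Grams of pure KCl that supply the given milligrams of elemental potassium."""
    potassium_fraction = K_MOLAR_MASS / KCL_MOLAR_MASS  # about 0.524 by mass
    return potassium_mg / potassium_fraction / 1000.0

for target in (6600, 10000):
    print(f"{target} mg K/day -> {kcl_grams_for_potassium(target):.1f} g KCl")
```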

In general, the brines made me feel weird. I sometimes became anxious, sometimes fatigued, sometimes got headaches, and sometimes it did weird things to my sense of smell. I did sometimes feel very energetic, and sometimes it seriously reduced my appetite. Some days I ate almost nothing and had almost no appetite. But even with a clear reduction in my appetite, even when I was eating very little, I didn’t lose much weight. (This itself was kind of striking.)

In terms of results, 12 lbs isn’t nothing. But over 147 days, it’s only about 0.08 lbs lost per day. That’s not very much.

I take this as evidence in favor of the hypothesis that high doses of potassium are part of why the potato diet causes weight loss. Even on only 6,600 mg/day K, I experienced many of the effects of the potato diet (reduced appetite, weird anxiety) and I did lose some weight, though not much.

But I also think my results suggest that potassium may not be enough, and that the “potato weight loss effect” really comes from something like high doses of potassium plus something else in potatoes / with potatoes—maybe high doses of magnesium, maybe sufficient sodium to balance the potassium, etc.

Potatoes & Dairy

The brine seemed to work, but my rate of weight loss was really slow. It seemed like it was time to try the potato diet. In addition to hopefully losing more weight, I saw two benefits.

First, I could compare the effect of the brine directly to the effect of the potato diet, to see if I was already losing weight as fast as I could, or if there was something missing from the formula.

Second, I could test out the success of Potatoes + Dairy. The original potato diet was very strict, but by this point you had already reported a few case studies where people had lost a lot of weight on versions of the potato diet where they also ate various kinds of dairy.

My version of Potatoes + Dairy was decadent. Every meal was potatoes, but I always added as much butter, cheese, and sour cream as I wanted, which was usually a lot. For a while I made a lot of scalloped potatoes, but eventually I got lazy and from that point on I mostly ate baked potatoes or turned old baked potatoes into homefries. I didn’t get tired of this because butter is great.

When I didn’t have time to prepare potatoes, I would have cheese, milk, or ice cream as a snack. Yes, I ate as much ice cream as I wanted, and still lost weight (which is in line with the literature).

In case anyone wants to replicate my approach, my mainstays were:

Kerrygold salted butter, or occasionally Cabot salted butter
Cabot sour cream
Cabot cheeses, especially Cabot Seriously Sharp Cheddar Cheese
Ben & Jerry’s Ice Cream, most often Peanut Butter Cup

Despite this decadence, I lost about 40 lbs more over 187 days.

Looking closer, the weight loss really happened over two spans, one before the 2024 December holidays, and one after. I first lost about 16 lbs over 75 days, gained about 8 of that back during late December and January, then lost about 28 lbs over the next 86 days. At the point of greatest descent (early March 2025), I lost 10 lbs in two weeks.
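Putting the reported phases side by side as pounds per day makes the comparison with the brine phase concrete. The sketch below uses only the figures given in this report, and assumes "two weeks" means 14 days. By this arithmetic the brine phase averaged about 0.08 lbs/day, the Potatoes + Dairy phase about 0.21 lbs/day overall (roughly 2.6 times faster), and the post-holiday stretch about 0.33 lbs/day.

```python
# (phase, lbs lost, days) as reported in this account; "two weeks" taken as 14 days.
PHASES = [
    ("Brine phase", 12, 147),
    ("Potatoes + Dairy, whole phase", 40, 187),
    ("Potatoes + Dairy, before holidays", 16, 75),
    ("Potatoes + Dairy, after holidays", 28, 86),
    ("Steepest stretch (early March 2025)", 10, 14),
]

def loss_rate(lbs, days):
    """Average pounds lost per day over a span."""
    return lbs / days

for name, lbs, days in PHASES:
    print(f"{name}: {loss_rate(lbs, days):.2f} lbs/day")

ratio = loss_rate(40, 187) / loss_rate(12, 147)
print(f"Potatoes + Dairy vs. brine: {ratio:.1f}x faster")
```

Averages over whole phases blur the holiday regain and the cheat days, so the per-span rates are the better guide to how fast the diet worked while it was actually being followed.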

I wasn’t very strict and I did cheat pretty often. My notes mention times and places that I had pizza, candy, or sometimes burritos. Sometimes I had cheat meals where I would go out to lunch or get hot pot with friends. Sometimes I went on dates, where I ate normal food. This mostly didn’t make a difference as long as I also kept up with the potatoes.

You might think that potatoes are a neutral food, and they just help you survive while your body returns to normal, or something. But my sense is that potatoes actively cause the weight loss. On days where I didn’t prepare potatoes, and mostly just snacked on ice cream and cheese, I didn’t seem to gain much weight back, but I didn’t lose it, either.

This leads to another counterintuitive recommendation: the potato diet can really reduce your appetite, sometimes to the point where you don’t want to eat. But I think that you actually lose more weight on days where you eat potatoes than on days where you don’t eat at all. So if your goal is to lose weight, don’t assume that not eating is a good strategy—eat your taters.

I’m pretty confident that the potato diet was causing the weight loss, in part because I started losing weight right when I switched from brine to potatoes. Also, when I cheated for more than just a meal or two, it was obvious on the graph. Halloween, Thanksgiving week, and the December holidays stand out in particular. Here’s a version of the graph with those days singled out:

My holiday weight re-gain continued well into January because I was travelling and helping to organize some professional conferences, and I wasn’t able to keep up with the potatoes very well. As soon as I got back on potatoes around Jan 20, my weight started dropping again, this time faster than before.

I was pretty surprised when I blew past not only 220 lbs, but 210 lbs. I had thought that 210 to 220 lbs might be the healthy range for me, and expected the diet to stall out there. But instead I kept going. It turns out that 220 lbs is at least 20 lbs overweight for me. I had no idea, because I felt pretty healthy at 220, but I guess I had forgotten what it was like to be a normal weight.

Run-Out

I first dropped below 200 lbs on March 20, 2025. Soon after that, my weight started to plateau, never falling much below 200 lbs but showing no signs of increasing.

I also noticed that I suddenly started craving foods that weren’t potatoes, something I hadn’t experienced in the previous 170 days. First I started craving fruit, and the next day, I started seriously craving Mexican food. Soon I was craving broccoli and chocolate.

This made me think that I might have reached a plateau, possibly my “natural” weight. According to BMI I am still “overweight” at < 200 lbs, and I am definitely not “lean”. But I do feel trim, and the girl I’ve been dating keeps putting her hands on my chest and talking about how good I look, so I’ll take this as some evidence that “just under 200 lbs” is a reasonable weight for me.

Because I already seemed to have hit a plateau, I decided to spend the last 31 days on a run-out period to see what would happen as I eased off the diet. During this time I still ate potatoes pretty often, but I started bringing in other foods, and I went whole days without eating any potatoes at all. Somewhat surprisingly, I didn’t gain back the weight as I relaxed the diet.

I do kind of wonder if my weight would have fallen even further if I had remained on Potatoes + Dairy, but the fact that I was developing cravings for other food suggests to me that I had encountered a real state change. It might have been possible to force my weight lower, but the magic of the potato diet is that the weight loss happens without any force. If you start forcing things, you’re back in the territory of restriction diets.

I officially ended the experiment on May 12, 2025, 365 days after I started, weighing 198.8 lbs. This was down from an original high of 247.6 lbs, and my all-time low was 194.4 lbs on April 22nd.

I’ll probably keep eating a diet high in potatoes, since even after several months, I still love them very much (and you wouldn’t believe how much I’ve saved in groceries). But I seem to have reached a plateau and a healthy weight, and also, while potatoes are powerful, they come at a terrible cost (mostly joking but read on).

A Few Things People Should Know

Hair Loss

When you lose a lot of weight very quickly, you often lose some hair. I’d never heard of this before, but apparently it’s common knowledge among women. Who knew? It’s called “telogen effluvium” and it definitely happened to me. In early January, after my first period of intense Potatoes + Dairy weight loss, I noticed my hair was seriously thinning on top and in the back.

The good news is that hair lost in this way usually grows back on its own, though it can take a couple of months. That seems to be happening for me too. My hair is clearly thicker now than it was in January. And it’s pretty weird: looking at my scalp, I can see short hairs and even some very short hairs mixed in among the long ones. While my head hasn’t returned to normal yet, the hair is clearly growing back.

So in the end this doesn’t seem to be a serious concern. And it’s not specific to the potato diet; this just happens when you lose weight really fast. Even so, anyone who wants to copy my results should be aware that this might happen, and also that it’s usually temporary.

Emotional Effects

Some people get really intense negative feelings of fear or anxiety while on the potato diet. This also happened to me.

I’m glad I read Birb’s account of her experience with the potato diet before trying it for myself, because it really prepared me for my own experience. Here’s what she said:

To anyone who wants to do this diet, or is considering it after the benefits I described above: I encourage you to do it, but please be extra cautious that your mental state might be altered and that you are not necessarily in your right mind. The feelings you experience during this diet may not be how you actually feel.

Like I said above, potato diet is fucking weird. I mention this and the above because towards the end of the third week, I found myself crying every day. I was having actual meltdowns… five days in a row.

I am not talking “oh I am so sad, let a single tear roll down my cheek while I stare out of a window on a rainy day” levels of gloom and general depression. I am talking “at one point I couldn’t fold some of my laundry in a way that was acceptable to me, and this made me think I should kill myself, so I started crying”.

Is this a really dark to drop in the middle of a sort of lighthearted post about potato diet? Yes. I am sorry if you are uncomfortable reading it. Personally, I think I have a responsibility to talk about it, because the mentally weird aspect of this diet cannot be stressed enough.

My experience was somewhat different from Birb’s, manifesting more as a sense of overwhelming dread or doom than as a feeling of depression. And unlike Birb, I didn’t start to seriously feel this way until several months into the diet. But I definitely recognize her description.

As far as I could tell, these feelings were somewhat related to how quickly I was losing weight, though maybe not in the way you expect. The faster I was losing weight, the more of an overwhelming sense of doom I felt. Hooray. That said, it wasn’t a very strong relationship. I still felt the doom during times when I was cheating on the diet, and even when I was losing a lot of weight, I sometimes felt ok.

I suspect that these feelings may have something to do with how the body uses epinephrine and norepinephrine to release energy from adipose tissue, which would explain why you feel so crazy anxious and such intense dread when actively losing the most weight, but I’m not a doctor™.

The feelings might also be the result of a vitamin or mineral deficiency. We know that the potato diet is deficient in Vitamin A, and while I wasn’t rigorous about testing this, I found that eating some sweet potatoes (high in vitamin A) often made me feel better. I also found during the run-out period that eating mushrooms (selenium?), broccoli, and spinach (iron?) maybe helped as well. So if you’re having a bad emotional time on the potato diet, think about trying sweet potatoes or one of these other foods.

It’s interesting to me that these feelings of doom got stronger the further along I got in my weight loss. Maybe this is just because I was losing weight faster over time. But another (kind of crazy) possibility is that something is stored in our fat reserves, and as I dug deeper into them, I released more of it. Or, more generally, that something is being flushed out from somewhere? I don’t know if I believe this, but I wanted to mention it.

That’s just my speculation. It could also have been ordinary anxiety from other causes that happened to line up with the weight loss. I’ve got some personal things going on in my life right now, maybe the anxiety is coming from those. Plus, a few friends have recently had similar feelings of dread, and they’re not losing extreme amounts of weight on a highly unusual diet.

Conclusions

My results make me very confident that Potatoes + Dairy works. The potato diet makes you lose weight, and it still works if you add dairy, including butter and ice cream, even when you eat as much of it as you want.

While my data can’t speak to how well Potatoes + Dairy will work for anyone else, I hope this ends the idea that the potato diet works because it’s unpalatable. I lost 50 lbs and every meal was delicious. I also hope this finishes the idea that the potato diet works because it’s a “mono diet”. You can’t reasonably call something a mono diet when it includes potatoes, sour cream, and ice cream with tiny peanut butter cups.

I also think this is some evidence for the potassium hypothesis. I lost weight when I was taking high doses of potassium, though not nearly as much as on the potato diet. Maybe this was because I was taking too small a dose, and a higher dose would have caused a similar amount of weight loss to what I eventually saw on the potato diet.

But I suspect this is because the potato effect doesn’t come from potassium alone, but from an interaction between potassium and something else, possibly other electrolytes like sodium and magnesium.

If you could find the right mixture, maybe you could reproduce the potato effect in a brine. But if so, I wasn’t able to find it. For now, the state of the art is Potatoes + Dairy.

The Mind in the Wheel – Part XII: Help Wanted

May 8, 2025 · May 7, 2025 · slimemoldtimemold · cybernetics, graphic design, paradigm, personality, psychology, science, The Mind in the Wheel

[PROLOGUE – EVERYBODY WANTS A ROCK]
[PART I – THERMOSTAT]
[PART II – MOTIVATION]
[PART III – PERSONALITY AND INDIVIDUAL DIFFERENCES]
[PART IV – LEARNING]
[PART V – DEPRESSION AND OTHER DIAGNOSES]
[PART VI – CONFLICT AND OSCILLATION]
[PART VII – NO REALLY, SERIOUSLY, WHAT IS GOING ON?]
[INTERLUDE – I LOVE YOU FOR PSYCHOLOGICAL REASONS]
[PART VIII – ARTIFICIAL INTELLIGENCE]
[PART IX – ANIMAL WELFARE]
[PART X – DYNAMIC METHODS]
[PART XI – OTHER METHODS]

“Alright, gang, let’s split up and search for clues.”

— Fred Jones, Scooby-Doo

This has been our proposal for a new paradigm for psychology.

If the proposal is more or less right, then this is the start of a scientific revolution. And while we can’t make any guarantees, it’s always good to plan for success. So in case these ideas do turn out to be successful: welcome to psychology’s first paradigm. Let’s discuss where we go from here.

In looking for a paradigm, we’re looking for new ways to describe the mysteries that pop up on the regular. When a good description arrives, some of those mysteries will become puzzles, problems that look like they can be solved with the tools at hand, that look like they will have a clear solution, the kind of solution we’ll recognize when we see it. Because a shared paradigm gives us a shared commitment to the same rules, standards, and assumptions, it can let us move very quickly.

All this is to say that if this paradigm has any promise, then there should be a lot of normal science, a lot of puzzle-solving to do. A new paradigm is like an empty expert-level sudoku: there’s a kind of internal logic, but also a lot of tricky blanks that need filling in. So, we need your help. Here are some things you can do.

Experimentation

First, experimentalists can help us develop methods for figuring out how many cybernetic drives people have, what each drive controls, and different parameters of how they work. In the last two sections we did our best to speculate about what these methods might look like, but there are probably a lot of good ideas we missed.

Then, we need people to actually go out and use these methods. The first task is probably to discover all of the different drives that exist in human psychology, to fill out the “periodic table” of motivation as completely as we can. Finding all of the different drives will generate many new mysteries, which will lead to more lines of research and more discoveries.

We will also want to study other animals. There are a few reasons to study animals in addition to humans. First of all, most animals don’t have the complex social drives that humans do. The less social an animal is, the easier it will be to study its non-social drives in isolation. Second, it’s possible to have more control over an animal’s environment. We can raise an animal so that it never encounters certain things, or only encounters some things together. Finally, we can use somewhat more invasive techniques with animals than we can with humans.

Computational Modeling

Computational models will be especially important for developing a better understanding of depression, anxiety, and other mental illnesses. With a model, we can test different changes to the design and parameters, and see which kinds of models and what parameter values lead to the behaviors and tendencies that we recognize as depression. This will ultimately help us determine how many different types of depression there are, come to an understanding of their etiology, and in time develop interventions and treatments.

Computational models should provide similar insight into tendencies like addiction and self-harm. The first step is to show that models of this kind can give rise to behavior that looks like addiction. Then, we see what other predictions the model makes about addictive behavior, and about behavior in general, and we test those predictions with studies and experiments.

If we discover more than one computational model that leads to addictive behavior, we can compare the different models to real-world cases of addiction, and see which is more accurate. Once we have models that provide a reasonably good fit, we can use them to develop new approaches for treatment and prevention.

Biology and Chemistry

Those of you who tend more towards biology or neuroscience can help figure out exactly how these concepts are implemented in our biology. Understanding the computational side of how the mind works is important, but the possible interventions we can take (like treating depression) will be limited if we don’t know how each part of the computation is carried out in an organism.

For example: every governor tracks and controls some kind of signal. The fear governor tracks something like “danger”. This is a complicated neurological construct that probably doesn’t correspond to some specific part of biology. But other governors probably track biological signals that may be as simple as the concentrations of specific minerals or hormones in the bloodstream.

Take the hormone leptin, which seems to be involved in regulating hunger. Does one of the hunger governors act to control leptin levels in our blood? Or is leptin involved in some other part of the hunger-control process? What do the hunger, thirst, sleep, and other basic governors control, and what are their set points?

Biologists may be able to answer some of these questions. Some of these questions may even have already been answered in neuroscience, biology, or medicine, in which case the work will be in bundling them together under this new perspective.

Design

Running studies and inventing better methods sounds very scientific and important, but we suspect the most important contributions might actually come from graphic design.

The first “affinity table” was developed in 1718 by Étienne François Geoffroy. Substances are identified by their alchemical symbol and grouped by “affinity”.

At the head of each column is a substance, and below it are listed all the substances that are known to combine with it. “The idea that some substances could unite more easily than others was not new,” reports French Wikipedia, “but the credit for bringing together all the available information into a large general table, later called the affinity table, goes to Geoffroy.”

Here is a later affinity table with one additional column, the Tabula Affinitatum, commissioned around 1766 for the apothecary’s shop of the Grand Duke of Florence, now to be found in the Museo Galileo:

These old attempts at classification are charming, and it’s tempting to blame their shortcomings on the fact that their authors didn’t understand that elements fall into some fairly distinct categories. But chemical tables remained lacking even after the discovery of the periodic law.

Russian chemist Dmitri Mendeleev is often credited with inventing the periodic table, but he did not immediately give us the periodic table as we know it today. His original 1869 table looked like this:

And his update in 1871 still looked like this:

It wasn’t until 1905 that we got something resembling the modern form, the first 32-column table developed by Alfred Werner:

They tried a lot of crazy things on the way to the periodic table we know and love, and not all of these ideas made it. We’ll share just one example here, Otto Theodor Benfey’s spiral periodic table from 1964:

When a new paradigm arrives, the first tools for thinking about it, whether tables, charts, diagrams, metaphors, or anything else, are not going to be very good. Instead we start with something that is both a little confused and a little confusing, but that half-works, and iterate from there.

The first affinity table by Étienne François Geoffroy in 1718 was not very good. It was missing dozens of elements. It contained bizarre entries like “absorbent earth” and “oily principle”. And it was a simple list of reactions, with no underlying theory to speak of.

But it was still good enough for Fourcroy, a later chemist, to write:

No discovery is more brilliant in this era of great works and continued research, none has done more honor to this century of renewed and perfected chemistry, none finally has led to more important results than that which is relative to the determination of affinities between bodies, and to the exposition of the degrees of this force between different natural substances. It is to Geoffroy the elder … that we owe this beautiful idea of the table of chemical ratios or affinities. … We must see in this incorrect and inexact work only an ingenious outline of one of the most beautiful and most useful discoveries which have been made. This luminous idea served as a torch to guide the steps of chemists, and it produced a large number of useful works. … chemists have constantly added to this first work; they have corrected the errors, repaired the omissions, and completed the gaps.

It took about two hundred years, and the efforts of many thousands of chemists, to get us from Geoffroy’s first affinity table to the periodic table we use today. So we should not worry if our first efforts are incomplete or a little rough around the edges. We should expect this to take some effort, and we should be patient.

Better tools do not happen by accident. We do not get them for free — someone has to make them. And if you want, that someone can be you.

That’s all, folks!

Thank you for reading to the end of the series! We hope you enjoyed it.

We need your help, your questions, your disagreement. Consider reaching out to discuss collaborating, or just to toss around ideas, especially if they’re ideas that could lead to empirical tests. You can contact us by email or join the constant fray of public discussion on Twitter.

If you find these ideas promising and want to see more of this research happen, consider donating. Our research is funded through Whylome, a 501(c)(3) nonprofit that relies on independent donations for support. Donations will go towards further theoretical, modeling, and empirical work.

The Mind in the Wheel – Part XI: Other Methods

May 1, 2025 · May 8, 2025 · slimemoldtimemold · cybernetics, methods, paradigm, personality, psychology, research methods, science, study design, The Mind in the Wheel

There’s a fascinating little paper called Physiological responses to maximal eating in men.

The researchers recruited fourteen men (mean age: 28 years old) and invited them back to the lab to eat “a homogenous mixed-macronutrient meal (pizza)”. The authors note that “this study was open to males and females but no females signed up.”

They invited each man to visit the lab two separate times. On one occasion, the man was asked to eat pizza until “comfortably full”. The other time, he was asked to eat pizza until he “could not eat another bite”.

When asked to eat until “comfortably full”, the men ate an average of about 1500 calories of pizza. But when asked to eat until they “could not eat another bite”, the men ate an average of more than 3000 calories.

The authors view this as a study about nutrition, but we saw it and immediately went, “Aha! Pizza psychology!”

While this isn’t a lot of data — only fourteen men, and they only tried the challenges one time each — it shows some promise as a first step towards a personality measure of hunger and satiety, because it measures both how hungry these boys are, and also how much they can eat before they have to stop.

When asked to aim for “could not eat another bite”, the men could on average eat about twice as much pizza compared to when they were asked to aim for “comfortably full”. But there was quite a lot of variation in this ratio for different men:

All the men ate more when they were asked to eat as much as they could than when they were asked to eat as much as they liked. But there’s a lot of diversity in the ratio between those two values. When instructed to eat until they “could not eat another bite”, some men ate only a little more than they ate ad libitum. But one man ate almost three times as much when he was told to go as hard as he could.

People have some emotions that drive them to eat (collectively known as hunger), and other emotions that drive them to stop eating (collectively known as satiety). While these pizza measurements are very rough, they suggest something about the relationship between these two sets of drives in these men. If nothing else, it’s reassuring to see that for each individual, the “could not eat another bite” number is always higher.

It’s a little early to start using this as a personality measure, but with a little legwork to make it reliable, we might find something interesting. It could be the case, for example, that there are some men with very little daylight between “comfortably full” and “could not eat another bite”, and other men for whom these two occasions are like day and night. That would suggest that some men’s hunger governor(s) are quite strong compared to their satiety governor(s), and other men’s are relatively weak.

The general principle of personality in cybernetic psychology is “some drives are stronger than others”. So for personality, we want to invent methods that can get at the question of how strong different drives are, and how they stack up against each other. Get in loser, we’re making a tier list of the emotions.

We may not be able to look at a drive and say exactly how strong it is, since we don’t yet know how to measure the strength of a drive. We don’t even know the units. When this is eventually discovered, it will probably come from an unexpected place, like how John Dalton’s work in meteorology gave him the idea for the atomic theory.

But we can still get a decent sense of how strong one drive is compared to another drive. This is possible whenever we can take two drives and make them fight.

Some drives are naturally in opposition — this pizza study is a good example. The satiety governor(s) exist specifically to check the hunger governor(s). Hunger was invented to start eating, and satiety was invented to make it stop. So it’s easy to set up a situation where the two of them are in conflict.

Or somewhat easy. We think it’s more accurate to model the pizza study as the interaction between three (groups of) emotions. When asked to eat until “comfortably full”, the hunger governor voted for “eat pizza” until its error was close to zero, then it stopped voting for “eat pizza”, so the man stopped. That condition was simple and mainly involved just the one governor.

The other condition was more complex. When asked to eat until they “could not eat another bite”, the hunger governor first voted for “eat pizza” until its error was close to zero. Then, some kind of “please the researchers” governor(s) kept voting for “eat pizza” to please the researchers.

At some point this started running up against the satiety governor. The satiety governor tracks something like how full you are, so as the man started to get too full, the satiety governor started voting against “eat pizza”. The man kept eating until the vote from the “please the researchers” governor(s) was just as strong as the vote from the satiety governor, at which point the two votes cancel out and the man could not eat another bite.
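To make the three-governor story concrete, here is a toy Python sketch. It is not the authors’ model: the linear votes, the appetite and comfort limits, the weights, and the counting of pizza in whole “bites” are all hypothetical, with numbers chosen so the two conditions come out roughly 2:1, echoing the study’s averages. The comfort limit sits above the appetite target to leave “daylight” between the hunger and satiety set points.

```python
# Toy model of the pizza study as a vote among governors.
# Hypothetical assumptions: votes are linear in error, pizza is counted in
# whole "bites", and the appetite, comfort limit, and weights are invented.


def hunger_vote(eaten: int, appetite: int = 10) -> int:
    # Hunger votes "eat" in proportion to how far short of its target you are.
    return max(appetite - eaten, 0)


def satiety_vote(eaten: int, comfort_limit: int = 12, weight: float = 1.0) -> float:
    # Satiety votes "stop" only once fullness passes its set point.
    return weight * max(eaten - comfort_limit, 0)


def bites_eaten(please_researchers: float, satiety_weight: float = 1.0) -> int:
    """Take one bite at a time while the net vote for 'eat pizza' is positive."""
    if satiety_weight <= 0:
        raise ValueError("satiety_weight must be positive, or eating may never stop")
    eaten = 0
    while hunger_vote(eaten) + please_researchers - satiety_vote(eaten, weight=satiety_weight) > 0:
        eaten += 1
    return eaten


# "Comfortably full": only hunger votes, so eating stops when its error hits zero.
print(bites_eaten(please_researchers=0))   # 10
# "Could not eat another bite": the please-the-researchers vote keeps going
# until satiety's opposing vote cancels it out.
print(bites_eaten(please_researchers=8))   # 20
print(bites_eaten(please_researchers=8, satiety_weight=0.5))   # 28
```

In this sketch, weakening the hypothetical satiety governor raises the “could not eat another bite” number while leaving “comfortably full” untouched, which is one way the ratio between the two conditions could vary from man to man.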

This reveals the problem. In one sense, hunger and satiety are naturally in opposition. Hunger tries to make you eat enough and satiety tries to make sure you don’t eat too much. But in a healthy person, there’s plenty of daylight between the set points of these two drives, and they don’t come into conflict.

Same thing with hot and cold — the drive that tries to keep you warm is in some sense “in opposition” to the drive that tries to keep you from overheating, but they don’t normally fight. If you have a sane and normal mind, you don’t put on 20 sweaters, then overheat, then in a fit of revenge take off all of your clothes and jump in a snowbank, etc. These drives oppose each other along a single axis, but when they are working correctly, they keep the variable they care about in a range that they agree on. Hunger and satiety, and all the paired governors, are more often allies than enemies.
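A minimal sketch of two opposed governors that normally don’t fight, assuming hypothetical set points in °C and a vote equal to the size of each governor’s own error. The warming governor only votes below its set point, the cooling governor only votes above its own, and in the band between them neither one votes.

```python
# Two opposed governors with separate set points (hypothetical values in °C).
# Assumption: each governor's vote is just the size of its own error.

WARM_SET_POINT = 36.5  # below this, the "keep warm" governor votes
COOL_SET_POINT = 37.5  # above this, the "don't overheat" governor votes


def votes(body_temp: float) -> dict[str, float]:
    """Return each governor's vote for its preferred action at a given temperature."""
    return {
        "put on a sweater": max(WARM_SET_POINT - body_temp, 0.0),
        "take off a layer": max(body_temp - COOL_SET_POINT, 0.0),
    }


for temp in (35.5, 37.0, 38.5):
    print(temp, votes(temp))
```

Because the set points leave a gap, there is no temperature at which both governors vote at once: they share an axis, but they agree on the range.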

But any two drives can come into conflict when the things they want to do become mutually exclusive, or even just trade off against each other. Even if you can do everything you want, the drives will still need to argue about who gets to go first. Take something you want, anything at all, and put it next to a tiger. Congratulations, fear is now in conflict with that original desire.

Many people experience this conflict almost every morning:

This is actually a more complicated situation, where the governors have formed factions. The pee governor wants to let loose on your bladder. But your hygiene governor votes against wetting the bed. Together they settle on a compromise where you get up and pee in the toilet instead, since this satisfies both of their goals (bladder relief + hygienic).

But the governor that keeps you warm, the sleep governor (who wants to drift back into unconsciousness), and any other governors with an interest in being cozy, strenuously oppose this policy. They want you to stay in your warm, comfy bed. So you are at an impasse until the bladder governor eventually has such a strong error signal — you have to take a leak so bad — that it has the votes to overrule the cozy coalition and motivate you to get up and go to the bathroom.

The point is, the bladder governor, warmth governor, and sleep governor don’t fundamentally have anything to do with each other. They all care about very different things. But when you have to pee in the middle of the night, their interests happen to be opposed. They draw up into factions, and this leads to a power struggle — one so universal that there are memes about it. And as is always the case in politics, a power struggle is a good chance to get a sense of the relative strength of the factions involved.

If you met someone who said they didn’t relate to this — they always get up in the middle of the night to pee without any hesitation or inner struggle — this would suggest that their bladder governor is very strong, or that their warmth and/or sleep governors are unusually weak. Whatever the case, their bladder governor wins such disagreements so quickly that there doesn’t even appear to be a dispute.

In contrast, if your friend confesses that they have such a hard time getting up that they sometimes wet the bed, this suggests that their bladder governor, and probably their hygiene governor, are unusually weak compared to the governors voting for them to stay in bed.
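The standoff can be sketched as a race between a growing bladder error and a fixed “cozy coalition” vote. Everything here is a hypothetical simplification: the bladder error rises by one unit per minute, a vote is weight × error, and warmth plus sleep are lumped into a single constant vote against getting up.

```python
from typing import Optional

# Toy sketch of the middle-of-the-night standoff. Hypothetical assumptions:
# the bladder error grows by one unit per minute, a vote is weight * error,
# and the "cozy coalition" (warmth + sleep) casts a fixed vote against getting up.


def minutes_until_up(bladder_weight: float, cozy_vote: float, max_minutes: int = 600) -> Optional[int]:
    """Return the first minute at which the bladder out-votes the cozy coalition."""
    for minute in range(max_minutes + 1):
        bladder_error = minute  # assumed to rise linearly with time
        if bladder_weight * bladder_error > cozy_vote:
            return minute
    return None  # never got up within the window


print(minutes_until_up(bladder_weight=10, cozy_vote=20))    # 3: up almost at once
print(minutes_until_up(bladder_weight=1, cozy_vote=20))     # 21: a long inner struggle
print(minutes_until_up(bladder_weight=0.01, cozy_vote=20))  # None
```

Doubling both the bladder weight and the cozy vote gives the same answer, so in this sketch a latency only pins down the ratio between the two sides, the kind of relative information that a power struggle can reveal.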

To understand these methods, we have to understand the difference between two kinds of “strength”.

In general when we say that a drive is strong, we mean that it can meet its goals, it can vote for the actions it wants. This is why we can learn something about the relative strength of two drives by letting them fight — we can present the organism with mutually exclusive options (truth or dare?) and see which option it picks. If we have some reasonable idea which drive would pick which option, we know which drive is stronger from which option is picked.

However! Another way a drive can be strong is that it can have a big error signal in that moment. If you are ravenously hungry, you will eat before anything else. If you are in excruciating pain, you will pull your hand off the stove before doing anything else. This kind of urgency tells us that the current error is big, but it doesn’t tell us much about the governor.

A drive does get a stronger vote when its variable is further off target. But it’s also true that for a given person, some drives seem stronger in all situations.

The normal sense of strength gets at the fact that a governor can be stronger or weaker for a given error. Some people can go to sleep hungry without any problem. For other people, even the slightest hint of appetite will keep them awake. When we talk about someone being aggressive, we mean that they will drop other concerns if they see a chance to dominate someone; if we talk about someone being meek, we mean the opposite.

The current strength of any drive is a function of the size of its current error signal and the overall strength or “weight” of the governor. Unfortunately, we don’t know what that function is. Also, it might be a function of more than just those two things. Uh-oh!

Ideally, what we would do is hold the size of the error constant. If we could make sure that the error on the salt governor is 10 units, and the error on the sweet governor is 10 units, then we could figure out which governor is stronger by seeing which the person would choose first, Skittles or olives. This is based on the assumption that the strength of the vote for each option is a combination of the size of the error and the strength of the governor itself. Since in this hypothetical we know that the errors are exactly the same size, the difference in choice should be entirely the result of the difference in the strength of the governors.
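The text stresses that we don’t know how error and governor strength combine, so this sketch simply assumes vote = weight × error, purely as a placeholder. The weights (1.5 for salt, 1.0 for sweet) are hypothetical. With equal errors the choice reveals the stronger governor; with unequal errors, a weaker governor with a big error can still win, which is the “urgency” sense of strength.

```python
# Hypothetical sketch: assume a governor's vote is weight * error.
# The real combining function is unknown; multiplication is just one guess.


def choose(options: dict[str, tuple[float, float]]) -> str:
    """Map each option to (governor weight, current error) and pick the biggest vote."""
    return max(options, key=lambda name: options[name][0] * options[name][1])


# Equal errors (10 units each): the choice reveals which governor is stronger.
print(choose({"olives": (1.5, 10), "Skittles": (1.0, 10)}))   # olives
# Unequal errors: a weaker governor with a bigger error can still win.
print(choose({"olives": (1.5, 10), "Skittles": (1.0, 20)}))   # Skittles
```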

Unfortunately we don’t know how to do that either. We don’t know how to measure the errors directly, let alone how to hold the size of the errors constant.

But we can use techniques that should make the size of some error approximately constant, and base our research on that. The closer the approximation, the better.

The important insight here is that even when we can’t make measurements in absolute terms, we can often make ordinal comparisons. “How strong is this drive?” is an impossible question to answer until we know more about how strength is implemented mechanically, but we can make very reasonable guesses about which of two drives is stronger, and about what order a set of drives’ strengths falls in. These are ordinal measurements.

We can do this two ways: we can compare one of your drives to everyone else’s version of that same drive, or we can compare one of your drives to your other drives.

Compare One of Your Drives to Everyone Else’s Version of that Same Drive

The first is that we can compare one of a person’s drives to the same drive in other people.

It’s reasonable to ask if your hunger, fear, pain, or shame drive is stronger or weaker than average. To do this, we can look at two or more individuals and ask if the drive is stronger for one of them or for the other.

This will offer a personality measure like: your salt governor is stronger than 98% of people. You a salty boy.

Again, to get a measure of strength, we need to make everyone’s errors approximately constant. One way we can make errors approximately constant is by fully satisfying the drive. So if we identify a drive, like the drive for salt, we can exhaust the drive by letting people eat as much salt or salty food(s) as they want. Now all their errors should be close to zero. Then we can see how long it takes before they go eat something salty again. If someone goes to get salty foods sooner, then other things being equal, this is a sign that their salt governor is unusually strong.

This won’t be perfectly the same, and other things will not be perfectly equal. Some people’s salt error may increase more quickly than others’, like maybe they metabolize salt faster, or something. So after 5 hours without salty foods, some people’s error may be much bigger than others’. But it should be approximately equal, and certainly we would learn something important if we saw one guy who couldn’t go 10 minutes without eating something salty, and someone else who literally never seemed to seek it out.
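The time-to-return logic, including the caveat about errors growing at different rates, can be sketched in Python. This assumes, hypothetically, that after full satiation the salt error grows linearly, and that a person goes looking for salt once weight * error crosses a fixed threshold. None of these parameters come from real data.

```python
def hours_until_seeking(weight: float, growth_rate: float, threshold: float = 10.0) -> float:
    """Hours after full satiation until weight * error reaches the threshold.

    ASSUMPTIONS: the error starts at zero and grows by `growth_rate` units per hour,
    and the person seeks salt once weight * error >= threshold.
    """
    return threshold / (weight * growth_rate)


strong = hours_until_seeking(weight=2.0, growth_rate=1.0)  # returns to salt quickly
weak = hours_until_seeking(weight=0.5, growth_rate=1.0)    # holds out longer
# A weak governor whose error rises fast (e.g. faster salt metabolism)
# returns just as quickly as the strong one: the confound described above.
fast_weak = hours_until_seeking(weight=0.5, growth_rate=4.0)
print(strong, weak, fast_weak)  # 5.0 20.0 5.0
```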

When we say things like, “Johnnie is a very social person. If he has to spend even 30 minutes by himself he gets very lonely, so he’s always out and spending time with people. But Suzie will go weeks or even months without seeing anyone,” this is a casual version of the same reasoning, and we think it’s justified. It may not get exactly at the true nature of personality, but it’s a start.

When we figure out what the targets are for some governors, we’ll be able to do one better. For example, let’s imagine that we find out that thirst is the error for a governor that controls blood osmolality, and through careful experimentation, we find out that almost everyone’s target for blood osmolality is 280 mOsm/kg. Given the opportunity, behavior drives blood osmolality to 280 mOsm/kg and then stops.

By monitoring people’s blood osmolality, we can dehydrate them to the point where it is precisely 285 mOsm/kg; dehydration concentrates the blood, so it pushes osmolality up, not down. We know that this will be an error of 5 mOsm/kg, because that’s 5 units above the target. Then we would know almost exactly what their error is, and we could estimate the relative strength of their thirst governor by measuring how hard they fight to get a drink of water.

On that note, it’s possible that a better measure than time would be effort. For example, you could take a bunch of rats and figure out the ideal cage temperature for each of them. Separately, you teach them that pushing a lever will raise the temperature of their cage by a small amount each time they press it.

Then, you set the cage temperature 5 degrees colder than they prefer. This should give them all errors of similar magnitude — they are all about 5 degrees colder than they’d like. Then you give them the same lever they were trained on. But this time, it’s disconnected. You count how many times they press the lever before they give up. This will presumably give you a rough measure of how much each rat is bothered by being 5 degrees below target, and so presumably an estimate of the strength of that governor. If nothing else, you should observe some kind of individual difference.
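The disconnected-lever procedure can be sketched as a toy extinction model in Python. The assumptions are ours, not established findings: each unrewarded press lowers the rat’s expectation that the lever helps by a fixed factor, and the rat keeps pressing while weight * error * expectation exceeds the effort cost of a press. All parameter names and values are hypothetical.

```python
def presses_before_quitting(weight: float, error: float = 5.0, decay: float = 0.8,
                            cost: float = 1.0, max_presses: int = 10_000) -> int:
    """Count lever presses before a rat gives up on a disconnected lever.

    ASSUMED model: pressing continues while weight * error * expectation > cost,
    and each press that brings no warmth multiplies expectation by `decay`.
    """
    expectation = 1.0  # fully trained: the lever used to work every time
    presses = 0
    while weight * error * expectation > cost and presses < max_presses:
        presses += 1
        expectation *= decay  # no warmth arrived, so expect less next time
    return presses


# Every rat is 5 degrees below its own preferred temperature (error = 5.0).
for w in (0.5, 1.0, 2.0):
    print(w, presses_before_quitting(w))
```

Under these assumptions the press counts (5, 8, and 11) rank the rats by governor weight. The exact numbers mean nothing outside the toy model; only the ordering is the point.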

Compare One of Your Drives to Your Other Drives

The second approach is to ask how your drives compare to each other, basically a ranking. We can look at a single person and ask, in this person, is drive A stronger than drive B?

The main way to do this is to give the person a forced choice between two options, one that satisfies governor A and one that satisfies governor B. This doesn’t have to be cruel: you can let them take both options; you just have to make them choose which they want to do first.

This would offer a personality measure like: you are more driven by cleanliness than by loneliness, which is why you keep blowing off all your friends to stay in and scrub your toilet.

There are some drives that make us want to be comfortable and other drives that make us want to be fashionable, and there are at least some tradeoffs between comfort and fashion. If you reflect on each of the people in your life, you probably already know which coalition of drives tends to be stronger in each person.

Every time you see someone skip work to play videogames, refuse to shower even when it ruins all their friendships, blow up their life to have an affair with the 23-year-old at the office, or stay up late memorizing digits of pi, you are making this kind of personality judgment implicitly. People have all kinds of different drives, and you can learn a lot about which ones are strongest by seeing which drives are totally neglected, and which drives lead people to blithely sacrifice all other concerns, as though they’re blind to the consequences.

The Bene Gesserit, a sect of eugenicist, utopian nuns from the Dune universe, use a simplified version of this method in their famous test of humanity. Candidates are subjected to extreme pain and ordered not to pull away, at penalty of taking a poisoned needle, the gom jabbar, in the neck. In his success, Paul demonstrates that some kind of self-control governor is much stronger than his pain governor, even when his pain error is turned way up.

*“What’s in the box?” “A personality test.”*

But no shade to the Bene Gesserit, this is not a very precise measure. By turning the pain governor’s error extremely high, they can show that a candidate has exceptional self-control. But this doesn’t let them see if self-control is in general stronger than pain, because the error gets so huge. To compare the strength of governors, you ideally want the error signals to be as similar as possible.

As before, the best way to get at strength is to take two drives, try to make their errors as similar as possible, and then see which drive gets priority. Other things being equal, that drive must be stronger.

When we were trying to compare personality between people, this was relatively easy. If nothing else, we were at least looking at the same error. We can’t get an exact measure of the error, but we could at least say, both of these people have gone 10 hours without eating, or 20 hours without sleep, or are ten degrees hotter than they find comfortable. These are the same kinds of things and they are equal for both people.

But to compare two governors within a single person, we are comparing two different errors, and we have no idea what the units are. So it may be hard to demonstrate differences between the strength of the governors when those differences are small. If one error is ten times stronger than the other, then we assume that the governor behind that error will win nearly all competitions between the two of them. If one error is 1.05 times stronger than the other, that governor has an edge, but will often get sidelined when there are other forces at play.

But like the common-sense examples above, it should be possible to make some comparisons, especially when differences are clear. For example, if we deprive a person of both sleep and food for 48 hours (with their consent, of course), then offer them a forced choice between food and sleep, and they take the food, that suggests that their drive to eat may be stronger than their drive to sleep. This is especially true if we see that other people in the same situation take the option to sleep instead.

If we deprive the person of sleep for 48 hours and food for only 4 hours, and they still choose the food over sleep, that is even better evidence that their drive to eat is stronger than their drive to sleep, probably a lot stronger.
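The reasoning in the last two paragraphs can be turned into a small calculation. Under the loudly hypothetical assumptions that both errors grow linearly with hours of deprivation, on scales we can treat as comparable, and that choosing food means w_food * food_error > w_sleep * sleep_error, each choice puts a lower bound on how much stronger the eating governor is. The function name and rates are illustrative.

```python
def min_food_to_sleep_ratio(food_hours: float, sleep_hours: float,
                            food_rate: float = 1.0, sleep_rate: float = 1.0) -> float:
    """Smallest w_food / w_sleep consistent with the person choosing food first.

    ASSUMPTIONS: error = rate * hours of deprivation for each drive, the two error
    scales are comparable, and food wins only if w_food * food_error exceeds
    w_sleep * sleep_error.
    """
    food_error = food_rate * food_hours
    sleep_error = sleep_rate * sleep_hours
    return sleep_error / food_error


print(min_food_to_sleep_ratio(48, 48))  # 1.0: eating governor at least as strong
print(min_food_to_sleep_ratio(4, 48))   # 12.0: eating governor at least 12x as strong
```

The second bound is much tighter, which is the arithmetic behind “probably a lot stronger.” If the comparable-scale assumption fails, both bounds shift, but the 4-hour case still yields the higher bound as long as the food error grows with deprivation.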

While these methods are designed to discover something inside an individual person, they might also shed some light on personality differences between people. For example, we might find that in most people, the sugar governor is stronger than the salt governor. But maybe for you, your salt governor is much stronger than your sugar governor. That tells us something about your personality in isolation (that one drive is stronger than another), and also tells us something about your personality compared to other people (you have an uncommon ordering of drives).

Return to Pizza Study

The pizza study is interesting because it kind of combines these techniques.

Each person was compared on two tasks — “comfortably full” and “could not eat another bite”, which gives us a very rough sense of how strong their hunger and satiety governors are. If you ate 10 slices to get to “comfortably full” and only 12 slices to get to “could not eat another bite”, your satiety governor is probably pretty strong, since it kicks in not long after you ate as much as you need. (There could be other interpretations, but you get the gist.)

In addition, each person can be compared to all the other people. Some men could eat only a little more when they were asked to get to “could not eat another bite”. But one man ate almost three times as much as his “comfortably full”. This man’s satiety governor is probably weaker than average. There are certainly other factors involved, but it still took a long time before that governor forced him to stop eating, suggesting it is weak.
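Both comparisons can be sketched in a few lines of Python. The slice counts below are made up for illustration (they are not the study’s data), and using the ratio of “could not eat another bite” to “comfortably full” as a satiety index is our assumption, one of several reasonable choices.

```python
# HYPOTHETICAL slice counts, not data from the study:
# (slices to "comfortably full", slices to "could not eat another bite")
participants = {
    "A": (10, 12),
    "B": (6, 9),
    "C": (8, 10),
    "D": (5, 14),
}


def excess_ratio(comfortable: int, maximum: int) -> float:
    """How many times 'comfortably full' the person ate before having to stop.

    A ratio near 1 suggests satiety kicks in soon after need is met
    (a strong satiety governor); a large ratio suggests a weak one.
    """
    return maximum / comfortable


def percentile_rank(values, value) -> float:
    """Percentage of values strictly below `value` (an ordinal comparison)."""
    values = list(values)
    return 100.0 * sum(v < value for v in values) / len(values)


ratios = {name: excess_ratio(*counts) for name, counts in participants.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: ratio {ratio:.2f}, percentile {percentile_rank(ratios.values(), ratio):.0f}")
```

The ratio compares one person’s two tasks; the percentile then places that ratio against everyone else’s, so person D, who ate almost three times their “comfortably full” amount, lands at the top.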

A final note on strength. The strength of a governor is probably somewhat innate. But it may also be somewhat the result of experience. If someone is more motivated by safety than by other drives, some of that may be genetic, but some of that may be learned. It would not be ridiculous to think that your mind might be able to tune things so that if you have been very unsafe in your life, you will pay more attention to safety in the future.

Even the part that’s genetic (or otherwise innate) still has to be implemented in some specific way. When one of your governors is unusually strong, does that governor have a stronger connection to the selector? Does it have the same connection as usual, but it can shout louder? Does it shout as loud as normal, but it can shout twice as often? We don’t know the details yet, but keep in mind that all of this will be implemented in biology and will include all kinds of gritty details.

Deeper Questions

People can differ in more ways than just having some of their drives be stronger than others. For example, some people are more active than other people in general, more active for every kind of drive. They do more things every single day.

Some people seem to get more happiness from the same level of accomplishment. For some people, cooking dinner is a celebration. For others, routine is routine.

Some people seem more anxious by default. Even a small thing will make them nervous.

These seem like they might be other dimensions on which people can differ, and they don’t seem like they are linked to specific governors.

Studying the strength of the governors is nice because the governors are all built on basically the same blueprint, so the logic needed to puzzle out one of them should mostly work to puzzle out any of the others. If you find techniques to measure the strength of one governor, you should be able to use those same techniques, with only minor tweaks, to measure the strength of any governor.

But other ways in which people differ seem more idiosyncratic. They are probably the result of different parameters that tune more global features, each of which interacts with the whole system in its own unique way. So we will probably need to invent new methods for each of them.

That means we can’t yet write a section on the different methods that will be useful. These methods still need to be invented. And we might only get to these methods once we have learned most of what there is to know about the differences in strength between the governors, and have to track down the remaining unexplained differences between people. But we can give a few examples to illustrate what some of these questions and methods might look like.

Learning

Every governor has to have some way of learning which behaviors increase or decrease its error. We don’t know exactly how this learning works yet, but we can point to a few questions that we think will be fruitful.

For example, is learning “both ways”?

The hot governor (keeps you from getting too hot) and the cold governor (keeps you from getting too cold) both care about the same variable, body temperature. Certainly if you are too cold and you turn on a gas fireplace, your cold governor will notice that this corrects its error and will learn that turning on the gas fireplace is a good option. So when you get too cold in the future, that governor will sometimes vote for “turn on the gas fireplace”.

But what if you are too hot and you turn on the gas fireplace? Well, your hot governor will notice that this increases its error, and will learn that this is a bad option, which it will vote against if you’re in danger of getting too hot.

What does your cold governor learn in this situation? Maybe it learns the same thing your hot governor does — that the gas fireplace increases temperature. The hot governor thinks that’s a bad outcome, but the cold governor thinks it’s a good outcome. If so, then next time you are cold, the cold governor might vote for you to turn on the gas fireplace.

But maybe a governor only learns when its error is changed. After all, each governor only really cares about the error it’s trying to send to zero. And if that error isn’t changed, maybe the governor doesn’t pay attention. If the error is very small, maybe that governor more or less turns off, and stops paying attention, to conserve energy. Then it might not do any learning at all.

If this were the case, the cold governor shouldn’t learn from any actions you take when you’re too hot, even when these actions influence your body temperature. And the hot governor shouldn’t learn from anything you do when you’re too cold, same deal.

You could test this by putting a mouse in a cage that is uncomfortably hot, and that contains a number of switches. Each switch will either temporarily increase or temporarily decrease the temperature of the cage. With this setup, the mouse should quickly learn which switches to trip (makes the cage cooler) and which switches to avoid (makes the cage even more uncomfortably hot).

Once the mouse has completely learned the switches, then you make the cage uncomfortably cold instead, and see what happens. If the cold governor has also been learning, then the mouse should simply invert its choice of switches, and will be just as good at regulating the cage temperature as before.

But if the cold governor wasn’t paying close attention to the hot governor’s mistakes, then the mouse will have to do some learning to catch up. If the cold governor wasn’t learning from the hot governor’s mistakes at all, then the mouse will be back at square one, and might even have to re-learn all the switches through trial and error.

We might well expect the former outcome, but you have to admit that the latter would be pretty interesting.
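The two hypotheses make different predictions, which a toy Python simulation can spell out. It is a sketch under simplifying assumptions of our own: each switch has a fixed effect on temperature (+1 warmer, -1 cooler), the mouse tries every switch once during the hot phase, and “learning” just means recording a switch’s effect. The names are hypothetical.

```python
def hot_phase(switch_effects: dict, both_ways: bool):
    """Return (hot_knowledge, cold_knowledge) after the mouse learns in a hot cage.

    switch_effects maps switch name -> temperature change (+1 warmer, -1 cooler).
    """
    hot_knowledge, cold_knowledge = {}, {}
    for switch, effect in switch_effects.items():
        hot_knowledge[switch] = effect  # hot governor's error changed, so it learns
        if both_ways:
            cold_knowledge[switch] = effect  # "both ways": cold governor learns too
        # Otherwise the cold governor's error was zero, so by hypothesis it ignored this.
    return hot_knowledge, cold_knowledge


def cold_phase(switch_effects: dict, cold_knowledge: dict):
    """Return (warming switches the cold governor already knows, switches still to try)."""
    known_warming = sorted(s for s, effect in cold_knowledge.items() if effect > 0)
    untested = sorted(s for s in switch_effects if s not in cold_knowledge)
    return known_warming, untested


effects = {"s1": +1, "s2": -1, "s3": +1, "s4": -1}
for both_ways in (True, False):
    _, cold = hot_phase(effects, both_ways)
    print(both_ways, cold_phase(effects, cold))
```

Under “both ways” learning the mouse can invert its choices immediately; under “own error only” it starts the cold phase with every switch untested. A cold governor that paid partial attention would land somewhere in between.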

The Model of Happiness

Or consider the possibility that happiness might drive learning.

This would explain why happiness exists in the first place. It’s not just pleasant, it’s a signal to flag successful behavior and make sure that it’s recorded. When something makes you happy, that signals some system to record the link between the recent action and the error correction.

This would also explain why it often feels like we are motivated by happiness as a reward. We aren’t actually motivated by happiness itself, but when something has made us happy, we tend to do it more often in the future.

Previously we said that happiness is equal to the change in an error. In short, when you correct one of your errors, that creates a proportional amount of happiness. This happiness sticks around for a while but slowly decays over time.

That’s a fine model as a starting point, but it’s very simple. Here’s a slightly more complicated model of happiness, which may be more accurate than the model we suggested earlier. Maybe happiness is equal to the reduction in error times the total sum of all errors, like so:

happiness = delta_error * sum_errors

If happiness is just the result of the correction of an error, then you get the same amount of happiness from correcting that error in any circumstance. But that seems a little naïve. A drink of water in the morning after a night at a five-star hotel is an accomplishment, but the same drink of water drawn while hungry and in pain, lost in the wilderness, is a much greater feat. Remembering the strategy that led to that success might be more important.

If you multiply the correction by the total amount of error, then correcting an error when you are in a rough situation overall leads to a much greater reward, which would encourage the governors to put a greater weight on successes that are pulled off in difficult situations. If you correct an error when all your other errors are near zero, you will get some happiness. But if you are more out of alignment generally — more tired, cold, lonely, or whatever — you get more happiness from the same correction.
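Here is the sum model applied to the water example, as a minimal Python sketch. The error values are invented for illustration. We also assume that sum_errors includes every current error, the thirst being corrected among them, because the text doesn’t pin down that detail.

```python
def happiness_sum(delta_error: float, all_errors: list) -> float:
    """happiness = delta_error * sum_errors, the model proposed above."""
    return delta_error * sum(all_errors)


# Hypothetical error levels: [thirst, hunger, pain]
hotel = [2, 1, 0]       # rested and comfortable, just a little thirsty
wilderness = [2, 6, 7]  # same thirst, but hungry, hurt, and lost

# The same drink corrects the same 2 units of thirst in both cases.
print(happiness_sum(2, hotel))       # 6
print(happiness_sum(2, wilderness))  # 30
```

The correction is identical, but the wilderness drink yields five times the happiness. If happiness drives learning as proposed, that means a much stronger push to remember how the water was found.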

This might explain fetishes. Why do so many sexual fetishes include things that cause fear, pain, disgust, or embarrassment? Surely the fear, pain, disgust, and embarrassment governors would vote against these things.

We have to assume that the horny governor is voting for these things. The question is, why would it vote for anything more than getting your rocks off? Why would an orgasm plus embarrassment be in any way superior to an orgasm in isolation?

If learning is based on happiness rather than raw reduction in error, then governors will learn to vote for things that have caused past happiness.

And if happiness is a function of total error, not just correction in the error they care about, governors will sometimes vote for things that increase the total error just before their own error is corrected.

The point is, if happiness is a function of total error, governors will actually prefer to reduce their errors in a state of greater disequilibrium. This doesn’t decrease their error any more than in a state of general calm, but it does lead to more happiness, greater learning, and so they learn to perform that action more often. And in some cases they will actually vote to increase the errors of other governors, when they can get the votes.

The horny governor only cares about you having an orgasm. But since it learns from happiness, not from the raw correction in its error, it has learned to vote for you to become afraid and embarrassed just before the moment of climax, because that increases your total error, which increases happiness. And since the horny governor has the votes, it overrules the governors who would vote against those things.

We don’t know how to quantify any of the factors involved, so we can’t test precise models. There are probably constants in these equations, but we can’t figure those out either, at least not yet.

But we can still make reasonable tests of general classes of models. We can make very decent guesses about whether or not something is a function of something else, and we can probably figure out if these relationships are sums or products, whether relationships are linear or exponential, and so on. For example:

happiness = delta_error

This is the original model we proposed, and it’s the simplest. In this case, happiness is caused when an organism corrects any error, and the amount of happiness produced is a direct function of how big an error was corrected. Eating a cheeseburger makes you happy because, assuming you are hungry, it corrects that error signal. The cheeseburger error.

Not shown in that equation is the kind of relationship. Maybe it’s linear, but maybe it’s exponential. Does eating two cheeseburgers cause more than twice as much happiness as eating one?

This very simple model has the virtue of being very simple. And it seems like it lines up with the basic facts — eating, sleeping, drinking, and fucking do tend to make us happy, especially if we are quite hungry, tired, thirsty, or horny.

But we should also think about more complex models and see if any of them are any better. For example:

happiness = delta_error * product_errors

In this case, the correction in an error is multiplied not by the sum, but by the product of all other errors. So eating a cheeseburger while tired and lonely will be much more pleasurable than eating a cheeseburger while merely tired or merely lonely.

This seems pretty unlikely at first glance. If happiness were dependent on the product of your other errors, that seems like it would be pretty noticeable, because the difference between correcting an error while largely satisfied and largely unsatisfied would be huge and thus obvious. But this is also something that you could test empirically, and maybe there could be some kind of truth to it.

Is this a better model? Not entirely clear, but it certainly makes predictions that can be compared to parts of life we’re familiar with, and it can be tested empirically. That’s a pretty good start.
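A quick Python comparison shows why the product model would be so noticeable. The error values are hypothetical. Following the text, the correction is multiplied by the product (or, for contrast, the sum) of the other errors.

```python
from math import prod


def happiness_sum(delta_error: float, other_errors: list) -> float:
    return delta_error * sum(other_errors)


def happiness_product(delta_error: float, other_errors: list) -> float:
    return delta_error * prod(other_errors)


calm = [1, 1, 1]   # other errors small: largely satisfied
rough = [4, 4, 4]  # other errors large: largely unsatisfied

print(happiness_sum(1, rough) / happiness_sum(1, calm))          # 4.0
print(happiness_product(1, rough) / happiness_product(1, calm))  # 64.0
# A quirk of the product form: one fully satisfied drive zeroes out all happiness.
print(happiness_product(1, [4, 0, 4]))  # 0
```

Quadrupling each of three other errors multiplies happiness by 4 under the sum model but by 64 under the product model, the kind of huge, obvious swing the text argues we would have noticed. The zero case is a further testable implication of this toy form: correcting any error while some other drive is perfectly satisfied would produce no happiness at all.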

Or another example:

happiness = delta_error / sum_errors

Instead of multiplying the correction to produce happiness, this time we tried dividing it. In this case, happiness is smaller when the total amount of error is bigger. So correcting the same error leads to less happiness if you’re more out of alignment.

This one seems right out. The joy we get from a cup of hot chocolate is greater when we are lonely, not less. Living in extremis seems like it should only magnify the satisfaction of our experiences. It’s possible that this doesn’t stand up to closer inspection, but people certainly find the idea intuitive.

Finally, one more example. You remember this equation from the learning and memory section above:

Another model of happiness is that happiness is proportional to the TD error in the equation above, or the equivalent in whatever system our brain really uses. The TD error is the difference between the actual outcome of an action (what it delivered right away, plus the projected value of what comes next) and the outcome that was expected. So in this model, we get happiness when something corrects an error by more than the governor expects.

Having an especially great sandwich for the first time feels great. This is because you didn’t know how good it would be. But having the same sandwich for the 100th time isn’t as good, even if it corrects the same amount of error. This is because you anticipated it would be that good, so there’s no TD error. In fact, if the sandwich hits the spot less than usual, you’ll be disappointed, even if it’s still pretty good.

In this model, you’d expect that doing the same enjoyable stuff over and over wouldn’t keep you happy for very long. You’d have to mix it up and try new things that correct your errors.
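The sandwich example can be sketched in Python with a one-step prediction-error update. This is a standard delta-rule (TD-style) update with no future states, standing in for “whatever system our brain really uses”; the learning rate `alpha` and the reward numbers are hypothetical.

```python
def eat_repeatedly(reward: float, times: int, alpha: float = 0.5, expected: float = 0.0):
    """Return (prediction errors, final expectation) after eating the same sandwich.

    Each meal: td_error = reward - expected (the proposed source of happiness),
    then the expectation moves a fraction alpha of the way toward the reward.
    """
    td_errors = []
    for _ in range(times):
        td_error = reward - expected
        td_errors.append(td_error)
        expected += alpha * td_error
    return td_errors, expected


errors, expected = eat_repeatedly(reward=10.0, times=4)
print(errors)    # [10.0, 5.0, 2.5, 1.25]: each repeat is less exciting
print(expected)  # 9.375

# A sandwich that hits the spot a bit less than usual: still good, but disappointing.
worse, _ = eat_repeatedly(reward=8.0, times=1, expected=expected)
print(worse)     # [-1.375]
```

The reward, meaning the error actually corrected, never changes across meals; only the surprise shrinks. That is why this model, taken alone, predicts the fading enjoyment described above.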

This model does seem to capture something important. But that said, in real life correcting a big enough error usually creates some happiness. So happiness doesn’t seem like it could be entirely based on how unexpected the correction is. Some amount of happiness seems to come from any correction. But it does seem like more unexpected corrections usually make us more happy.

So this is an example of how we can test general models, even before we can make precise measurements. We can think about classes of models, bring them to their limits, ask how the implications of these models compare to other things we already know about life and happiness, things we experience every day.

Just thinking of these questions mechanically, thinking of them as models, prompts us to ask questions like — What is the minimum amount of happiness? Can happiness only go down to zero, or can there be negative happiness? Is there a maximum amount of happiness? Even if a maximum wasn’t designed intentionally, surely there is some kind of limit to the value the hardware can represent? Can you get happiness overflow errors? What is the quantum of happiness? What are the units? — questions that psychologists wouldn’t normally ask.


The Mind in the Wheel – Part X: Dynamic Methods

April 24, 2025 / May 1, 2025 · slimemoldtimemold · cybernetics, methods, paradigm, psychology, research methods, science, study design, The Mind in the Wheel

Since behavioral feedback of any significance is always negative, it follows that there will always be a tendency to move toward a zero-error condition calling for no effort, and (if clever enough) one will always be able to discover the reference condition. By the same token, one will always be able to discover what the subject is controlling, for if disturbances are applied that do not in fact disturb the controlled aspect of the environment, the subject’s behavior will not oppose the disturbance. Only when one has found the correct definition will the proposed controlled quantity be protected against disturbance by the subject’s actions.

— William Powers, Behavior: The Control of Perception
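The test Powers describes can be illustrated in Python with a toy control loop. This is our own minimal sketch, not Powers’s formulation: a governor integrates the error between a reference and one quantity `q`, while a second quantity `r` is never sensed and so is left uncontrolled. Push on `q` and the output moves to cancel the push; push on `r` and nothing happens.

```python
def run_loop(disturb: str, disturbance: float = 5.0, reference: float = 0.0,
             gain: float = 0.5, steps: int = 200):
    """Simulate a simple integrating controller; return (output, q, r) at the end.

    The governor senses q = output + d_q and adjusts its output to push q toward
    `reference`. The quantity r = d_r is never sensed, so it is not controlled.
    """
    d_q = disturbance if disturb == "q" else 0.0
    d_r = disturbance if disturb == "r" else 0.0
    output = 0.0
    for _ in range(steps):
        q = output + d_q
        error = reference - q
        output += gain * error  # integrate the error: a negative-feedback loop
    return output, output + d_q, d_r


print(run_loop("q"))  # output near -5.0 cancels the push; q back near 0.0
print(run_loop("r"))  # output stays 0.0; r simply sits at 5.0, unopposed
```

Only the quantity the system actually controls is protected against disturbance, which is how this kind of test can reveal what a governor is controlling.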

What we wrote in the previous parts is only a start. Here are the things we need to figure out next.

First, we should try to discover all the basic drives of human psychology. We should learn about their error signals, which we identify as emotions.

When possible, we should also figure out what signal each governor is actually controlling, and the target it is controlling that signal towards. It’s a good start to know that there is a drive with an error we know as thirst, but it would be better to confirm that thirst is the error of a governor controlling blood osmolality. And it would be even better to then confirm that this governor controls blood osmolality towards a target of 280-295 mOsm/kg (or perhaps some biological proxy of that target).
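If a governor controls toward a target range rather than a single point, its error might look like the following Python sketch: zero anywhere inside the range, growing with distance outside it. The 280 to 295 mOsm/kg band comes from the text; the dead-band form and treating only the high side as thirst are our assumptions.

```python
def band_error(value: float, low: float = 280.0, high: float = 295.0) -> float:
    """Signed distance outside the target band [low, high]; zero inside it."""
    if value > high:
        return value - high
    if value < low:
        return value - low
    return 0.0


def thirst(osmolality: float) -> float:
    """ASSUMPTION: only concentrated blood (above the band) registers as thirst."""
    return max(0.0, band_error(osmolality))


for osm in (275.0, 290.0, 300.0):
    print(osm, band_error(osm), thirst(osm))
```

In this sketch, too-dilute blood produces a negative error that thirst ignores. Whether some other governor picks it up, much as the hot and cold governors split body temperature, would be an empirical question.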

For example, we may find that there is a hunger governor controlling the hormone leptin, a tiredness governor controlling the hormone melatonin, and so on. The answers we find probably won’t be quite that simple, but we’re looking for something along these lines.

We should also try to characterize signals like happiness and curiosity, which don’t seem to be errors from a control system (if nothing else, they aren’t actively driven towards zero!), but do seem to be important signals that interact with the other drives and with motivation in other ways.

Second, we should try to discover the parameters that tune the governors. It’s clear that some governors can be “stronger” than others, and that these patterns of strength and weakness differ between different people. People are more or less brave, more or less neat and clean, etc. We’d like to find out what it means, in a precise sense, for one governor to be stronger than another.

We’d like to know whether parameters are individual to each governor, or global to all of them, or if there are some of both. For example, we’d like to know if each governor has an individual parameter that adjusts how it balances exploration vs. exploitation, or if there is an explore/exploit parameter that influences all the governors globally.

One of our long-term goals is to find ways of measuring these parameters for each person. For example, we might want to ask if someone’s fear governor is stronger than their thirst governor, perhaps even how much stronger. This will give us the start of a true measure of personality.

Third, we will want to discover the laws of what’s known as selection, the detailed parliamentary procedure and rules that control how the governors vote on actions.

As before, there will be parameters that adjust these laws, and make people different from one another. Learning how to measure these parameters will give us an even stronger theory of personality.

Fourth, as we develop a better understanding of the drives, the governors, and the laws that dictate their behavior, we can start working to characterize well-known behaviors in terms of these governors and their parameters.

Here are some things we might be able to understand in terms of this new paradigm: personality, anxiety, depression, personality disorders, possibly other psychiatric disorders, self-harm, high-risk behavior, drugs, and addiction.

If cybernetic principles lead to models that have natural outcomes that look just like anxiety, depression, addiction, etc., that will establish the promise of this approach. Then we can look at the points where the models fail, consider alternative models, refine the approach, and make the models even better.

But this last project is kind of “for the rest of time”. If building the paradigm is successful, people can spend the next few hundred years applying it. But first we have to build it.

1. Considerations

First, a few considerations, issues that might come up when trying to discover the drives.

1.1 Are Emotions Constructed?

One of the questions academics keep asking about emotions is whether or not they are “culturally constructed”.

This may seem like a weird question, but to people on the inside of academic psychology, it’s a major topic.

But we’re not here to revisit those debates; we’re here to put them to rest. The cybernetic perspective gives a very clear answer to the question of whether or not emotions are constructed: yes, and no.

All emotions are biologically hard-wired, because they are the error signals from our most fundamental drives, all of which are necessary for survival. These are not at all constructed. While we don’t yet know the details, we understand that at some level they are physically distinct from each other, controlling different biological signals towards different set points.

But emotion categories are culturally constructed. There are a huge number — dozens, maybe hundreds, maybe even thousands — of individual emotions, but we don’t have a word for each of them. Instead we group them together in ways that make practical sense for the needs of our culture.

As usual, hunger is a good example. We treat hunger as if it is just one signal, when in fact hunger is easily a dozen different emotions, maybe more. But because these emotions are all addressed by similar actions (stuffing something in your maw) most languages treat them as one thing.

We can unpack our emotion words when we need to — we can talk about craving salt, or talk about specific cravings that come from this drive, like craving pickles. We can say things like, “I’m stuffed but I’m still hungry!” and so on. But the hunger drives are closely intertwined most of the time, so most languages don’t make any serious distinction between them.

Desert mice almost never drink water; they get almost all their water from their food, from eating seeds. So if desert mice developed a language, they would probably come up with a single word that meant both hungry and thirsty. In their experience, hunger and thirst are addressed by one action, eating seeds, and it’s more useful to combine these ideas than to keep them separate.

No group of humans is as extreme as the desert mouse — but still, we do wonder if there are cultures where people get most of their water from their food, and if those cultures would bother to distinguish between hunger and thirst, or if they would have one word covering both.

1.2 Redundancy

Basic needs, especially needs that are critical to our survival, are probably supported by more than just one drive.

Elevators are built not only to support the load they are rated for, but to support many times that weight, and they have multiple brakes and other failsafes in case of crisis. If one brake or failsafe malfunctions, the others kick in to prevent disaster.

For the same reason, we should expect drives to be redundant, sometimes massively redundant. Humans tend to create systems that are highly efficient but fragile. But nature tends to create systems that are inefficient but resilient. If an animal has only one drive that tells it to eat, then if anything goes wrong with that drive, it will die. Better to have multiple drives, so that the animal is able to survive even if it is born with a surprise mutation or gets an unexpected brain injury.

The more important a need is to survival, the more likely it is that there will be built-in redundancy. A need that is critically important may be supported not by one governor but by many separate governors, each controlling a different measure of the same need.
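A toy calculation makes the engineering logic concrete. This sketch assumes each governor fails independently with the same probability `p`, which is a big assumption (a single brain injury could take out several governors at once); the function name is ours, not the authors’.

```python
def prob_total_failure(p: float, n: int) -> float:
    """Probability that all n redundant governors fail at once,
    assuming each fails independently with probability p."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and at least one governor")
    return p ** n


# Hypothetical 1-in-100 chance that any single governor is broken.
for n in (1, 2, 3, 5):
    print(f"{n} governor(s): chance of losing the need entirely = {prob_total_failure(0.01, n):.0e}")
```

Under independence, each extra governor multiplies the risk of total failure by `p`, which is why redundancy is such cheap insurance for critical needs.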

2. Observational Methods

One of the most foundational projects is to discover the list of drives and emotions. Above anything else, we should figure out how many different drives there are, and do our best to identify each of them.

We can do this in two ways: with observational methods (looking at historical data, case studies, etc.) and with empirical methods (let’s collect some data). Let’s look at these one at a time, starting with observational methods.

2.1 Pure Observation

We can draw a lot of reasonable conclusions about the list of drives based on our everyday experiences of what it’s like to be human, and what we know about what it takes to survive.

For example, we know that people have drives that lead to hunger and pain because we all experience those emotions, and it’s clear that they motivate our behavior. Most behaviors you encounter can be explained in terms of a basic drive.

Drives aren’t linked directly to each specific behavior, of course. There isn’t a drive to watch operas, or to play shuffleboard. For one thing, those options didn’t exist for our ancestors. People are probably driven to do these things because of some kind of general social emotions. But any behavior that can’t be explained in terms of a known basic drive may point to a basic drive of its own.

For example, it seems possible that humans have a basic drive to look at animals. As strange as this might sound, we go to great lengths to look at animals, even in private when no one is watching, and it seems like we are driven to this for no other reason than to look at them. It’s hard to explain these behaviors in terms of another drive, so the drive to look at animals may itself be basic.

Think about all the time, space, and money we spend on zoos. Think of how we plaster the walls of our kindergartens with pictures of lions. Think of how many hours you personally have spent watching nature documentaries, or animal videos on YouTube.

Before dog people got online, everyone knew that cat pictures ruled the internet. Animal pictures still rule the internet. As of this writing, the subreddit r/aww (mostly pictures of animals) is the 6th largest subreddit, with 37 million members. This may also be why people get pets in the first place, so they have animals to look at whenever they want.

If the desire to look at animals is a drive, then it should be homeostatic and conserved; you should want to go to the zoo for a while, then you should be ready to go home. If we keep you from going to the zoo, you will look at geese in the park instead. And if we keep you from looking at any animals at all, you may eventually become nearly frantic with your desire to do so, especially if this drive is unusually strong in you.
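Here is a minimal sketch of what “homeostatic and conserved” could mean in code. Everything in it is an illustrative assumption: the `Governor` class, the linear drain, and the numbers are ours, not a measured model of any real drive.

```python
from dataclasses import dataclass


@dataclass
class Governor:
    """Hypothetical homeostatic governor: it tracks one variable and signals
    an error whenever that variable falls below its set point."""
    name: str
    set_point: float
    level: float
    drain: float  # how much the variable drops per time step

    def error(self) -> float:
        # Zero at or above the set point, so the drive goes quiet once satisfied.
        return max(0.0, self.set_point - self.level)

    def tick(self) -> None:
        # Time passes without any animals to look at.
        self.level = max(0.0, self.level - self.drain)

    def satisfy(self, amount: float) -> None:
        # Any source counts: the governor only sees the level, not where it came from.
        self.level += amount


animals = Governor("look at animals", set_point=10.0, level=10.0, drain=1.0)
for _ in range(5):        # kept away from the zoo for five steps
    animals.tick()
print(animals.error())    # 5.0: the error has built up
animals.satisfy(3.0)      # geese in the park take the edge off
print(animals.error())    # 2.0
animals.satisfy(10.0)     # a trip to the zoo; now you're ready to go home
print(animals.error())    # 0.0
```

Because `satisfy` ignores where the input came from, geese and zoo animals are interchangeable to this governor, which is the “conserved” part: block one source and any other source will do.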

Games like 2048 and Candy Crush suggest that there might be some kind of drive that causes sorting behavior, though maybe this is just an unusual manifestation of a drive for decorating or cleaning your environment.

Like, let’s draw out how weird it is that people play these games. What the fuck is going on? Why is it so engrossing to watch two little blocks labeled “2” combine to form a block labeled “4”? People will do this for hours. It sounds so dumb, and yet when you’re on a plane sitting behind someone playing this on their seatback screen, you can’t look away.

When we find something extremely engrossing, it might be because it has concentrated the exact thing our drive is trying to control. If the drive here is something like “sorting”, there aren’t many naturally-occurring situations where you’re only sorting. But a game can provide you with pure, unadulterated sorting. (Compare: superstimuli.)

Another unlikely-seeming candidate is some kind of drive to dig holes. The strongest evidence for this comes from hobby tunneling, where people wake up one day and start digging vast networks of tunnels, usually for no apparent reason. They often dig in secret, and they’ll keep doing it even when there is a social or material cost, even when it’s forbidden. This suggests that it’s not done for social reasons, but in fact is done in spite of them.

What else could explain the incredible popularity of Minecraft? Why would children flock to a game about digging, instead of a game about anything else? As they say, the children yearn for the mines.

When it is hard for us to do an activity itself, watching the activity can sometimes serve as an acceptable substitute. In this way, a drive for excavation might explain what the Italians call umarell. You’ve probably seen them: old men who spend their days watching construction sites, especially dig sites, standing there entranced with their hands clasped behind their backs. This is universal enough across time and space that Jerry Seinfeld even has a bit about it.

Of course, these Italian men are so old-fashioned. Today the boys get all their construction watching on TikTok:

There may even be a drive to seek out weapons, expressed especially strongly in boys. If you have ever been a boy, or spent any time around boys, this will probably sound familiar. Check out this passage from Xenophon’s Cyropaedia, a biography of Cyrus the Great written around 370 BC:

“And to-day a battle is before us where no man need teach us how to fight: we have the trick of it by nature, as a bull knows how to use his horns, or a horse his hoofs, or a dog his teeth, or a wild boar his tusks. The animals know well enough,” he added, “when and where to guard themselves: they need no master to tell them that. I myself, when I was a little lad, I knew by instinct how to shield myself from the blow I saw descending: if I had nothing else, I had my two fists, and used them with all my force against my foe: no one taught me how to do it, on the contrary they beat me if they saw me clench my fists. And a knife, I remember, I never could resist: I clutched the thing whenever I caught sight of it: not a soul showed me how to hold it, only nature herself, I do aver. I did it, not because I was taught to do it, but in spite of being forbidden, like many another thing to which nature drove me, in spite of my father and mother both. Yes, and I was never tired of hacking and hewing with my knife whenever I got the chance: it did not seem merely natural, like walking or running, it was positive joy.”

Consider this collection, and what could possibly have driven someone to put it together with such care:

People seem stuck on the idea that complex behaviors like digging or pretending a cool stick is a weapon couldn’t possibly be innate. But obviously they can be. Dog breeds whose ancestors were bred to herd animals will herd animals without having to be taught. Spiders spin webs. People usually become attracted to adult members of the same species, rather than to furniture or the moon. If evolution has enough precision to latch our sexual drives onto reasonable targets most of the time, then surely it can latch other drives onto other complex targets, like a stick that reminds you of an AK-47.

While we can see evidence of these drives as they express themselves in specific kinds of behavior, we don’t immediately know what is actually being controlled. A drive to dig might be implemented as something like a drive to smell freshly-turned earth, because in general that target would lead to digging behavior. You could imagine how tangential behaviors, like gardening, might be other, confused results of this drive.

2.2 Ecological

We can also draw some reasonable conclusions from our understanding of biology.

All of our psychological drives were put into us by evolution to help keep us alive. So generally speaking, we should find in ourselves at least one drive (and matching emotion) for each thing that we need to stay alive, and at least one drive for all the things that have been necessary to be evolutionarily successful.

You need to eat to stay alive, which is another reason to expect at least one drive for hunger. You don’t need sex to stay alive, but the species does need a sex drive to go on being a species at all, which is why evolution made us horny. Things that are necessary for survival (like breathing and sleeping) must be backed up by drives.

However, there are a few drives that are conspicuously missing — we don’t have quite every drive we need. See the example of scurvy, the horrible disease caused by a deficiency of vitamin C. You might think that people suffering from scurvy would seek out foods that contain the thing they lack, but as far as we know they don’t crave lemons or cabbage, which is why the cure took so long to discover. Vitamin C is necessary for survival, but people don’t appear to have a drive to seek it out.

There seem to be two main reasons we lack this drive.

First, most foods contain at least a little vitamin C, so most of our ancestors would have survived just fine without a drive telling them to seek it out. If you eat any kind of normal diet, you’ll end up with plenty of vitamin C by default. Only in very weird situations where you get no fresh food at all, like being a 15th-century mariner or an Arctic explorer, does this become a problem.

Second, this is a specific case where humans happen to be very unusual. We are one of the very few animals that can’t synthesize their own vitamin C, which is why we need to find it in our food. Most animals don’t need to consume any vitamin C because they make their own, so they would have no need for a vitamin C drive at all.

We probably inherit most of our drives from designs that are common to all mammals, and since the default mammal package doesn’t include a drive for vitamin C (because most mammals make their own), humans would have had to evolve such a drive from scratch. But given that vitamin C is so abundant in everything we normally eat, it’s easy to imagine why we didn’t bother.

We have a vegetarian friend who used to struggle with random fatigue and low energy. Then he tried taking vitamin B12, and immediately felt a huge difference. But he didn’t seem to crave foods high in B12 before, suggesting that vitamin B12 also lacks a governor, despite being an essential nutrient.

This may be a common feature of many vitamins; in fact, it may be part of what it means for us to call something “a vitamin”. Most vitamins were discovered by people trying to cure diseases of deficiency, where people weren’t getting enough of the vitamin. It’s hard to develop a deficiency of something you have a drive for, because you’ll develop cravings that make both the deficiency and the cure really obvious. So a substance we have a drive for may never get classified as a vitamin at all.

Some essential minerals probably have governors, but others may not, and it’s not entirely clear which is which.

But there will be signs. If you have a drive for a mineral, it should be pretty hard to develop a deficiency in that mineral, since you will normally be driven to consume it. But if you don’t have a drive for a mineral, then just like with vitamins, you’re at risk of developing deficiencies in that mineral, since you don’t have any natural motivation to seek it out. If there’s a mineral that people are always getting deficient in, that’s probably a sign that it doesn’t have a drive.

Iodine is a necessary mineral: if you don’t get enough, you develop terrible diseases of deficiency, especially goiter. This happens pretty frequently, or at least it did until people discovered the connection and started iodizing salt. Again, this seems like possible evidence that there’s no iodine drive and no iodine governor. If there were, then all the Swiss people who suffered from goiter would have been sitting around in their mountain cabins going “damn I would kill for some seafood right now” (seafood is high in iodine). On the other hand, maybe they were saying that, and history simply didn’t record it.

This seems like the kind of thing we should already have a clear answer for, but the literature on iodine is pretty unclear. There are a few studies, like this one, which found that children aged 8-10 can’t tell the difference between traditionally prepared pickles made with iodized salt and traditionally prepared pickles made with non-iodized salt. Most of the existing research agrees, though there isn’t much of it.

But we’ve collected a bit of data on this already, and found that while most people indeed seem unable to distinguish between iodized and non-iodized salt, a few people can pick them out of a lineup at rates somewhat better than chance. It’s also possible that most people can’t distinguish between iodized and non-iodized salt because most people aren’t iodine deficient, so that drive is inactive.
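One standard way to decide whether someone is doing “better than chance” is an exact one-sided binomial test. The sketch below assumes each trial is an independent guess with a fixed chance rate (1/3 for a hypothetical three-sample lineup); the trial counts are made up for illustration and are not the data described above.

```python
from math import comb


def p_value_better_than_chance(correct: int, trials: int, chance: float) -> float:
    """One-sided exact binomial test: the probability of getting at least
    `correct` answers right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical: a taster picks the iodized salt out of a 3-sample lineup
# in 11 of 20 trials. Pure guessing would average about 6.7 correct.
print(round(p_value_better_than_chance(11, 20, 1 / 3), 4))
```

A small p-value says guessing alone rarely produces that score. But when many people are tested, a few will beat chance by luck, so a promising taster would need to be retested.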

Another slightly odd possibility is that maybe some people have iodine governors and other people don’t. Maybe this depends on where your ancestors lived, and whether they naturally got iodine in their diet (like if they were fisherpeople) or whether they had to actively seek it out to get enough (#hillpeople).

We are probably “missing” some other governors, especially governors for things that are not necessary to stay alive per se, but things that would be nice to have.

For example, there appears to be no emotion that drives us to go and get more sunshine. Lack of sunshine is pretty bad for you, but there’s just no system making sure you go out and get it. As with vitamin C, our ancestors were exposed to so much sunlight that evolution never bothered to give us a sunlight drive.

This is why you have to use your human intellect, or a phone reminder or something, to remember to get your daily sunlight. You have a hard time building an association between sunlight and health because you don’t have a dedicated system keeping tabs on it.

2.3 Resistance

Another way to identify the drives is to ask what kinds of things make people angry when you try to stop them from doing those things.

This provides some justification for drives like privacy and territoriality. Most people will go nuts if they’re not allowed some amount of personal territory; think of the teenager with the STAY OUT sign on their door.

This is also a reason to believe in various social emotions, like an emotion that arises when we feel we are being taken advantage of. This governor has a target that’s something like “I am doing 1/x of the work in this group, where x is the number of people in this group”. If you are doing more than your fair share of the work, very far from this target, then you get an error signal that feels something like being exploited, or being played.

This is why roommate situations are so stressful. People have different setpoints for cleanliness, and you might expect that each person would just clean the apartment up to their preferred level. An animal that had no social emotions would probably do exactly that. But people are social animals, and for people living in groups, the desire for cleanliness is in conflict with the desire not to get taken advantage of.
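The fair-share target described above is simple enough to write down. This is a hypothetical sketch, not a model from the literature: the function name is ours, and it assumes that doing *less* than your share produces no error at all, which real people may not share.

```python
def exploitation_error(my_work: float, total_work: float, group_size: int) -> float:
    """Hypothetical 'being taken advantage of' error signal.

    Target: do 1/x of the group's work, where x is the group size.
    The error is how far my actual share overshoots that target.
    Assumption: undershooting the target produces no error here.
    """
    if total_work <= 0 or group_size < 1:
        raise ValueError("need some work done and at least one person")
    target_share = 1 / group_size
    my_share = my_work / total_work
    return max(0.0, my_share - target_share)


# Three roommates; you did 6 of the last 10 rounds of dishes.
print(round(exploitation_error(6, 10, 3), 3))  # 0.267: well over your 1/3 share
```

In this toy version, an animal with no social governors would clean to its own set point and never compute this number. A roommate with both a high cleanliness set point and this governor gets two errors that push in opposite directions.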

We can also take the argument from depression in reverse. When someone is in the depths of a serious depression, we think that’s a result of all of their drives being turned way down. What do people conspicuously stop doing when they are depressed? The answer is hygiene. They let both their body and their living space become unkempt, even filthy.

If you try to stop someone from getting or doing something, and they resist, that’s a drive. This test is especially useful for drives like privacy, where people may not appear to be actively doing anything. But a drive for privacy becomes apparent when you don’t let people have it, because then they will fight for it.

2.4 Knockout

Sometimes a drive is conspicuously absent in a few individuals, throwing into stark relief the fact that it’s present in everyone else. This can give us a surprisingly clear picture of the missing drive — the shape of something can be more obvious from its absence than its presence (or at least you can learn different things about its shape from the absence).

Cases of total or near-total knockout, where a person or animal is entirely missing a drive or an emotion, provide pretty strong evidence that the drive is present in everyone else. Consider the patient known as SM-046, a woman with severe amygdala damage, who experiences almost no fear:

While the researchers behind this study don’t seem to understand its significance, we see this as strong evidence that fear and suffocation are separate emotions arising from separate drives.

SM has a complete fear knockout, and never experiences fear, no matter how dangerous the situation. She just doesn’t have that governor, or her copy of the governor is totally turned off. But she will still feel “air hunger” when she is suffocating, because breathing is handled by a different governor. It produces an entirely different error signal, one that’s easy to mistake for fear if you’re not looking carefully.

Fear is pretty important to survival, so it seems like one of those cases where you might expect evolution to have added some redundancy. It seems reasonable to have different fear governors for different things, so if you knock your head wrong once and are no longer afraid of snakes, at least you’re still afraid of tigers. But SM doesn’t seem to have any backup fears that are still online.

This suggests two possibilities. 1) Maybe there is really only one governor that accounts for every kind of fear. SM isn’t just missing some kinds of fear, she’s missing every kind, because there’s a single point of failure. 2) There are multiple fear governors, but they are organized in a way where it’s possible to knock all of them out at once. For example, maybe there are multiple governors but her ability to generate the perception of danger is knocked out, so all the governors are totally inactive.
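The two possibilities can be sketched as toy architectures. Both functions are hypothetical illustrations with made-up threat names and danger values; the point is only that a single point of failure can exist even when there are several governors, if they all share one upstream input.

```python
def fear_single(danger: dict[str, float], governor_ok: bool) -> float:
    """Possibility 1: one governor produces fear for every kind of threat."""
    return max(danger.values(), default=0.0) if governor_ok else 0.0


def fear_multi(
    danger: dict[str, float],
    governors_ok: dict[str, bool],
    perception_ok: bool,
) -> dict[str, float]:
    """Possibility 2: a separate governor per threat, all fed by one shared
    danger-perception stage. If that stage is knocked out, every governor
    receives zero danger and stays silent."""
    perceived = danger if perception_ok else {threat: 0.0 for threat in danger}
    return {
        threat: level if governors_ok.get(threat, False) else 0.0
        for threat, level in perceived.items()
    }


danger = {"snake": 0.8, "tiger": 0.9}
# A bad knock that breaks only the snake governor: tiger fear survives.
print(fear_multi(danger, {"snake": False, "tiger": True}, perception_ok=True))
# Perception knocked out: every governor intact, yet no fear at all,
# which from the outside would look like SM's case.
print(fear_multi(danger, {"snake": True, "tiger": True}, perception_ok=False))
```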

There are also some very rare genetic conditions that leave people with no experience of physical pain. These conditions are very rare because pain is very important. Without pain, you usually die, because you have no motivation not to put your arm in a wood chipper. One patient said, “at a young age, I would like to bang my head against the wall because I liked the feeling of vibration”.

This suggests that like fear, pain might be a single emotion, because it can be so cleanly toggled on or off. As far as we know, there aren’t genetic conditions where you can feel burning but you can’t feel cutting, or vice versa. People seem to either have pain basically working or have it basically gone, across the board.

That said, there do appear to be shades of pain insensitivity. For example, Jo Cameron has a version of pain insensitivity where she still experiences pain in the sense that she can avoid harming herself, but her subjective experience of pain isn’t at all unpleasant. She can tell that she’s been burned or cut, but she doesn’t mind. She described childbirth as “a tickle”, and said, “I could feel that my body was changing, but it didn’t hurt me.”

While Jo’s case is extreme, this kind of variation seems common. Some people feel pain but don’t mind it, others don’t notice it at all, and there are shades in between. So maybe there are tightly-linked drives or subcomponents that can eventually be distinguished with enough examination.

Extreme personality disorders may also be a kind of knockout. The average psychopath behaves a lot like a person with weights near zero on certain social governors, the governors that normally make people feel emotions like empathy and shame.

Compare the stories of patient SM and Jo Cameron to this podcast interview with the sociopath M.E. Thomas / “Jamie”. There’s a lot of interesting stuff in here, but we want to highlight this one section where Spencer, the interviewer, asks her about fear:

SPENCER: I know a handful of sociopaths, and one thing I’ve asked them about is fear. Some of them say that they don’t think they have fear, or at least not in the normal way that other people do. What’s your relationship with fear?

JAMIE: Yeah, I totally agree with that. … Sometimes that’s gotten me in trouble because I will not take adequate precautions. Sometimes I do things that can maybe seem like I’m a little accident-prone. For instance, when I go mountain biking, I probably crash like 20% of the time, which I’ve heard is high.

SPENCER: Yeah, you mention in your book how you cut yourself in the kitchen a lot with knives by accident. Can you talk about that?

JAMIE: Yes, I still have a plastic safety knife. It’s kind of like the type that you carve pumpkins with, or little children can carve pumpkins with. I almost always use that knife. Here and there, I think it actually is safer for me to just use a bigger metal knife, but then I have to be very, very conscientious. I’m the same way too with train tracks. There are some train tracks close to where I live, and I cross them basically every day, but I know that I’m bad at paying attention and being careful for my own self. So I really talk to myself when I’m doing it, I’m like, “Here we come, 15 feet from the train tracks, 10 feet from the train tracks. Look right, left, right, left, right.” It’s this very belt and suspenders approach to kind of rein in my brain, which naturally doesn’t care, doesn’t even pay attention to things like that.

Sometimes psychopaths like to say that they are more rational than other people, like in this excerpt:

JAMIE: I think you can always cooperate with psychopaths when your incentives align, and when you’re able to convince a psychopath that the incentives do align, then the psychopath is a very good team member.

SPENCER: And why are they a good team member?

JAMIE: Because once their incentives are aligned that way, they’re almost like a robot. They will always behave in a way that is in alignment with their incentives. Essentially, you can trust — in economics, they talk about the rational actor, who always behaves rationally — in a lot of ways, the psychopath, as long as they’re not experiencing gray rage or maybe some weird hormones or a situation like that, they basically are the economic rational actor.

But assuming self-preservation is one of your values, what is so rational about crashing 20% of the time you go mountain biking?

A different interpretation is that psychopaths aren’t more rational, but they are less conflicted. What they describe as a lack of ego is perhaps a lack of the self-suppressing social emotions that include certain types of fear of social consequences (for example, shame).

In a normal person, these prosocial emotions are in conflict with selfish desires that might lead someone to cheat, lie, steal, and so on. But psychopaths mostly lack these emotions; they are almost entirely un-self-conscious. This means that they feel little hesitation to bend the rules. But it also has the relaxing side effect of leading to less inner conflict, which might make one feel very rational. After all, the experience is of having clear desires and working towards them without any second thoughts.

This might also be why psychopaths are often so charming and charismatic: we find a lack of inner conflict very attractive, and the absence of tension may even show in their faces.

3. Empirical Methods

So far we’ve looked at observational techniques only. Now we’re gonna get off our asses and (describe how to) collect some data. Here’s how we might do it.

3.1 Artificial Knockout

Natural knockouts are the clearest-cut examples, and teach us the strongest lessons. But we can learn similar lessons by knocking out emotions artificially, like with drugs.

Drugs don’t usually seem to reduce the weight on a governor to zero. But they do often seem to turn the weight (or error) on a drive down, and sometimes they seem to turn it up. For example, alcohol seems to temporarily reduce the weights on the fear and shame governors, making people less driven by fear and shame. In contrast, it doesn’t seem to have much impact on the hunger governor; drunk people seem just as hungry as normal. Or maybe alcohol turns hunger up: everyone seems to want fried food after a couple of pints, but maybe this is driven more by the sudden lack of shame.
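One way to picture “turning down the weight” on a governor is a weighted vote. In this hypothetical sketch, each governor votes for one action with strength weight × error, and a couple of pints is modeled as scaling the shame governor’s weight down; the numbers are invented to illustrate the fried-food puzzle above, not measured.

```python
def winning_action(
    errors: dict[str, float],
    weights: dict[str, float],
    votes: dict[str, str],
) -> str:
    """Each governor votes for one action with strength weight * error;
    the action with the largest total wins."""
    tally: dict[str, float] = {}
    for governor, action in votes.items():
        tally[action] = tally.get(action, 0.0) + weights[governor] * errors[governor]
    return max(tally, key=tally.get)


errors = {"hunger": 0.6, "shame": 0.8}
votes = {"hunger": "order the fries", "shame": "skip the fries"}
sober = {"hunger": 1.0, "shame": 1.0}
after_two_pints = {"hunger": 1.0, "shame": 0.3}  # hunger unchanged, shame turned down

print(winning_action(errors, sober, votes))            # skip the fries
print(winning_action(errors, after_two_pints, votes))  # order the fries
```

Hunger’s error never changed here; only the opposing vote got weaker. That is the “lack of shame” reading of the late-night fries. Raising hunger’s weight instead would be the other reading, and the two produce the same choice.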

Sometimes the changes caused by drugs are what we would normally think of as “side effects”, but all effects are really just effects. When we talk about SSRIs having sexual side effects, this may cash out as them interfering in some way with the horny governor.

There are some extreme circumstances that are almost like knockouts, and may help us distinguish between emotions in similar ways. Our favorite example, of course, is the potato diet. When people eat almost nothing but potatoes for several days, some of them find that the normal sensation of hunger becomes very weird. They say things like:

“It’s been very easy for me to not eat enough doing this and not realize that’s why I feel off. Might be worth a PSA. Hunger literally feels different on this diet.”
“finding myself completely forgetting about food, even as something i need to do to live. not experiencing any hunger. no urge to snack. i am certain i’m not drinking enough water. i definitely have more energy, and more focus, despite this … not sure if i’m actually hungry but haven’t eaten nearly enough.”
“I did get more tired throughout, and my appetite actually continually decreased. Had to remind myself to eat quite often and actually made a schedule.”
“On 100% potatoes, I don’t ever feel ‘hungry’ the way hunger usually feels, I’ll notice that I’m low-energy or fading, and that’s my signal that I should eat again”
“the normal feeling of hunger was entirely gone for me – what was left was a feeling of being almost faint and feeling not great when I went too long without eating. Took a lot of adjusting to.”

We think that “hunger” is actually a number of different emotions that come from several different drives. Because eating a well-rounded meal satisfies most of these drives at the same time, we don’t normally experience these emotions independently from one another, which is why we call them by a single name.

We interpret the comments from the potato diet as reflecting a situation where some hunger emotions are unbundled from others, creating unusual subjective experiences.

We think it went something like this: let’s say there’s one hunger drive for calories and then a bunch of drives for specific nutrients like magnesium, sugar, or whatever. Normally this calorie (or metabolism) governor drives most eating behavior, since it sends the strongest signal. The other signals rise and fall with the signal from the calorie governor anyway, because if you’re getting enough calories from a mixed diet, you will be getting approximately the right amount of the other things you need. They only chime in if you happen to be eating a diet that’s really low in magnesium or whatever.

But something about the potato diet convinces your body that its weight set point should be lower, so it starts drawing calories out of your fat stores instead of adding to them. This keeps the metabolism governor quiet. It doesn’t have to vote for you to eat to get calories anymore, since calories are being released from fat directly into the bloodstream.

Your micronutrient governors, however, don’t have the same kinds of reserves, so they keep sending out their error signals as normal. But you’re not used to responding to these micronutrient errors in isolation, and they’re not used to running the show. You feel vaguely weird and bad, but it’s not something you’re used to thinking of as hunger, and you don’t immediately know what to do about it. That’s why it feels weird on the potato diet.
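A toy simulation of this first model, with invented per-hour drain rates, shows how the familiar signal can vanish while the others keep growing. The assumption doing the work is the one described above: while fat stores are being drawn down, the calorie governor’s error never builds up.

```python
# Hypothetical drain rates (error units per hour); integers keep the arithmetic exact.
RATES = {"calories": 5, "magnesium": 1, "sugar": 1}


def errors_after(hours: int, drawing_on_fat: bool) -> dict[str, int]:
    """Accumulated error for each hunger governor after `hours` without eating."""
    errors = {name: 0 for name in RATES}
    for _ in range(hours):
        for name, rate in RATES.items():
            if name == "calories" and drawing_on_fat:
                continue  # fat stores top up calories, so this governor stays quiet
            errors[name] += rate
    return errors


def how_it_feels(errors: dict[str, int]) -> str:
    # Assumption: we only recognize "hunger" when the loud calorie signal is present.
    if errors["calories"] > 0:
        return "ordinary hunger"
    if any(value > 0 for value in errors.values()):
        return "vague, unfamiliar discomfort"
    return "satisfied"


print(how_it_feels(errors_after(4, drawing_on_fat=False)))  # ordinary hunger
print(how_it_feels(errors_after(4, drawing_on_fat=True)))   # vague, unfamiliar discomfort
```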

Or here’s a slightly different model: if there are hunger governors for five different things and your diet only provides the nutrients that satisfy four of them, you’ll seem to experience hunger normally: very hungry before meals, full after meals (because a fullness governor switches on). But one governor keeps voting for eating, and it is later joined by the other four as time passes. So if switching to the potato diet suddenly satisfies all five hunger governors, you might experience complete satiety for the first time.
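The second model is even simpler to sketch. The five governor labels are placeholders, since the text doesn’t say what the five things are; the only assumption is that a governor whose nutrient the diet lacks keeps voting to eat even after the fullness governor kicks in.

```python
# Five hypothetical hunger governors; the labels are placeholders.
GOVERNORS = ["A", "B", "C", "D", "E"]


def after_meal(diet_satisfies: set[str]) -> tuple[bool, list[str]]:
    """Return (feels_full, governors_still_voting_to_eat) right after a meal."""
    feels_full = True  # the fullness governor switches on after any big meal
    still_voting = [g for g in GOVERNORS if g not in diet_satisfies]
    return feels_full, still_voting


print(after_meal({"A", "B", "C", "D"}))  # (True, ['E']): full, yet one vote never stops
print(after_meal(set(GOVERNORS)))        # (True, []): complete satiety
```

On the usual diet, “full but faintly hungry” is the permanent background, so it never registers as unusual; only its disappearance does.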

Which drives and emotions have been unbundled, and why exactly that would happen on potatoes, remains an open question.

3.2 Behavioral Exhaustion

You can discover the root of a drive by separating the target of that drive into its component parts, and feeding each one into the system in turn.

Let’s say you’re craving a cranberry juice cocktail. A natural question might be to wonder why you crave it so bad. Any craving presumably comes from one or more of your drives, but which one(s)?

A reasonable guess is that you don’t crave the whole cranberry juice cocktail, you actually crave one or more of its ingredients. You can test this by consuming the ingredients one at a time. If you first let yourself drink as much water as you want, and you still crave the cranberry juice cocktail, clearly you did not want it just because you were thirsty per se.

So you look at the other ingredients. There’s lots of sugar in the cocktail, maybe you are craving something sweet. So next you eat as much sugar as you want. If you’re still craving the cranberry juice cocktail, then it must have been something else.

In principle you can follow this process as far as you want, to discover precisely the ingredient you were craving. And once you discover the ingredient, you can follow the same process even further. You can go as far as centrifuging the original cranberry juice and eating different strata to determine exactly what part of it you were after. With enough effort, you might be able to identify the exact molecule.

In practice, things probably won’t be so simple. From oral rehydration solutions, we know that some combinations of sugar, salt, and water are much more hydrating than others. If you mix the wrong combination, it can even become dehydrating. So in some cases, cravings may be holistic: your drives may really vote for something that is greater than the sum of its parts. This may be why some foods, like beans and rice, are so often eaten together and seem much more delicious in combination than apart. In our pursuit of a better understanding of psychology, we can’t forget about biology. There is probably a reason why people prefer to drink lemonade instead of consuming water, sugar, and lemon juice in isolation. And by golly, we’re gonna find it.

In general, exhaustion shows that 1) there is a drive for the pure thing being exhausted (or else why would the organism keep taking/doing it), and 2) any behavior remaining after exhaustion cannot be caused in this case by the exhausted drive, though the exhausted drive might also vote for that behavior if it were not exhausted.
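The exhaustion logic can be made concrete with a toy model. This is a minimal Python sketch under assumptions of our own: each hypothetical governor tracks a single nutrient against a set point, a food is just a dictionary of nutrient amounts, and a governor "votes" for a food only if it has outstanding error and the food contains its nutrient. The names and numbers are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Governor:
    """Toy controller for a single nutrient (hypothetical model)."""
    nutrient: str
    set_point: float
    level: float = 0.0

    def error(self) -> float:
        # Error is how far the current level sits below the set point.
        return max(0.0, self.set_point - self.level)


def cravings(governors, food):
    """Return the nutrients whose governors would vote for this food:
    those with outstanding error that the food would help reduce."""
    return [g.nutrient for g in governors
            if g.error() > 0 and food.get(g.nutrient, 0) > 0]


def exhaust(governors, ingredient):
    """Consume an ingredient to exhaustion: every governor the ingredient
    touches is driven all the way to its set point."""
    for g in governors:
        if ingredient.get(g.nutrient, 0) > 0:
            g.level = g.set_point


# Hypothetical starting state: both governors have error.
govs = [Governor("water", 1.0), Governor("sugar", 1.0)]
cocktail = {"water": 0.9, "sugar": 0.1}

before = cravings(govs, cocktail)        # ['water', 'sugar']
exhaust(govs, {"water": 1.0})            # drink water until you stop
after_water = cravings(govs, cocktail)   # ['sugar']
exhaust(govs, {"sugar": 1.0})            # eat sugar until you stop
after_sugar = cravings(govs, cocktail)   # []
```

If the cocktail still drew a vote after both ingredients had been exhausted, the model would point to a governor we haven't listed yet. It cannot capture the holistic case, where a particular combination matters more than its parts; that would need governors that respond to combinations rather than to single nutrients.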

3.3 Fungibility

Another angle is to look at the impulses for different actions and try to determine whether, and how, they are fungible.

The thermostat only cares about the temperature in the house. When the house is too cold, actions that raise the temperature in any way are all equally successful, since they all correct the thermostat’s error. So from the thermostat’s perspective, actions that raise the house temperature are totally fungible. It is just as happy to turn on the baseboard heating as it is to turn on the forced-air heating, in this absurd hypothetical where your house, for some reason, has both.

We can use other fungible actions in the same way, and trace them back to their common origin. For example, you may notice that you feel hungry. You want bananas. You interrogate that feeling — what else sounds good? The other things that come to mind are avocados, potatoes, and spinach. All of them sound great.

In many ways these foods are very different — for example, the avocado is high in fat and the banana is not. But you realize that all of the foods that sound good have something in common: they are all high in potassium. So instead of eating any of these foods, you drink some straight potassium chloride in water.

You may find that you no longer feel hungry at all, suggesting that what you thought of as a general sense of hunger was in fact a single drive for potassium. Your potassium governor was happy to be satisfied in a number of different ways, so it was willing to vote for bananas, avocados, spinach, anything that would reduce its error. And when you drank straight potassium chloride in water, that also satisfied the drive, so the error signal went away. We don’t know if this would happen, but if it did, that would be fairly strong evidence for a potassium drive.
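The inference in this example amounts to intersecting the properties of everything that sounds good and checking whether a single shared property remains. Here is a minimal sketch, with foods tagged by a few coarse "high in" labels that we chose for illustration; the tags are deliberately simplified and are not a nutrition database.

```python
def shared_properties(craved, profiles):
    """Intersect the property sets of every craved food. Whatever survives is
    a candidate for the single variable one governor might be controlling."""
    sets = [profiles[food] for food in craved]
    return set.intersection(*sets) if sets else set()


# Coarse, illustrative tags only.
profiles = {
    "banana":  {"potassium", "carbohydrate"},
    "avocado": {"potassium", "fat", "fiber"},
    "potato":  {"potassium", "carbohydrate"},
    "spinach": {"potassium", "fiber"},
}

candidates = shared_properties(["banana", "avocado", "potato", "spinach"], profiles)
print(candidates)  # {'potassium'}
```

A shared property is only a hypothesis. The potassium chloride test is what would confirm it, since a truly fungible substitute should silence the craving on its own.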

Similarly, you might notice you have a craving for eggs and broccoli. Then you eat some nutritional yeast, which is very rich in B vitamins. Five minutes later, you don’t crave those foods anymore. Same deal.

3.4 Prevention

This is a version of fungibility in reverse, or an empirical version of resistance. To see what drive is behind a behavior, keep the person (or animal) from doing the behavior and see what they do instead. If an organism tries to do something, stop it. Whatever it does next is probably an expression of the same drive.

If you do this enough, you can triangulate all these behaviors and infer what variable the drive is controlling. You might also learn that two behaviors you thought were different are both expressions of the same drive.

It is also interesting that at some point the organism might substitute, for example by looking at pictures of food if it can’t manage to eat. When it can no longer find a substitute, you have the drive surrounded.
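The prevention method can be pictured as a governor ranking the actions available to it and falling back to the next one whenever an action is blocked. This is a minimal sketch assuming a single hypothetical hunger governor whose preferences are a fixed ordered list; real preferences would presumably shift with the size of the error.

```python
def next_action(preferences, blocked):
    """Return the most preferred action that has not been blocked, or None if
    every avenue is closed (the drive is 'surrounded')."""
    for action in preferences:
        if action not in blocked:
            return action
    return None


# Hypothetical ranked preferences of one hunger governor.
hunger = ["eat meal", "eat snack", "look at pictures of food"]

blocked = set()
observed = []
while (action := next_action(hunger, blocked)) is not None:
    observed.append(action)   # record what the organism does instead
    blocked.add(action)       # then prevent that behavior too

print(observed)  # ['eat meal', 'eat snack', 'look at pictures of food']
```

Each behavior that appears once the previous one is blocked is a candidate expression of the same drive; when `next_action` returns `None`, there are no substitutes left.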

3.5 Effort

A similar method is to see what goals an animal will expend large amounts of effort to reach. A rat will push a lever 1000 times for water if that’s its only way to get hydration. It wouldn’t do this if the desire for water were an epiphenomenon of some other drive. The rat really wants water, specifically wants water, and will accept no substitutes. The fact that it puts in so much effort is the evidence.

3.6 Division

To alchemists, Fire was considered “the true and Universal Analyzer of all Mixt Bodies”, capable of dividing any substance into its more base components.

But there were some problems with this approach. The alchemists were shaken when they discovered that Fire was not the only thing that could divide a substance into simpler components. They found that liquids like urine, beer, and wine would separate when put out in extreme cold.

Worse, there were some elements that fire couldn’t separate at all. Robert Boyle relates a story of gold being kept in a furnace for two months straight. The gold stayed a liquid the whole time, but it never separated into baser substances. Apparently fire had failed to separate gold into its elementary ingredients. Some true and Universal Analyzer! Or, more radically, maybe this meant that gold didn’t have more basic components, that gold itself was an element.

Observations like these threw alchemy into a state of chaos. In the preface to his Elements of Chemistry, where he pitches his homies on a new way of doing things, Antoine Lavoisier explains this history. He apologizes for not including a list of all the elements, saying (emphasis ours):

It will, no doubt, be a matter of surprise, that in a treatise upon the elements of chemistry, there should be no chapter on the constituent and elementary parts of matter; but I shall take occasion, in this place, to remark, that the fondness for reducing all the bodies in nature to three or four elements, proceeds from a prejudice which has descended to us from the Greek Philosophers. The notion of four elements, which, by the variety of their proportions, compose all the known substances in nature, is a mere hypothesis, assumed long before the first principles of experimental philosophy or of chemistry had any existence. In those days, without possessing facts, they framed systems; while we, who have collected facts, seem determined to reject them, when they do not agree with our prejudices. The authority of these fathers of human philosophy still carry great weight, and there is reason to fear that it will even bear hard upon generations yet to come.

It is very remarkable, that, notwithstanding of the number of philosophical chemists who have supported the doctrine of the four elements, there is not one who has not been led by the evidence of facts to admit a greater number of elements into their theory. The first chemists that wrote after the revival of letters, considered sulphur and salt as elementary substances entering into the composition of a great number of substances; hence, instead of four, they admitted the existence of six elements. Beccher assumes the existence of three kinds of earth, from the combination of which, in different proportions, he supposed all the varieties of metallic substances to be produced. Stahl gave a new modification to this system; and succeeding chemists have taken the liberty to make or to imagine changes and additions of a similar nature. All these chemists were carried along by the influence of the genius of the age in which they lived, which contented itself with assertions without proofs; or, at least, often admitted as proofs the slighted degrees of probability, unsupported by that strictly rigorous analysis required by modern philosophy.

Lavoisier doesn’t claim that he knows what is an element and what is not. He says that we are going to need some very serious analysis before any of us can be sure. So instead of starting with a list of the elements, Lavoisier proposes a new method for figuring them out:

If, by the term elements, we mean to express those simple and indivisible atoms of which matter is composed, it is extremely probable we know nothing at all about them; but, if we apply the term elements, or principles of bodies, to express our idea of the last point which analysis is capable of reaching, we must admit, as elements, all the substances into which we are capable, by any means, to reduce bodies by decomposition. Not that we are entitled to affirm, that these substances we consider as simple may not be compounded of two, or even of a greater number of principles; but, since these principles cannot be separated, or rather since we have not hitherto discovered the means of separating them, they act with regard to us as simple substances, and we ought never to suppose them compounded until experiment and observation has proved them to be so.

To put this in more modern language: What we mean by “element” is “something that can’t be divided”. If we’ve discovered a way to divide some substance into different components, that substance can’t be an element. Elements are by definition the basic building blocks of matter that cannot be divided, so if something can be divided in any way, it’s not an element. (Ignore modern atomic physics for the moment; it wouldn’t arrive for more than a hundred years.)

Substances that we can’t divide are candidates. They might be elements — after all, they seem entirely indivisible so far. But some day we might discover a way to divide them into different components, which would prove they’re not elements after all. So they’re not elements for sure, only candidates.

If you put wood into a fire, it will be divided into ashes, smoke, etc. This makes it pretty clear that wood isn’t an element. But as of 1789, no one had found a way to divide gold into anything else, and not for lack of trying. So to Lavoisier, gold is provisionally an element: other things can be divided in a way that yields gold, but no one has confirmed a way to divide gold into anything simpler.

In short, it’s impossible to prove that something is an element, but you can prove that something is not, simply by dividing it. Anything we know how to divide is proven to be a compound, not an element. But anything we don’t know how to divide is only a possible element, because we may yet discover some way to divide it.
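Lavoisier’s rule is asymmetric, and that asymmetry is easy to state in code: a known division proves something is a compound, while the absence of one only makes it a candidate. Here is a minimal sketch using a hypothetical record of known divisions; wood comes from the example above, and water is included because Lavoisier himself decomposed it into hydrogen and oxygen.

```python
def classify(substance, known_divisions):
    """Lavoisier-style provisional classification.

    known_divisions maps a substance to the components we have managed to
    divide it into. A recorded division proves a compound; no recorded
    division yields only 'candidate element', never a proven element."""
    if known_divisions.get(substance):
        return "compound"
    return "candidate element"


# Illustrative record of divisions, not a complete history.
known = {"wood": ["ash", "smoke"]}

print(classify("wood", known))   # compound
print(classify("gold", known))   # candidate element
print(classify("water", known))  # candidate element

# New experiments can only move substances in one direction.
known["water"] = ["hydrogen", "oxygen"]
print(classify("water", known))  # compound
```

There is no branch that returns a proven element, which is the whole point of Lavoisier’s definition. The same asymmetry carries over to drives: a drive we cannot yet divide is only provisionally basic.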

We find ourselves in a similar situation today, and we can use something like Lavoisier’s approach to discover the full set of psychological drives (each with a corresponding emotion and governor), just like the chemists used his approach to discover the full set of elements.

The difference between these methods and the methods from the previous sections is that the methods in the previous sections start with observed behaviors, and try to figure out what drive(s) are behind them. These methods start with established or proposed drive(s) and try to learn more.

A good place to start is hunger. We think that hunger is not one emotion; it’s a common term applied to many emotions. The reason these signals are all mistakenly called by the same name, at least in English, is that they all come from governors that vote for eating behavior. These behaviors all look superficially similar, but in fact we put things in our mouths for a variety of reasons.

Humans come with several different hunger drives because we need to eat several different things to remain healthy. We’ll call these things-you-need-to-eat “nutrients”, though this may be a little different from the common usage of that word.

Most foods contain more than one nutrient, so most foods satisfy more than one governor. A decent burrito will satisfy almost everything — your salt, carbs, fat, and guacamole governors, etc. This makes these emotions hard to disentangle, so most cultures don’t bother. It’s still possible to express these drives — “I’m really craving pickles” or “I would kill for some mozzarella right now” — and there are some related idioms like “sweet tooth”. But we don’t have dedicated words for each individual emotion, we just lump them together as “hunger”.

If you’ve messed around with your diet in really strange ways, as we have, you can sometimes get to the point where the different hunger drives become obvious. When we supplemented potassium, it was very clear to us that this increased our cravings for salt.

Like Lavoisier, we can try to break hunger down into individual drives, until we find drives we can’t distinguish any further. Those drives that can’t be divided are probably basic drives, at least until proven otherwise.

Let’s play through some examples. We think that there is probably at least one drive for salt (likely for sodium, but maybe there is a drive for chloride too) and at least one drive for fatty foods.

Now consider Joey, who wants to eat a pile of onion rings. If this is simply unalloyed hunger, a general desire for calories, then once Joey has eaten any other food to exhaustion, he should no longer want the onion rings.

However, if we assume there is one drive for salty foods, and a separate drive for fatty foods, we might suspect that the strong desire for onion rings reflects a combination of these desires, leading him to seek a food that is both salty and fatty. If true, he will also be at least somewhat interested in foods that are salty but not fatty, and in foods that are fatty but not salty.

Then, if we let Joey eat as much as he wants of a food that is salty but not fatty (perhaps mini pretzels), he will still be interested in foods that are fatty but not salty. And if we let him eat as much as he wants of a food that is fatty but not salty (perhaps avocado), he will still be interested in foods that are salty but not fatty. This would demonstrate that these are different drives.
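The Joey experiment pits two hypotheses against each other, and the sketch below makes their predictions explicit. It assumes, purely for illustration, that each food is tagged as salty and/or fatty and that eating a food to exhaustion satisfies every drive the food engages. The model names and the helper `still_wants` are our own hypothetical labels.

```python
FOODS = {
    "onion rings":   {"salt", "fat"},
    "mini pretzels": {"salt"},
    "avocado":       {"fat"},
}


def still_wants(model, eaten, probe):
    """Predict whether a probe food is still wanted after eating `eaten` to
    exhaustion, under a given model of hunger."""
    if model == "single calorie drive":
        # Any food exhausts the one drive, so nothing is wanted afterward.
        return False
    if model == "separate salt and fat drives":
        remaining = {"salt", "fat"} - FOODS[eaten]  # drives left unsatisfied
        return bool(remaining & FOODS[probe])
    raise ValueError(f"unknown model: {model}")


for model in ("single calorie drive", "separate salt and fat drives"):
    print(model,
          still_wants(model, "mini pretzels", "avocado"),
          still_wants(model, "avocado", "mini pretzels"))
# single calorie drive False False
# separate salt and fat drives True True
```

The experiment is informative precisely because the two models disagree about the same observation. A drive that only responds to foods that are both salty and fatty would need a third model, with its own predictions.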

It probably has not escaped your attention that many foods that are salty are also fatty, and vice versa (french fries, olives, peanut butter, etc.). Perhaps this indicates some kind of drive specifically for foods that are both fatty and salty, a drive that cannot be extinguished by salt or fat in isolation. We will probably discover some outcomes at least this weird, and we should try not to stick too closely to any assumptions. The early chemists really didn’t expect to someday discover isotopes.

Evolution is doing her own thing, and she has no obligation to provide categories that make any sense to us. Governors might be controlling anything at all. There might be an important hunger governor that controls a proxy of a proxy of the ratio between sodium and potassium in the bloodstream. That’s not something that a human will find intuitive — but it’s not about being intuitive to the humans! The only law is, whatever works!

But assuming for a moment that our study with Joey did support the idea that there’s both a salt governor and a fat governor, similar techniques could be used to discover whether there’s just one governor controlling fat-hunger, or if there are separate drives for different kinds of fat. Perhaps one drive for saturated and another drive for unsaturated fat. Or perhaps one drive for sterols? The truth will probably be stranger than we expect.

A relatable example of this is the “dessert stomach”. If you can eat a big meal and still have room for dessert, it must be because your sugar or fat governor (or both) is still active. You can exhaust chicken-hunger while not exhausting chocolate-lava-cake hunger. This is clear evidence that there are at least two hunger drives.

3.7 The Parable of Rat C13

A lot of the studies we’ve suggested would be difficult or unethical to run on humans. But it may be easier to run this kind of study with animals.

First of all, we can have more control over an animal’s diet than we usually would over a human’s. And second, humans might try to eat more or less of something to show the researchers how virtuous or how tough they are, but animals won’t have anything to prove — they’ll express their hunger drives with little interference from drives about impressing the research team.

A design might look something like this: Restrict the animal’s food for a while so we know it will be hungry. Then, give it as much butter as it wants and let it eat until it stops eating. This way, we can assume that its error for any nutrient in the butter has been fully corrected.

Then, give the animal access to olive oil. If it eats an appreciable amount of olive oil, that suggests there’s a drive for at least one nutrient in olive oil that is not in butter. Further tests should be able to isolate the exact nutrients. You could also try this in the opposite order, to find if there are drives for nutrients in butter that are not found in olive oil.

And in fact, some of these studies have already been run on animals. As one example, consider a 1968 paper by Paul Rozin. In this study, Rozin housed Sprague-Dawley rats in cages that contained water, a salt-vitamin mix, and a “liquid cafeteria” of three foods: 1) sucrose in water, 2) a 30% protein solution, and 3) Mazola oil for fat. All the rats responded well to this cafeteria, growing bigger and showing a lot of stability in their choices of liquids.

Rats clearly had protein targets and were able to hit them without blinking. When offered protein solution diluted by ½ or ¼, they increased how much solution they drank to compensate, so that their protein intake was approximately constant, though they didn’t compensate quite as well for the ¼ solution as they did for the ½ solution. Some rats were better than others at keeping their protein intake constant. This starts looking like an early form of cybernetic personality testing — at least in rats.
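The arithmetic behind “compensation” is simple: nutrient intake is volume times concentration, so holding intake constant when a solution is diluted to a fraction f of its strength requires drinking 1/f times as much. Here is a minimal sketch of that arithmetic. The compensation index and the example volumes are our own illustrative choices, not Rozin’s data or his exact measure.

```python
def required_volume(baseline_volume, dilution):
    """Volume needed to keep nutrient intake constant when concentration is
    multiplied by `dilution` (e.g. 0.5 for a half-strength solution)."""
    return baseline_volume / dilution


def compensation_index(baseline_volume, diluted_volume, dilution):
    """Fraction of baseline nutrient intake recovered at the new concentration.
    1.0 means perfect compensation; if volume is unchanged, the index simply
    equals `dilution`."""
    return (diluted_volume * dilution) / baseline_volume


# Hypothetical rat drinking 10 mL/day of full-strength protein solution.
print(required_volume(10, 0.5))          # 20.0
print(required_volume(10, 0.25))         # 40.0
print(compensation_index(10, 20, 0.5))   # 1.0 (perfect compensation)
print(compensation_index(10, 30, 0.25))  # 0.75 (partial compensation)
```

Framed this way, “didn’t compensate quite as well for the ¼ solution” means an index below 1.0, and differences between individual rats show up as differences in this index.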

Even when Rozin added quinine hydrochloride to the diluted solution, a flavor that rats normally hate, they still compensated and drank more of the diluted protein solution. This suggests they really were controlling protein intake, not just drinking for taste. That said, Rat C13 seemed to like the quinine just fine, and didn’t show any preference for the solution without it. Another sign of personality — that Rat C13, what a character!

In contrast, when Rozin diluted the sucrose solution, their source of carbohydrates, the rats only drank a little more sucrose solution to compensate. Some rats didn’t drink more sucrose solution at all. This is kind of surprising, because under normal circumstances all the rats took at least 50% of their calories from sucrose.

Similarly, when rats were deprived of protein for a few days, they would drink more protein solution to make up for it. But when rats were deprived of sucrose solution for a few days, they would actually drink slightly less sucrose solution when it came back. The effects of being deprived were also noticeably different. Rats lost more than twice as much weight when deprived of protein than when deprived of carbohydrates.

We wish that we could provide similar comparisons for fat, but Rozin says that, “due to the very low levels of fat intake, no meaningful compensation value could be calculated.”

This isn’t evidence that carbohydrates are totally unregulated — they may just be regulated on a timescale that isn’t noticeable over a few days. The author speculates that, “this failure may have occurred because the highly palatable 35% sucrose solution is consumed at levels well above a physiological minimum.” And of course, the regulation may just be too complex to see in Rozin’s data. But it does at least look like evidence that protein is closely controlled, and controlled separately from overall calorie intake, at least in rats.

Score one for Lavoisier’s method. Assuming that these findings are reliable, this seems like clear evidence against the idea that there is just one elemental drive for hunger. It also seems like evidence in favor of a drive for protein. Whether that drive for protein is elemental, or whether it too can be broken down into a collection of more basic drives, perhaps drives for individual amino acids, remains to be seen.

Similar observations suggest that cows have something like 16 different hunger drives:

The idea of feeding minerals “free choice” to livestock came about by a need to decrease over-consumption of a liquid supplement containing phosphoric acid, protein, molasses, and other minerals. Upon investigation, it was found that the liquid supplement was being used heavily by the animal as a source of phosphorous. Consequently, we discovered if animals had access to a phosphorous source on a free choice basis, over-consumption of the liquid ceased. We then extended this concept to other vitamins and minerals: if the animal was able to select phosphorous on a free choice basis, perhaps calcium could be selected in the same manner – success!

…In time, potassium, sulfur, silicon, magnesium, vitamins, and trace minerals were added to the list. Finally, there were 16 separate vitamins and minerals fed free choice.

These findings should be independently and widely replicated before we treat them as strong evidence, but if true, this suggests that cows have drives for each of these vitamins and minerals. If they didn’t have a drive for sulfur, why would they spend their time eating it?

4. In Which We Speculate About What Emotions There Are

The first major achievement for psychology may be a complete list of all the drives, governors, and emotions — each drive comes from a governor, and the emotion is that governor’s error signal. The most obvious analogy is to chemistry. This will be our version of the periodic table.

We’re still a long way off from this list being completed, but we can make some educated guesses about what will be on there once it’s finished. You just heard a lot of those guesses in the previous sections — now, we’ll put those guesses together into a rough draft.

*A slightly unorthodox, yet promising list of the emotions*

For now, we’ll try to call each governor by the name of its error signal — drives to eat come from a governor whose error is hunger, so these are hunger governors. The drive to keep yourself from physical harm comes from a governor whose error is pain, so this is the pain governor.

That said, there are a few cases where it’s easier to call a governor by some other name. It’s nice when we have existing terms like “thirst” already on hand, but there are some emotions that don’t have a common name, at least not in English. So sometimes we will punt and call these drives only “a drive to do X”, where X is the characteristic behavior that makes us suspect there’s a drive there in the first place.

The big question at this point is whether this can be more than just a list. The chemical elements have a periodic structure: their properties repeat in a regular pattern. This repetition, or periodicity, is visually organized in the periodic table, where elements are grouped into rows and columns to highlight these patterns. That’s the whole reason to have the periodic table in the first place. It’s more than just a list, and it eventually led to a better understanding of how the properties of elements are related to their atomic structure.

Maybe there is no structure or pattern to the drives, and we will just end up with a long list. But if there’s any kind of pattern or structure, we’d love to come up with an organization that highlights that structure, instead of just listing the drives one by one.

To make an early attempt, for now we will group the drives in three categories: physiological emotions, that attend to the basics needed to keep the body functioning; environmental emotions, that attend to the qualities of a person’s immediate external environment; and social emotions, that attend to a person’s social status and relations.

Physiological
- Suffocation/Panic
- Pain
- Hot
- Cold
- Exhaustion
- Waking
- Thirst
- Hunger (actually several drives)
- Satiety (stops us from eating; also probably several drives)
- A drive to fidget and be active that burns excess calories
- “Zoomies” (this may be the same as the drive to fidget; consider also that rodents need wheels in their cages, and if you give them a wheel in the wild, they’ll run on that too!)
- Horny
- A drive to pee (The Sims called it “Bladder”)
- A drive to shit
Environmental
- Fear
- Disgust
- A drive to have a clean and organized living space (The Sims called it “Room”)
- A drive to have a clean and well-groomed body
- Possibly decorative drives (though these may be an extension of cleanliness drives)
- Possibly a drive to dig
- Possibly a drive to look at animals
- Possibly a drive to collect or hoard
- Possibly a drive to sort
Social
- A drive to regulate social status up
- A drive to keep social status from growing too fast
- A drive for physical contact; “touch starved”
- A drive for privacy, perhaps territorial
- A drive for autonomy
- A drive to socially dominate
- Possibly a desire to follow or submit
- Self-consciousness (an error when you are not acting consistently or normatively)
- Empathy
- Grief (the drive is to care for others, but the error signal is grief)
- Loneliness
- Anger
- Shame

The list should also include other signals that are not cybernetic control errors. Here’s our current best guess for that list:

- Happiness
- Surprise
- Curiosity

Happiness and surprise are two things we subjectively experience all the time, but they don’t seem to be cybernetic control errors. They also don’t seem to drive behavior.

In contrast, curiosity does seem to drive behavior, even though it doesn’t seem to be a cybernetic control error either: it doesn’t act to push any error toward zero. As we speculated above, we think curiosity may be an adversarial signal that teaches us about the world by voting for us to explore options that our governors wouldn’t vote for on their own.

If the history of psychology is any indication, people will want to jump straight to figuring out the social emotions. We think this is a mistake. The social emotions will probably be the hardest to uncover.

There are two reasons to leave the social emotions for later.

First, we don’t know what the social emotions might be controlling. If there really is a dominance emotion, what is it targeting? It can’t literally be “the image of someone wailing at your feet.” It’s going to be something more subtle, and we don’t currently know how to capture or measure that thing.

Second, investigating the social emotions is impractical. If you want to be able to alter someone’s social status at will for an experiment, you kinda have to put people in a Biodome or a VR world. Even then, it’s hard to be sure you’re really evoking what goes on in the regular world.

We have much stronger suspicions about what the physiological drives control, and investigating them doesn’t require us to build a whole alternative society. You can just make people eat salt or not eat salt and see what happens.

And because other animals probably share a lot of our physiological emotions, we can run studies on them that would be unethical or impractical to run on humans. You can’t study the social emotions in other animals because other animals probably don’t have most of the social emotions that humans do. Maybe dolphins or elephants, but they’re hard to study.

We should start with something easier. We should start by studying emotions like hunger and fatigue, then use what we’ve learned to eventually understand the social emotions.

In chemistry, we discovered the gases first, then later got around to the other elements. In psychology, we will probably learn about the physiological drives first. We may cut our teeth on hunger before working up to things like fatigue, pain, fear, and eventually the social emotions, which are probably the most baroque and complex.

It’s true that the social drives are the most interesting, and it might seem like understanding the social drives might be more important, might solve more of the problems you care about. But be patient. You have to spend some time rolling balls down ramps before you can go to the moon.


The Mind in the Wheel – Part IX: Animal Welfare

April 17, 2025 / April 24, 2025 · slimemoldtimemold · animal welfare, cybernetics, paradigm, psychology, science, The Mind in the Wheel

When people talk about the ethical treatment of animals, they tend to hash it out in terms of consciousness.

But figuring out whether animals have consciousness, and figuring out what consciousness even is, are philosophical problems so hard they may be impossible to solve.

There’s not much common ground. The main thing people are generally willing to agree on is that since they themselves are conscious, other humans probably are too, because other humans behave more or less like they do and are built in more or less the same way.

So a better question might be whether or not animals feel specific emotions, especially fear and pain.

The cybernetic paradigm gives a pretty clear answer to this question: Anything that controls threat and danger has an error signal that is equivalent to fear. And anything that controls injury has an error signal that is equivalent to pain.

This allows us to say with some confidence that animals like cows and rats feel fear, pain, and many other sophisticated emotions.

There’s no reason to suspect that a cow or a rat’s subjective experience of fear is meaningfully different from a human’s. We can’t prove this, but we can appeal to the same intuition that tells you that since you are conscious, other humans are probably conscious as well.

You believe that other humans feel fear, and that their fear is as subjectively terrifying to them as your fear is to you, for a simple reason: you notice that another person’s external behavior is much the same as yours is when you feel afraid, and is happening under similar circumstances. Then, you make the reasonable assumption that since all humans are biologically similar to one another, their external behavior is likely caused by similar internal rules and structures. Since there’s no reason to suspect that basically the same behavior created by basically the same structures would be any different phenomenologically, you conclude that other humans probably have the same kind of subjective experience.

With a better model for the emotions, this same logic can extend to other animals. Assuming we are right that a cow also has a governor dedicated to keeping it safe, which generates an error signal of increasing strength as danger increases, which drives behavior much like the behavior we engage in when we are afraid, there is little reason to suspect that the cow’s subjective experience is meaningfully different from our own. At the very least, if you accept the conclusion for humans, it’s not clear why you would reject it for other animals.

This is a relatively easy conclusion to draw for other complex, social mammals. They almost certainly feel fear and pain, because we see the outward signs, and because the inside machinery is overall so similar. But it’s harder to tell as animals become less and less closely related to humans.

An animal that doesn’t bother to avoid danger or injury clearly isn’t controlling for them. But most animals do. So the question is whether these animals actually represent danger and injury within a control system, trying to minimize some error, or if they simply avoid danger and injury through stimulus-response.

Dogs probably feel fear, and even without dissecting their brains, we can reasonably assume that they use mechanisms similar to ours. They’re built on the same basic mammalian plan and inherit the same hardware. But what about squid, or clams? These animals probably avoid danger in some way, but it’s not clear that they use an approach at all like the one we do.

If an animal cybernetically controls for danger and injury, then it is producing an error signal. In this case, the argument from above applies: there’s no reason to suspect that a creature using the same algorithms to accomplish the same thing is having a notably different experience. Its error signal is probably perceived as an emotion similar to our emotions.

But if an animal’s reaction to danger is instead a programmed response to a set stimulus, then there is no control system, no feedback loop, and no error signal.

For example, we might encounter an arthropod that freezes when we walk nearby. At first this looks like a fear response. We imagine that the arthropod is terrified and trying to avoid being seen and eaten.

But through trial and error, we show that whenever a shadow passes over it, the arthropod always freezes for exactly 2.5 seconds. Let’s further say that the arthropod shows no other signs of danger avoidance. If you “threaten” it in other ways, put it in other apparently dangerous situations, it changes its behavior not at all. The only thing it responds to is a shadow suddenly passing overhead.

This suggests that, at least for the purposes of handling danger, this arthropod operates purely on stimulus-response. As a result, it probably does not feel anything like the human emotion of fear. Even if we allow that the arthropod is conscious in some sense, its conscious experience is probably very different from ours because it is based on a different kind of mechanism.

Here’s a similar example from Russell & Norvig’s Artificial Intelligence: A Modern Approach. We can’t confirm that what they describe is actually true of dung beetles (it may be apocryphal), but it’s a good illustration of the idea:

Consider the lowly dung beetle. After digging its nest and laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of dung is removed from its grasp en route, the beetle continues its task and pantomimes plugging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has built an assumption into the beetle’s behavior, and when it is violated, unsuccessful behavior results.
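The beetle in this story is running an open-loop routine: a fixed sequence of steps that never checks whether the previous step worked. A minimal Python sketch can contrast that with a closed-loop version, which checks its perception before plugging the nest. The step names, the `holding_ball` flag, and the assumption that only the first trip is disrupted are all hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def open_loop_beetle(ball_removed_en_route: bool) -> Tuple[List[str], bool]:
    """Fixed routine: perform every step in order without checking for the ball."""
    actions = ["fetch ball", "carry to nest"]
    holding_ball = not ball_removed_en_route  # the world may change mid-routine...
    actions.append("plug entrance")           # ...but this step runs regardless
    nest_plugged = holding_ball               # a pantomime if the ball is gone
    return actions, nest_plugged


def closed_loop_beetle(ball_removed_en_route: bool, max_trips: int = 3) -> Tuple[List[str], bool]:
    """Feedback version: compare 'holding a ball?' with the goal before plugging."""
    actions: List[str] = []
    for trip in range(max_trips):
        actions += ["fetch ball", "carry to nest"]
        # Illustrative assumption: only the first trip is disrupted.
        holding_ball = not (ball_removed_en_route and trip == 0)
        if holding_ball:  # no error: the goal can be met
            actions.append("plug entrance")
            return actions, True
        actions.append("notice ball missing")  # error detected: try again
    return actions, False


print(open_loop_beetle(ball_removed_en_route=True))
print(closed_loop_beetle(ball_removed_en_route=True))
```

With the ball removed, the open-loop routine still ends with "plug entrance" but reports that the nest is not plugged. The closed-loop version notices the missing ball, makes a second trip, and succeeds.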

It’s hard to figure out whether an organism is controlling some variable, or whether it is running some kind of brute stimulus-response, especially if the stimulus-response routine is at all complicated. We may need to develop new experimental techniques to do this.
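One possible probe, sketched here in Python under heavy simplifying assumptions, is to push on the variable you suspect is being controlled and watch whether the organism's output cancels the push. The model is hypothetical. A single variable `x` equals the organism's output plus an external disturbance. The controlled version nudges its output each step to shrink the error between `x` and a reference value, which is one simple textbook choice among many. The stimulus-response version never changes its output.

```python
def final_value(controlled: bool, disturbance: float, reference: float = 0.0,
                gain: float = 0.5, steps: int = 200) -> float:
    """Return x after `steps` updates with a constant external disturbance applied."""
    output = 0.0
    for _ in range(steps):
        x = output + disturbance  # the world: the organism's output plus an outside push
        if controlled:
            error = x - reference     # compare perception with the reference
            output -= gain * error    # act to shrink the error
        # Without control, output stays fixed and nothing compensates.
    return output + disturbance


for d in (1.0, -2.0, 3.0):
    print(d, round(final_value(True, d), 6), final_value(False, d))
```

In the controlled case, `x` ends up at the reference no matter how hard you push, and the output settles near the negative of the disturbance. In the uncontrolled case, the disturbance passes straight through to `x`. An experimenter would look for that mirror-image compensation as evidence of a feedback loop.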

But every organism has to maintain homeostasis of some kind, and almost all animals have a nervous system. That suggests they’re running some kind of feedback loop, which means some kind of error signal, which means some kind of emotion.

For now, we think this is a relatively strong argument that most other mammals experience fear and pain the same way that we do. It is at least as strong as the argument that other humans experience fear and pain the same way that you do.

Figuring out whether you are in danger requires much more of a brain than figuring out whether you have been cut or injured. So while most animals probably feel pain, some animals may not feel fear, especially those with simple nervous systems, those with very little ability to perceive their environment, and those who are immobile. There’s no value in being able to perceive danger if you can’t do anything about it.

[Next: DYNAMIC METHODS]