Every Bug is Shallow if One of Your Readers is an Entomologist

The Cathedral and the Bazaar is an essay/book about how Linus Torvalds threw all the normal rules of software out the window when he wrote the operating system Linux.

Back in the day, people “knew” that the way to write good software was to assemble an elite team of expert coders and plan things out carefully from the very beginning. But instead of doing that, Linus just started working, put his code out on the internet, and took part-time help from whoever decided to drop by. Everyone was very surprised when this approach ended up putting out a solid operating system. The success has pretty much continued without stopping — Android is based on Linux, and over 90% of servers today run a Linux OS.

Before Linux, most people thought software had to be meticulously designed and implemented by a team of specialists, who could make sure all the parts came together properly, like a cathedral. But Linus showed that software could be created by inviting everyone to show up at roughly the same time and place and just letting them do their own thing, like an open-air market, a bazaar.

Let’s consider in particular Chapter 4, Release Early, Release Often. One really weird thing Linus did was he kept putting out new versions of the software all the time, sometimes more than once a day. New versions would go out with the paint still wet, no matter how much of a mess they were.

People found this confusing. They thought putting out early versions was bad policy, “because early versions are almost by definition buggy versions and you don’t want to wear out the patience of your users.” Why the hell would you put out software if it were still crawling with bugs? Well,

> Linus was behaving as though he believed something like this:

> Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.

> Or, less formally, “Given enough eyeballs, all bugs are shallow.” I dub this: “Linus’s Law”.

This bottom-up method benefits from two key advantages: the Delphi Effect and self-selection.

> More users find more bugs because adding more users adds more different ways of stressing the program. This effect is amplified when the users are co-developers. Each one approaches the task of bug characterization with a slightly different perceptual set and analytical toolkit, a different angle on the problem. The “Delphi effect” seems to work precisely because of this variation. In the specific context of debugging, the variation also tends to reduce duplication of effort.

> So adding more beta-testers may not reduce the complexity of the current “deepest” bug from the developer’s point of view, but it increases the probability that someone’s toolkit will be matched to the problem in such a way that the bug is shallow to that person.
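To put rough numbers on that claim, here is a minimal sketch. It assumes (our assumption, not the essay's) that each tester independently has the same small chance `p_match` of having a toolkit that makes a given bug shallow. The function name and the 1% figure are made up for illustration.

```python
def p_shallow_for_someone(n_testers: int, p_match: float) -> float:
    """Chance that at least one of n_testers finds the bug shallow.

    Assumes every tester independently has the same probability p_match
    of having a toolkit matched to the bug (a simplifying assumption).
    """
    if n_testers < 0:
        raise ValueError("n_testers must be non-negative")
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be between 0 and 1")
    # Complement of "nobody's toolkit matches the bug".
    return 1.0 - (1.0 - p_match) ** n_testers


if __name__ == "__main__":
    # Hypothetical 1% chance per tester; watch how fast the odds improve.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} testers: {p_shallow_for_someone(n, 0.01):.3f}")
```

Under these assumptions the bug itself never gets any easier; the odds improve only because more toolkits are pointed at it.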

> One special feature of the Linux situation that clearly helps along the Delphi effect is the fact that the contributors for any given project are self-selected. An early respondent pointed out that contributions are received not from a random sample, but from people who are interested enough to use the software, learn about how it works, attempt to find solutions to problems they encounter, and actually produce an apparently reasonable fix. Anyone who passes all these filters is highly likely to have something useful to contribute.

> Linus’s Law can be rephrased as “Debugging is parallelizable”. Although debugging requires debuggers to communicate with some coordinating developer, it doesn’t require significant coordination between debuggers. Thus it doesn’t fall prey to the same quadratic complexity and management costs that make adding developers problematic.
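The "quadratic" part is easy to see with a quick count. This is only a sketch: it assumes that developers who must coordinate with each other need one communication link per pair, while debuggers each need a single link to the coordinating developer. Both function names are ours, invented for illustration.

```python
def pairwise_links(n_people: int) -> int:
    """Links needed if everyone must coordinate with everyone else."""
    return n_people * (n_people - 1) // 2


def coordinator_links(n_debuggers: int) -> int:
    """Links needed if each debugger only reports to one coordinator."""
    return n_debuggers


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(
            f"{n:>4} people: {pairwise_links(n):>7} pairwise links "
            f"vs {coordinator_links(n):>4} links to a coordinator"
        )
```

The first count grows with the square of the head count and the second grows in step with it, which is the sense in which debugging can be parallelized and development cannot.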

> In practice, the theoretical loss of efficiency due to duplication of work by debuggers almost never seems to be an issue in the Linux world. One effect of a “release early and often” policy is to minimize such duplication by propagating fed-back fixes quickly.

Research is difficult because reality is complex and many things are confusing or mysterious. But with enough eyeballs, all research bugs are shallow too.

Without a huge research budget and dozens of managers, you won’t be able to coordinate a ton of researchers. But the good news is, you didn’t really want to coordinate everyone anyways. You can just open the gates and let people get to work. It works fine for software!

The best way to have troubleshooting happen is to let it happen in parallel. And the only way to make that possible is for everyone to release early and release often. If you sit on your work, you’re only robbing yourself of the debugging you could be getting for free from every interested rando in the world. 

In the course of our obesity research, we’ve talked to water treatment engineers, social psychologists, software engineers, emeritus diabetes researchers, oncologists, biologists, someone who used to run a major primate lab, multiple economists, entrepreneurs, crypto enthusiasts, physicians from California, Germany, Austria, and Australia, an MD/PhD student, a retired anthropologist, a mouse neuroscientist, and ~~a partridge in a pear tree~~ a guy from Scotland.

Some of them contributed a little; some of them contributed a lot! Every one had a slightly different toolkit, a different angle on the problem. Bugs that were invisible to us were immediate and obvious to them, and each of them pointed out different things about the problem.

For example, in our post recruiting for the potato diet community trial, we originally said that we weren’t sure how Andrew Taylor went a year without supplementing vitamin A, and speculated that maybe there was enough in the hot sauces he was using. But u/alraban on reddit noticed that Andrew included sweet potatoes in his diet, which are high in vitamin A. We totally missed this, and hadn’t realized that sweet potatoes are high in vitamin A. But now we recommend that people either eat some sweet potato or supplement vitamin A. We wouldn’t have caught this one without alraban.

In another discussion on reddit, u/evocomp challenged us to consider the Pima, a small ethnic group in the American southwest that was about 50% obese well before 1980, totally bucking the global trend. “What’s the chance that [this] population … [is] highly sensitive and equally exposed to Lithium, PFAS, or whatever contaminants are in SPAM or white bread?” evocomp asked. This led us to discover that the Pima in fact had been exposed to abnormal levels of lithium very early on, about 50x the median American exposure in the early 1970s. Before this, lithium had been just one hypothesis among many, but evocomp’s challenge and the resulting discoveries promoted it to the point where we now think it is the best explanation for the obesity epidemic. Good thing the community is helping us debug!

> My original formulation was that every problem “will be transparent to somebody”. Linus demurred that the person who understands and fixes the problem is not necessarily or even usually the person who first characterizes it. “Somebody finds the problem,” he says, “and somebody else understands it. And I’ll go on record as saying that finding it is the bigger challenge.”

This is a classic in the history of science. One person notices something weird; then, 100 years later, someone else figures out what is going on. 

Brownian motion was first described by the botanist Robert Brown in 1827. He was looking at a bit of pollen in water and was startled to see it jumping all over the place, but he couldn’t figure out why it would do that. This bug sat unsolved for almost eighty years, until Einstein came up with a statistical explanation in 1905, in one of his four Annus Mirabilis papers. Bits of pollen jumping around in a glass of water doesn’t sound very interesting or mysterious, but this was a big deal because Einstein showed that Brownian motion is consistent with what would happen if the pollen was being bombarded from all sides by tiny water molecules. This was strong evidence for the idea that all matter is made up of tiny indivisible particles, which was not yet well-established in 1905!

Or consider DNA. DNA was first isolated from pus and salmon sperm by the Swiss biologist Friedrich Miescher in 1869, but it took until the 1950s before people figured out DNA’s structure. 

> Complex multi-symptom errors also tend to have multiple trace paths from surface symptoms back to the actual bug. … each developer and tester samples a semi-random set of the program’s state space when looking for the etiology of a symptom. The more subtle and complex the bug, the less likely that skill will be able to guarantee the relevance of that sample.

> For simple and easily reproducible bugs, then, the accent will be on the “semi” rather than the “random”; debugging skill and intimacy with the code and its architecture will matter a lot. But for complex bugs, the accent will be on the “random”. Under these circumstances many people running traces will be much more effective than a few people running traces sequentially—even if the few have a much higher average skill level.

This is making an important point: if you want to catch a lot of bugs, a bunch of experts isn’t enough — you want as many people as possible. You do want experts, but you gain an additional level of scrutiny from having the whole fuckin’ world look at it.

Simple bugs can be caught by experts. But complex or subtle bugs are more insane. For those bugs, the number of people looking at the problem is much more important than the average skill of the readers. This is a strong particular argument for putting things on the internet and making them super enjoyable and accessible, rather than putting them in places where only experts will see them.

Not that we need any more reasons, but this is also a strong argument for publishing your research on blogs and vlogs instead of in stuffy formal journals. If you notice something weird that you can’t figure out, you should get it in front of the scientifically-inclined public as soon as possible, because one of them has the best chance of spotting whatever you have missed. Back in the day, the fastest way to get an idea in front of the scientifically-inclined public was to send a manuscript to the closest guy with a printing press, who would put it in the next journal. (Or if possible, go to a conference and give a talk about it.)

But journals today only want complete packages. If you write to them about the tiny animals you found in your spit, they aren’t going to want to publish that. Times have changed. Now the fastest way to get out your findings is to use a blog, newsletter, twitter, etc.

Job Posting: Reddit Research Czar

Job postings are a kinda weird phenomenon. For one thing, they’re very modern. It used to be that most people either inherited a job (I’m a baker because my pa was a baker and our tiny hamlet needs a baker) or noticed an opportunity and ran with it (lots of hungry travelers cross that bridge every day, I bet I could make a living selling pancakes).

We’re talking about the second thing today, the opportunity just waiting for someone to snap it up. This is a job posting, but we’re not hiring. Reddit is hiring. Well, not REDDIT. The abstract spirit of reddit is hiring. The universe is hiring. 

Let us try to explain.

Czar was originally a term for East and South Slavic monarchs, most notably the Russian emperor — it’s another spelling of Tsar and yet another corruption of the Roman title Caesar, just like Kaiser. But at some point in the middle of the 20th century it became a term in the US and UK for government officials “granted broad power to address a particular issue”. The Industry Czar is in charge of industry, the Milk Czar is in charge of milk, the Asian Carp Czar is in charge of Asian Carp (no, really), and so on and so forth.

[Image caption: Carp Czar Gone Wild]

There are lots of problems in the world; some are covered, but there are many others where existing institutions have totally dropped the ball. Often, more research would help. But the academy just doesn’t move as fast as it used to. If you’ve ever looked at something and been like, “someone should do a study”, you know what we mean.

Reddit is a bizarre, amazing place. Literally millions of people have come together to this place on the internet and self-sorted into about 3.4 million communities, called subreddits. True, many subreddits are dedicated to very niche porn or insane crypto schemes. But if you want to build a desktop gaming rig, get male or female fashion advice, or discover long, plush horrors, there’s a subreddit for that. You can learn so much about any topic or hobby, maybe too much if you’re not careful.

[Image caption: We’d like to apologize to the ghost of Alan Turing]

This means there are lots of special populations on reddit, people who have a condition or illness, maybe a rare one, who are extreme outliers (e.g. very tall and/or live in a submarine), or who have a burning obsession with some niche idea. Subreddits bring people together, to commiserate, to try to help each other solve a problem, or to post insane fanart.

These people are all very interested in their shared topic. They are all highly motivated. Many of them are ready to self-experiment, or are already self-experimenting. A lot of things count as self-experimentation. If you’re doing a diet, or trying to get more sunlight, or even just trying to drink more water, that’s self-experimentation too. So a subreddit for a given problem or topic is a powder keg of interest and motivation, just waiting for a spark. 

Because while subreddits are very motivated, they’re largely untapped for organized research. Even in subreddits with good leadership, it’s rare for the leadership to have a research background. Most communities lack someone with the methods skills to design a good study, and the statistical analysis skills to examine the data afterwards. 

If you have these skills, and you are familiar with reddit, you could show up and start helping people organize research. You could collaborate with people to help them solve their problems, or at least learn more about their problems, and you could start doing it tomorrow. 

[Image caption: Redditors could never be coordinated enough to pull off something as complex as scientific research!]

Crowdsourcing research like this is under-explored. Almost no one has ever done studies organized like this, so in our opinion, there’s virtually guaranteed to be low-hanging fruit all over the place. Anything that isn’t sexy enough for a major journal or doesn’t sound serious enough for the NIH to spend their time on is ripe for the picking.

The current research world is very narrow-minded. Doctors and researchers are quick to blame a person’s behavior or hygiene and very slow to blame environmental contaminants. If you’re more creative or more open-minded, and you’re willing to consider other paradigms, you can just move faster. If doctors don’t take the pathogen paradigm for chronic disease and digestive disorders seriously, then by becoming the “Pathogenic Disease Czar”, you might be able to rack up discoveries really quickly.

There’s also the question of “why now”? Part of it is that the research world has slowed down. But another part is that the rest of the world has sped up. We’re more coordinated than ever. Today you can get 100 people reading your latest newsletter in 20 minutes. Today you can pop by a subreddit and consult with thousands of people in a matter of hours. Today you can cold-email an emeritus professor who worked on the problem in the 1970s and be on a Zoom call with them next week. 

Research tools are also opening up, getting more accessible every day. If you’re leading the reddit charge on some rare glandular disorder, it now takes only a couple hundred dollars per person for everyone involved to get their genome sequenced, and it’s getting cheaper all the time. If there’s a genetic explanation, or genetics is involved in some way, it’s only recently gotten cheap enough that communities might be able to find it on their own.

There are lots of interesting ideas where the only support for them is a single paper with 20 participants from 1994. If you can get a couple dozen volunteers together, boom, you’ve just advanced the state of the field, and discovered whether or not there was anything to that interesting idea.

One example is our own ongoing all-potato diet study, which we see as the first of what will hopefully be a long tradition of community trials and community RCTs (randomized controlled trials). We’ve mostly recruited from twitter for the potato diet, but we just as easily could have recruited from reddit. For reference, this was the response on one subreddit, and not even a subreddit directly related to dieting.

Sometimes just planting a flag in the sand is enough. People like to feel like a part of something and are excited to participate. One participant in the potato diet said:

> How do we get stronger evidence [for the potato diet]? Well someone has to go out on a limb and run an experiment. This is a particularly important motivation for me. If this were not part of a larger study, I wouldn’t spend my energy on it (after all, it probably won’t work). But the fact that it might yield useful data makes it much more appealing.

Obesity and related issues (heart disease, diabetes, etc.) are just one example of a serious problem that people are invested in solving. It seems like there are lots of problems where we might be able to quickly learn a lot by rigorous self-experimentation and community research.

Depression and anxiety are classic unsolved problems. Sure, we have some mildly effective treatments, but why don’t we have great ones? Why does a given treatment work for some people and not others? What about people with treatment-resistant depression? Why are things like exhaustion and brain fog symptoms of depression? Where does depression come from? There’s been a lot of discussion but our take is still “no one knows” or at least, “the jury’s still out”. We see that r/depression/ has over 800,000 members and a couple thousand are usually online at a given time. If you think you could help, they seem like they would be glad to have it. 

Crohn’s disease is debilitating and remains very poorly understood — Wikipedia, for example, says, “While the precise causes of Crohn’s disease (CD) are unknown, it is believed to be caused by a combination of environmental, immune, and bacterial factors in genetically susceptible individuals. …  While Crohn’s is an immune-related disease, it does not appear to be an autoimmune disease (in that the immune system is not being triggered by the body itself). The exact underlying immune problem is not clear; however, it may be an immunodeficiency state.” Sounds like more research is needed, and r/CrohnsDisease/ has 42,000 members.

If that’s not mysterious enough for your taste, there are all the really inexplicable digestive conditions, which go by names like IBS (irritable bowel syndrome) and GERD (gastroesophageal reflux disease). These can really fuck you up, so people will be really motivated to try things and find a treatment. And there might be weird treatments out there that really work. You can drop by r/ibs/ with 74,000 members or r/GERD/ with 42,000 members and start putting out surveys, today if you want! (But talk to the mods first, don’t get kicked out for being a weirdo.)

But you won’t be the first researcher on the scene. We see that u/OrganicSquare made a post titled “Let’s use machine learning to help us find solutions to our reflux. I need this whole community to answer this survey for data!!!” on r/GERD about a year ago. We can’t find the results — maybe she’s still analyzing the data — but this is exactly the sort of thing we’re talking about. OrganicSquare, you are the hero reddit needs, let us know if you want to collaborate.

There are also some populations that will be interesting not because they are facing a problem they want to solve, but because they are special in some other way. Trans people would love to have better resources for transitioning, and you could certainly drop by to help them study that. But we think the real reason to drop by r/TransDIY/ and similar subreddits is because you have literally thousands of people conducting n = 1 endocrinology experiments.

There’s a good chance the next great endocrinologist will be trans, just because of their personal familiarity with the subject and ability to self-experiment. If you want to see what effect testosterone/estrogen/progesterone/estradiol has on mood/energy/digestion/attention/nerve growth/body temperature/whatever, this is one of your few and best chances to get experimental data. 

This is nowhere near a complete list. In fact, please drop other subreddits that might be excited to do more community research in the comments.

[Image caption: It’s more common than you might think]

We call this a job posting because we think this could easily be a full-time job. If you help a community or two get closer to solving their problem, even if you just help them coordinate and give them HOPE that their problem is solvable, it would be pretty easy to convince lots of them to chip in. It’s hard for an individual to hire an expert, but some of these communities have tens or hundreds of thousands of members. For a community that size, hiring some full-time research muscle is easy.

You set up a Patreon or a newsletter (we recommend Ghost), and ask for support. If you can get 1000 people to give you $3 a month, that’s $36,000 a year, enough to start thinking about doing this full-time.

You don’t need to solve anything up front. You just need to convince 1000 people that you’re doing enough to justify them spending $3 a month on something they think is important, which is not a hard sell. And if you get 10,000 people on board for $1, you’re even better off. (Incidentally, here is our patreon.)
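The back-of-the-envelope numbers here are easy to check. This sketch ignores platform fees, taxes, and churn (that is our simplifying assumption), and `annual_support` is just an illustrative name.

```python
def annual_support(supporters: int, dollars_per_month: float) -> float:
    """Gross yearly income from monthly pledges, before any fees or taxes."""
    return supporters * dollars_per_month * 12


if __name__ == "__main__":
    print(annual_support(1_000, 3))   # 36000
    print(annual_support(10_000, 1))  # 120000
```

So 10,000 people at $1 a month comes out to more than three times the income of 1,000 people at $3, at least before whatever cut the platform takes.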

Crowdfunding is the best and noblest option, but it’s not the only route you can take. Some communities will have a millionaire or two in the ranks, and if you start doing good work, people will come out of the woodwork to help. There are lots of granting agencies out there looking for stunning projects to throw money at. Start coordinating reddit research for a few months, show that you’re serious, make a little progress, and it should be easy to make the case for some grants.

And actually, you might also be able to get funding from reddit, up to $50,000! Starting June 2022, reddit will start distributing one million dollars in community funding to different subreddits. If you can make the case to a subreddit that you can lead their community research for a year, they can apply for $40,000 to be your salary, and there’s a good chance they’ll get it. The article linked above says, “I can’t wait to see what wild project the r/WallStreetBets crew tries to get $50,000 to pull off.” Yeah holy shit.

Finally, if you are financially independent / have a good job that gives you lots of free time, then this is DEFINITELY a job suited for you. You already don’t have to worry about money; maybe you even have enough that you could pay for a statistician / the chemical analysis of samples / new air quality monitors / sundry other research expenses. You’re looking for something interesting to spend your time on, something that also makes the world a better place. If you have the skills and inclination, nothing could be a better fit!

It’s worth touching for a moment on the skills we think would be important. Any research on reddit would probably start with a lot of surveys, so someone with lots of experience with survey-based methods might have the advantage here. Possibly a sociologist or psychologist? But on the other hand, a lot of the problems reddit communities would be interested in solving are medical, so maybe someone with a medical background is the best person for the role. On the other other hand, a lot of the advantage here might be statistical, having the skill to work with big strange datasets, so maybe a data scientist.

Or form a cabal if you want:

[Image caption: Reddit Research Cabal]

Anyways, if this is the job you want, and you think you have the skills to do it, there are two general ways to approach this…

Go Specific

If you are a person who is a member of one of these communities, who is inclined towards research and wants to rally people to solve the problem, going specific might be the approach for you.

There are a couple of winning examples already; let’s take a look. These two don’t use reddit for the most part — they have communities elsewhere — but it’s not hard to imagine recreating some of their successes in a subreddit rather than on a blog or on twitter.

Scott Alexander is pretty much the research czar for rationalists, in his reader surveys (both back on SSC and now on ACX), and in some more specific work like the nootropics survey. Rationalists aren’t a community with a rare disease to cure, but they are united in their interest in specific topics, like AI, IQ, and birth order effects. And Scott, being a psychiatrist, has a special interest in things like SSRIs. We’re very interested in the small amount of work he’s done on air quality / ventilation, which we’ll note has included at least a little self-experimentation.

Whorelord and “mad social scientist” Aella is kind of de facto sex worker / sex research czar for the whole internet. She also does psychology and psychedelics research, which must be reasonably well-regarded because her twitter followers include some big names in psychology, like Paul Bloom and Uri Simonsohn (and see this interaction). But mostly it’s sex stuff, and the quality of her research puts the average social science publication to shame.

Scott is a rationalist and Aella has lots of sex / is a (former) sex worker, so they’re perfectly positioned to be the research czars for their communities. We’d recommend that the “go narrow” approach be taken with communities you are a part of as well.

There are clear advantages to going narrow. First off, you can self-experiment. You can pilot-test studies on yourself, and you can show people that you would never ask them to do anything you aren’t willing to try first. You can specialize and learn a lot about this one area of research. And you’ll understand the topic better, because you’ve lived it.

There are also a couple of disadvantages. This has a smaller scope, but some of you might like that. It’s less exciting, and maybe harder to get support and raise money for projects. But it’s also more practical.

Go Broad

The other option is to try to become the Czar of all the Reddits.

In this approach, you try to work with lots of different subreddits, lots of different communities, and try to solve lots of different problems. Instead of focusing on just one mystery at a time, you go broad. 

If you are a generalist with good research chops, who spends a lot of time on reddit and knows how it works, who likes the idea of working with tons of different people, on dozens of projects, this might be the approach for you.

This approach has some clear advantages. If you work on more projects, you will be able to get funding from more quarters. As you try more and more things, you’ll learn a lot about the metascience of doing this new kind of community research. You can switch between projects when you’re waiting for results. If you hit a dead end on one question, you can take some time off and switch to something else. More things to work on means it’s more likely something will be a success.

There are also a few disadvantages. You’ll always risk getting spread too thin, and you will spend lots of time getting familiar with new topics, instead of going deep on just a few. You probably won’t share most of the problems you want to help solve. Since you don’t have these diseases/conditions/whatevers, you won’t be able to self-experiment, and self-experimentation is an important part of research. And some communities won’t want or appreciate help from an outsider.

To Sum Up

Reddit is a big place. There are a lot of questions to answer, problems to solve, and communities to rally to the mad science crusade.

Probably by 2030 there will be several major researchers on reddit, and two or three of them will be getting close to being household names. Some of them will be generalists who hop around different subreddits, consulting on different problems. Some of them will be specialists, organizing their communities against shared problems. Different research czars will work together to make bigger and better projects, and problems will get solved faster than anyone today thinks possible. 

But why wait to see other people do it? If you think you have what it takes (or half of what it takes; don’t be afraid to learn on the job), there’s nothing stopping you from doing this starting tomorrow. We’d be happy to consult on stats and methods — and if you do anything interesting, we might blog about it. If you declare yourself Czar of X and you make a big breakthrough, we will send you a crown (though it will not be this nice).