How to DIY New Scientific Protocols

Scientific research today relies on one main protocol — experiments with control groups and random assignment. In medical contexts, these are usually called randomized controlled trials, or RCTs. 

The RCT is a powerful invention for detecting population-level differences across treatments or conditions. If there’s a treatment and you want to know if it’s more effective than control or placebo, if you want to get an answer that’s totally dead to rights, the RCT is hard to beat. But there are some problems with RCTs that tend to get swept under the rug. 

Today we aim to unsweep. 

First, RCTs are seen as essential to science, but in fact they are historically unusual. RCTs were first invented in 1948, so most of science happened before they were even around. Galileo didn’t use RCTs, neither did Hooke, Lavoisier, Darwin, Kelvin, Maxwell, or Einstein. Newton didn’t use RCTs to come up with calculus or his laws of motion. He used observations and a mathematical model. So the idea that RCTs and other experiments are essential to science is ahistorical and totally wrong. 

If you were to ask doctors what findings they are most sure of, they would almost certainly include “smoking causes cancer” in their list. But we didn’t discover this connection by randomly assigning some people to smoke a pack a day and other people to abstain, over the course of several years. No. We used epidemiologic evidence to infer a causal relationship between the presumed cause and observed effect.

Second, the RCT is only one tool, and like all tools, it has specific limitations. It’s great for studying population-level differences, or treatments where everyone has a similar response. But where there is substantial heterogeneity of treatment effect, the RCT is a poor tool and often gives incoherent answers. And if heterogeneity is the main question of interest, it’s borderline useless.

Put simply, if people respond to a treatment in very different ways, an RCT will give results that are confusing instead of clarifying. If some people have a strong positive response to treatment and some people have no response at all, the RCT will distill this into the conclusion that there is a mild positive response to treatment, even if no individual participant has a mild positive response!
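
To make this concrete, here is a toy simulation in Python. Every number in it is made up for illustration (half of treated participants respond, responders improve by 10 points, and the noise has a standard deviation of 1); none of it comes from a real trial.

```python
import random
import statistics

def simulate_trial(n_per_arm=1000, responder_share=0.5, responder_effect=10.0, seed=0):
    """Simulate an RCT where only some treated people respond at all.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = []
    for _ in range(n_per_arm):
        # Each treated participant is either a strong responder or a non-responder.
        is_responder = rng.random() < responder_share
        effect = responder_effect if is_responder else 0.0
        treated.append(effect + rng.gauss(0.0, 1.0))
    return control, treated

control, treated = simulate_trial()
average_effect = statistics.mean(treated) - statistics.mean(control)
# Count treated participants whose own outcome is within 1.5 points of the average effect.
near_average = sum(1 for x in treated if abs(x - average_effect) < 1.5)
print(f"average effect: {average_effect:.2f}")
print(f"treated participants near the average: {near_average} of {len(treated)}")
```

Under these assumptions the average effect should land near 5, while almost nobody in the treated group has an outcome anywhere near 5. The "mild positive response" belongs to the average, not to any person.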

Also, RCTs are like, way inefficient. To test for a moderate effect size, you need several dozen or several hundred participants, and you can test only one hypothesis at a time. Each time you compare condition A to condition B, you find out which group does better. Maybe you want to see if a dose of 2 mg is better than a dose of 4 mg. But if there are a dozen factors that might make a difference, you need a dozen studies. If you want to test two hypotheses, you need two groups of several dozen or several hundred participants; for three you will need at least three groups, et cetera. 
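
For a rough sense of where “several dozen or several hundred” comes from, here is a sketch using the textbook normal approximation for comparing two group means. The 5% significance level, 80% power, and the effect sizes in the loop are conventional defaults we are assuming, not numbers from any particular study.

```python
import math
from statistics import NormalDist

def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-group comparison.

    Normal-approximation formula: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# Small, moderate, and large standardized effects (assumed values).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {participants_per_group(d)} participants per group")
```

With these assumptions, a moderate effect (d = 0.5) needs about 63 participants per group and a small one (d = 0.2) needs about 393, and that only buys you a single A-versus-B comparison.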

Third, RCTs don’t take advantage of modern cheap computation and search algorithms. For example, in the 1980s there was some interest in N=1 experiments for patients with rare cancers. This was difficult in the 1980s because of limited access to computers, even at research universities. But today you could run the same program on your cell phone a hundred times over. We’d be better off making use of these new insights and capabilities. 

Recent Developments

Statistics is young, barely two hundred years at the outside. And the most familiar parts are some of the youngest. Correlation was invented in the 1880s and refined in the 1890s. It’s not even as old as trains. 

choo choo

Turns out it is kinda easy to make new tools. The RCT is important, but it isn’t rocket science. A new century requires new scientific protocols. The 21st century is an era where communication is prolific and computation is cheap, and we should harness this power.

Since the early days, science has been based on doing experiments and sharing results. Researchers collect data, develop theories, and discuss them with other likeminded weirdos, freaks, and nerds. 

New technology has made it easier to do experiments and share results. And by “new technology”, we of course mean the internet. Just imagine trying to share results without email, make your data and materials public without the OSF or Google Drive or Dropbox, or collaborate on a manuscript by mailing a stack of papers across the country. Seriously, we used to live like that. Everyone did.

People do like the internet, and we also hear that they sometimes use it. Presumably a sensible, moderate amount. But just like the printing press, which was invented in 1440 but didn’t lead to the Protestant Reformation until 1517, the internet (and related tech like the computer and pocket computer, or “cell phone”) has not yet been fully leveraged.

Let’s Put on our Thinking Caps

This is all easy enough to say, but at some point you need to consider how to come up with totally new research methods.

We take three main angles, which are historical, analogical, and tinkering. Basically: Look at how people came up with new methods in the past. Look at successful ideas from other fields and try applying them to science. And look at the different ideas and see what happens when you expose them to nature. 

We begin with close reads and analysis of the successful development of past protocols (for example, the scientific innovation around the cure for scurvy). 

We develop new scientific protocols by analogy to successful protocols in other areas. For example, self-experiments are somewhat like debugging (programmers in the audience will be familiar with suspicion towards stories of “well, it worked on MY setup”). The riff trial was developed in analogy to evolution.

Finally, we deploy simple versions of these protocols as quickly as possible so that we can tinker with them and benefit from the imagination of nature. This is also somewhat by analogy to hacker development methods, and startup concepts like the minimum viable product. We try out new ideas as soon as they are ready, and all of our work is published for free online, so other people can see our ideas and tinker with them too.

Here are some protocols we’ve been dreaming about that show exceptional promise: 

N=1

The idea of N = 1 experiments / self-experiments has been around for a while, and there are some famous case studies like Nobel Laureate Barry Marshall’s self-administration of H. pylori to demonstrate its role in stomach ulcers and stomach cancer. But N = 1 protocols have yet to reach their full potential. 

There’s a lot of room to improve this method, especially for individuals with chronic illnesses/conditions that bamboozle the doctors. N = 1 studies have particular considerations, like hidden variables. You can’t just slap on a traditional design, you need to think about things like latency and half-life. And many of the lessons of N = 1 generalize to N of small.

Community Trial

The Community Trial is a protocol that blurs the line between participant and researcher. In these trials, an organizer makes a post providing guidelines and a template for people to share their data. Participants then collect their own data and send it to the organizer, who compiles and analyzes the results, sharing the anonymized data in a public repository.

Data collection is self-driven, so unlike a traditional RCT, participants can choose to measure additional variables, participate in the study for longer than requested, and generally take an active role in the study design. 

Unlike most RCTs, community trials allow for rolling signups, and could be developed into a new class of studies that run continuously, with permanently open signups and an ever-growing database of results with a public dashboard for analysis. 

We first tested this with the Potato Diet Community Trial (announcement, results), where 209 people enrolled in a study of an all-potato diet, and the 64 people who completed 4 weeks lost an average of 10.6 lbs. Not bad.

Reddit Trials

There’s a possible extension of the community trial that you might call a “Reddit Trial”. 

In this protocol, participants in an online community (like a subreddit) that all share a common interest, problem, or question (like a mystery chronic illness) come together and invent hypotheses, design studies, collect data, perform analysis, and share their results. As in a community trial, participants can take an active role in the research, measure additional variables, formulate new hypotheses as they go, etc.  

People seem to think that a central authority makes things better, but we think for design and discovery that’s mostly wrong. You want the chaos of the marketplace, not the rigid stones of the cathedral. Every bug is shallow if one of your readers is an entomologist.

This could be more like a community trial, where one person, maybe even a person from outside the community, takes the lead. But it could also be very different from a community trial, if the design and leadership are heavily or enormously distributed. There’s no reason that rival factions within a community, splintering over design and analysis, couldn’t actually make this process better.

We already wrote a bit about similar ideas in Job Posting: Reddit Research Czar. And none other than Patrick Collison has come to a closely-related conclusion in a very long tweet, saying: 

Observing some people close to me with chronic health conditions, it’s striking how useful Reddit frequently ends up being. I think a core reason is because trials aren’t run for a lot of things, and Reddit provides a kind of emergent intelligence that sits between that which any single physician can marshal and the full rigor of clinical trials.

… Reddit — in a pretty unstructured way — makes a limited kind of “compounding knowledge” possible. Best practices can be noticed and can imperfectly start to accumulate. For people with chronic health problems, this is a big deal, and I’ve heard lots of stories between “I found something that made my condition much more manageable” all the way to “I found a permanent cure in a weird comment buried deep in a thread”. 

… Seeing this paper and the Reddit experience makes me wonder whether the approach could somehow be scaled: is there a kind of observational, self-reported clinical trial that could sit between Reddit and these manual approaches? Should there be a platform that covers all major chronic conditions, administers ongoing surveys, and tracks longitudinal outcomes?

We think the answer is: obviously yes. It’s just up to people to start running these studies and learning from experience. We’re also reminded of Recommendations vs. Guidelines from old Slate Star Codex.

Riff Trials

The Riff Trial takes a treatment or intervention which is already somewhat successful and recruits participants to self-assign to close variations on the original treatment. Each variation is then tested, and the results reported back to the organizers. 

This uses the power of parallel search to quickly test possible boundary conditions, and discover variations that might improve upon the original. Since each variation is different, and future signups can make use of successful results, this can generate improvements based on the power of evolution. 
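
As a toy picture of the evolutionary logic, here is a sketch where each round of signups riffs on the best variation reported so far. The function names, the single-number “dose”, and the scoring function are all hypothetical stand-ins; a real riff trial has people and messy self-reports, not a tidy score.

```python
import random

def run_riff_rounds(original, score, mutate, rounds=4, signups_per_round=20, seed=0):
    """Toy riff trial: each signup tries a close variation on the current best."""
    rng = random.Random(seed)
    best, best_score = original, score(original)
    for _ in range(rounds):
        # Everyone in this round riffs on the best variation known when the round opens.
        riffs = [mutate(best, rng) for _ in range(signups_per_round)]
        for riff in riffs:
            result = score(riff)
            if result > best_score:
                # A better variation replaces the old one for future signups.
                best, best_score = riff, result
    return best, best_score

# Hypothetical example: the "treatment" is one number, and the unknown ideal is 3.0.
def toy_score(dose):
    return -(dose - 3.0) ** 2

def toy_mutate(dose, rng):
    return dose + rng.gauss(0.0, 0.5)

best, best_score = run_riff_rounds(1.0, toy_score, toy_mutate)
print(f"best variation found: {best:.2f} (score {best_score:.3f})")
```

The point of the sketch is only the shape of the process: many small variations tried in parallel, with later rounds building on whatever worked. The result can never score worse than the original, because the original stays in play until something beats it.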

We tested this protocol for the first time in the SMTM Potato Diet Riff Trial, with four rounds of results reported (Round 1, Round 2, Round 3, Retrospective). 

This has already led to at least one discovery. While we originally thought that consuming dairy would stop the potato diet’s weight loss effects, multiple riff trials demonstrated that people keep losing weight just fine when they have milk, butter, even sour cream with their potatoes. Consuming dairy does not seem to be a boundary condition of the potato diet, as was originally suspected. This also seems to disprove the idea that the standard potato diet works because it is a mono-diet, boring, or low-fat. How can it work from being a mono-diet, boring, or low-fat if it still works when you add various dairy products, delicious dairy products, and high-fat dairy products? 

There are hints of other discoveries in this riff trial too, like the fact that the diet kept working for one guy even when he added skittles. But that’s still to be seen.

“Bullet-Biting”

In most studies, people have a problem and want the effect to work. If it’s a weight loss study, they want to lose weight, and don’t want the weight loss to stop. So participants are hesitant to “bite the bullet” and try variations that might stop the effect.

This creates a strong bias against testing which parts of the intervention are actually doing the work, which elements are genuinely necessary or sufficient. It makes it much harder to identify the intervention’s real boundary conditions. So while you may end up with an intervention that works, you will have very little idea of why it works, and you won’t know if there’s a simpler version of the intervention that would work just as well, or maybe better. 

We find this concerning, so we have been thinking about a new protocol where testing these boundaries is the centerpiece of the approach. For now we call it a “bullet-biting trial”, in the sense that it guides researchers and participants to bite the bullet (“decide to do something difficult or unpleasant in order to proceed”) of trying things that might kill the effect.

In this protocol, participants first test an intervention over a baseline period, to confirm that the standard intervention works for them. 

Then, they are randomized into conditions, each condition being a variation that tests a theoretical or suspected boundary condition for the effect (e.g. “The intervention works, but it wouldn’t work if we did X/didn’t do Y.”). 

For example, people might suspect that the potato diet works because it is low fat, low sugar, or low seed oils. In this protocol, participants would first do two weeks of a standard potato diet, to confirm that they are potato diet responders. No reason to study the effect in people who don’t respond! Then, anyone who lost some minimum amount of weight over the baseline period would be randomized into a high-fat, high-sugar, or high-seed-oil variant of the potato diet for at least two weeks more. If any of these really are boundary conditions, and stop the weight loss dead, well, we’d soon find out. 
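
The screening-then-randomizing step could be sketched like this. The 2 lb cutoff, the participant IDs, and the baseline numbers are all invented for illustration; the protocol above only says “some minimum amount of weight”, so treat the threshold as an assumption.

```python
import random

def assign_bullet_biting_arms(baseline_loss_lbs, arms, min_loss_lbs=2.0, seed=0):
    """Screen for responders, then randomize them evenly across variant arms.

    min_loss_lbs is a hypothetical cutoff; pick whatever minimum the study defines.
    """
    rng = random.Random(seed)
    # Keep only participants who lost at least the minimum during baseline.
    responders = [pid for pid, loss in baseline_loss_lbs.items() if loss >= min_loss_lbs]
    rng.shuffle(responders)
    # Deal responders out round-robin so arm sizes differ by at most one.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(responders)}

# Made-up baseline results: pounds lost over two weeks of standard potato diet.
baseline = {"p01": 6.5, "p02": 0.4, "p03": 3.1, "p04": 8.0, "p05": -1.0, "p06": 4.2, "p07": 2.9}
arms = ["high-fat", "high-sugar", "high-seed-oil"]
assignments = assign_bullet_biting_arms(baseline, arms)
for pid, arm in sorted(assignments.items()):
    print(pid, arm)
```

Shuffling first and then dealing round-robin keeps the arms balanced even with a small number of responders, which matters when each arm might only have a handful of people.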

By randomly introducing potential blockers, you can learn more about how robust an intervention truly is. Maybe the intervention you’ve been treating so preciously actually works just fine when you’re very lax about it! More importantly, you can test theories of why the intervention works, since different theories will usually make strong predictions about conditions under which an intervention will stop working. And this design might help us better understand differences between individuals — it may reveal that certain variations are a boundary condition for some people, but not for others. 

Introducing The People’s Bill

I.

In a recent speculative post about tricameral legislatures, we explored a couple different ways you could elect members of a legislative body. 

The clear winner was electing representatives by lottery, drawing them randomly from the pool of all adult citizens or all voters, for a fixed term (formally known as sortition). Since election is by random selection, as long as the chamber has enough members, it’s guaranteed to be largely representative in terms of gender, race, religion, age, profession, and so on. Representatives would be ordinary people, instead of career politicians.

Now, while it’s very fun to sit in our armchairs and speculate about political science, the truth is that we don’t have much influence on how the branches of government are organized. The United States will not be switching to a tricameral system or electing representatives by sortition any time soon. Neither will any other country in the world, is our guess. Despite centuries of research on various voting systems, lots of countries are still using first-past-the-post voting. It’s hard to imagine this will be much different.

We don’t have the power to make this happen. But we do have the power to set up a website. 

II.

So today we’d like to introduce a little idea we call The People’s Bill. Why do we trust politicians, lowest of the low, to write our laws for us? We’re Americans, by God. We can write our own laws.

The idea is pretty simple. We could set up a website, with a text form that anyone could edit, and the People could write whatever bill they want. 

If you’re concerned that only Americans should be able to write American laws, then we could limit editing privileges to IP addresses from within the US. But there are ways around this, of course, and why not take good ideas from the rest of the world? 

To keep it from getting obscenely long, as bills often do, we would set up a character limit. As a red-blooded American I obviously want to set the limit to 1,776 characters, but that’s probably not long enough (by this parenthetical, this post has already passed 1,776 characters). Setting it up to be 10 tweets long would also be amusing, but that’s only 2,800 characters. But we notice that the Declaration of Independence is about 8,000 characters long, depending on version, so let’s go with that.

People would have a month to debate and draft as much as they want, within those limits. Then, at the end of every month, the bill would be finalized, and closed to editing. A permanent snapshot would be taken, and automatically emailed to all 100 senators and 435 representatives, with instructions that this is the Will of the People et cetera et cetera. 

If you have any experience with online assignments, you know that closing an assignment at midnight can get pretty crazy. To help prevent a furious final dash to make edits at 11:59 PM the night before, we wouldn’t take the final snapshot at midnight. Instead, we would randomly select a time on the day in question, keep that random deadline a secret, and take the final snapshot then. 
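
Picking the secret deadline is only a few lines of code. This is a minimal sketch, not a finished design: the function name and the example date are ours, time zones are ignored, and we use Python’s secrets module (rather than random) on the assumption that the deadline shouldn’t be guessable from a seed.

```python
import secrets
from datetime import date, datetime, time, timedelta

def pick_secret_deadline(closing_day):
    """Pick a uniformly random second within the closing day (naive local time)."""
    seconds_in_day = 24 * 60 * 60
    # randbelow returns an integer in [0, seconds_in_day), so the result stays on closing_day.
    offset = secrets.randbelow(seconds_in_day)
    return datetime.combine(closing_day, time.min) + timedelta(seconds=offset)

# Example date chosen arbitrarily for illustration.
deadline = pick_secret_deadline(date(2021, 7, 31))
print(deadline.isoformat())
```

The server would store this value privately and take the snapshot when the clock passes it, so nobody editing the bill knows whether they have ten hours left or ten seconds.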

With this system, there’s no question what bills people want passed. Every member of Congress gets an email about it every month, containing a bill that the People wrote and that contains a curated list of what they want passed into law. It may not end up being the bill the country needs. But it’s hard to imagine it won’t end up being the bill we deserve.

III.

You may be feeling skeptical that people can coordinate on the internet, let alone coordinate to produce anything of value. But we think there’s reason to believe that this isn’t such a problem.

First of all, open-source software is an unqualified success. Linux was started by one Finn at the tender age of 21, and thanks to decades of collaborative writing from the community, now contains several million lines of code. Apache is free, open-source, and serves about 25% of all websites. If you’re reading this, there’s a good chance your browser is one of these success stories — Firefox is fully open-source and the open-source Chromium project forms the base for both Google Chrome and Microsoft Edge. 

Of course, all of these had some kind of central leadership. Linus Torvalds coordinated the development of Linux at some level, even if he didn’t write all the code himself.

But people are perfectly capable of coordinating themselves, given the chance. Consider Reddit’s 2017 April Fools’ Day project, called Place. This project started with a canvas 1000 pixels wide by 1000 pixels tall, for a cool one million pixels total. For 72 hours, Reddit users could place a new pixel every 5-20 minutes, in any of sixteen different colors. Despite there being no top-down organization or authority, the redditors soon organized themselves and the canvas into stunning displays of coordination. The final canvas included dozens of national flags, logos, memes, a rainbow road, a Windows 95 taskbar, a recreation of the Mona Lisa (though she appears to be flipping us the bird), and a complete rendition of The Tragedy of Darth Plagueis The Wise, courtesy of r/prequelmemes. You can see the final canvas in all its maddening glory here, a timelapse of its evolution here, and the Wikipedia page for the project here.  

Of course, Wikipedia itself may be the greatest of all crowdsourced endeavors. It is the largest encyclopedia in the world, and probably the best. In high school, our teachers told us not to cite Wikipedia as a source, as it was too unreliable. Today, media giants like Facebook and YouTube use Wikipedia entries in the fight against fake news. All courtesy of any random yahoo with an internet connection. 

All right, so ordinary people can make open-source software, collaborate to create giant pixel-art renditions of copypastas and Renaissance masterpieces, and can even create the largest encyclopedia in all of history. Can ordinary people really write laws, though? Laws aren’t like pixel art, or even encyclopedia pages, right?

Indeed they’re not. First of all, making good pixel art is really hard, probably harder than writing laws. Second, history shows us that normal people can write perfectly good laws — you don’t need to be a lawyer or career politician. Why would you?

On our tricameralism post, one commenter mentioned Ezra Klein’s interview of Hélène Landemore (NYT, archive.is), a political scientist. We’ll have to resist quoting it at length here; seriously, give it a read or a listen. 

Particularly interesting were the stories she told about ordinary citizens writing laws for themselves. Here’s one:

Iceland decided to rewrite its constitution in 2010. And they decided to use a very innovative, inclusive, participatory method. They started with a national forum of 950 randomly-selected citizens that were tasked with coming up with the main values and ideas that they wanted to see entrenched in the new document.

And then they had an election to choose 25 constitution drafters, if you will, among a pool of nonprofessional politicians, because they had been convinced, after the 2008 crisis, that they were all corrupt. So by law, they were excluded from participating in this election. And those 25 decided to work with the larger public by publishing their drafts at regular intervals, putting them online and collecting some feedback through a crowdsourced sort of process. And then they put the resulting proposal to a nationwide referendum. Two-thirds of the voting population approved, and then parliament killed it and never turned it into a bill.

IV.

Will the People’s Bill fix our current political system? Honestly, we doubt it. Like the constitution drafters in Iceland, we fully expect Congress will kill the People’s Bill every time it comes around. Most months it will probably never get proposed; if it ever is proposed, most of the ideas in the bill will probably never make it into law. 

There are reasons to try this idea anyways. First of all, if Congress ignores the suggestions of the People, this will be another way of making it clear where their priorities lie (as if you needed any more convincing, but still).

Second, writing our own bill every month, even if it never becomes law, gets people involved in democracy. It’s a chance for people to discover they can write laws that are just as good as the laws written in Albany, Austin, Tallahassee, Denver, and Washington. Will the People’s Bill be messier than bills written by politicians? Yes, but it will also be more original, and more creative. Will the People’s Bill contain allusions to The Tragedy of Darth Plagueis The Wise? Almost certainly.  

Laws on the books are unclear and poorly written; perhaps at times intentionally so. If you have no experience writing laws yourself, you might be tempted to assume the problem is with you. But if you’ve written parts of the People’s Bill for three years running, maybe you’ll look at a piece of federal legislation and say, “You call this a law? My grandmother could write a better law than this!” And maybe you’d be right, because maybe she helps write the People’s Bill too. 

If we’re lucky, this new and well-deserved confidence will inspire more ordinary people to run for office, question the legal status quo, and so on. It lowers the barriers to entry, and encourages people to open their minds to new approaches to governance.

Third, it will help loosen the grip of the laws on our minds. It’s one thing to understand intellectually that laws were written by morons just like you and me, but it’s a whole other thing to actually be one of the morons writing the law. The US legal code wasn’t handed down on Mount Sinai. People wrote those laws, and sometimes they will be flawed, backwards, or just plain stupid. Americans used to have a much healthier disrespect for the law, and it’s time we brought that back.  

Fourth, and finally, there are a lot of great policy ideas out there, but as a country we tend to discuss the same few ideas over and over again — like an otter chasing its tail, only much less cute. The People’s Bill would be an opportunity to discuss great policy ideas that aren’t even on the radar right now. To discover good ideas that are considered normal in other places and times, but that aren’t on the docket here. Some of them will sound crazy, and some of them might even be crazy. But some ideas that sound crazy right now will end up being policy twenty years from now. Whatever else you might think of the idea, it seems like a safe bet that randos on the internet can beat United States Senators at coming up with out-of-the-box ideas.

V.

We haven’t set up the website for The People’s Bill yet, because in the true democratic spirit of the project, we want to get suggestions about how to set it up; how the website should be structured, what software we should consider, and so on. Below are a few of our thoughts, but we’re sure you will have other suggestions, and we really want to hear your ideas.

A very straightforward option would be to set this up using MediaWiki. Wikis have talk pages, edit histories, and make it relatively simple to manage users and permissions. Every month we could set up a new page for that month’s bill, and lock the page at the end of the month. This would probably be the easiest way to set up the site.

However, a wiki wouldn’t allow people to draft different bills in parallel, and wouldn’t make it easy to compare different drafts of the same bill. There wouldn’t be any way to figure out which version of the bill has the most popular support — whoever edited the wiki most recently would always have the final say. So another option would be to use something like a forum. Different bills could be in different threads, and users could vote on which bills they like. At the end of the month, the top thread would be sent to Congress, and the rest would be locked, starting the cycle all over again. You could literally just use a subreddit for this, or you could build some kind of custom forum setup.

We could dream up more esoteric options too, though they would probably require more effort. You could link up Git to some kind of forum interface, allowing people to both vote on and branch bills as they saw fit, with all branches appearing as their own posts on the forum, complete with comments, vote tallies, and so on.

Some of these systems are more chaotic than others. A single wiki page, for example, would be sort of maddening. Anyone could wander in at any time and change the entire bill. Anyone could wander in at any time and revert the bill to a previous version. In contrast, a more forum-like approach might force the bill into reasonable sections and subsections, which could be clearly debated. This has obvious benefits, but this country already has a method for writing bills in the normal way — it’s called Congress.

We might even want to make The People’s Bill as chaotic as possible. Some months the bill might end up being the first 8,000 characters of the script of Bee Movie, but that’s a risk we’re willing to take. Just imagine Ted Cruz getting that bill in his email. 

Whatever we use, we want to make sure that it’s easily accessible. It should be easy to use and easy to join — we literally want your grandma to help draft these bills. Anyone with an internet connection should be able to join up, without too much trouble. So while esoteric systems might have some nice features, we have to balance that against wide engagement. It’s only democratic.


Thanks to Taylor Hadden and Casey Jamieson for giving feedback on a draft of this post.