Early on in science, there could never have been a replication crisis, because everyone was just trying all the stuff. They were writing letters to each other with directions, trying each other’s studies, and seeing what they could confirm for themselves.
Today, scientists would tell you that replicating someone else’s work takes decades of specialized training, because most findings are too subtle and finicky to be reproduced by just anyone. For example, consider this story from Harvard psychology professor Jason Mitchell, about how directions depend on implicit knowledge, and it’s impossible to fully explain your procedure to anyone:
I have a particular cookbook that I love, and even though I follow the recipes as closely as I can, the food somehow never quite looks as good as it does in the photos. Does this mean that the recipes are deficient, perhaps even that the authors have misrepresented the quality of their food? Or could it be that there is more to great cooking than just following what’s printed in a recipe? I do wish the authors would specify how many millimeters constitutes a “thinly” sliced onion, or the maximum torque allowed when “fluffing” rice, or even just the acceptable range in degrees Fahrenheit for “medium” heat. They don’t, because they assume that I share tacit knowledge of certain culinary conventions and techniques; they also do not tell me that the onion needs to be peeled and that the chicken should be plucked free of feathers before browning. … Likewise, there is more to being a successful experimenter than merely following what’s printed in a method section. Experimenters develop a sense, honed over many years, of how to use a method successfully. Much of this knowledge is implicit.
Mitchell believes in a world where findings are so fragile that only extreme insiders, close collaborators of the original team, could possibly hope to reproduce their findings. The implicit message here is something like, “don’t bother replicating ever; please take my word for my findings.”
The general understanding of replication is slightly less extreme. To most researchers, replication is when one group of scientists at a major university reproduces the work of another group of scientists at a different major university. There’s also a minority position that replications should be done by many labs, that replication is an internal process of double-checking: “take the community’s word”.
But this doesn’t seem quite right to us either. If a finding can’t be confirmed by outsiders like you — if you can’t see it for yourself — it doesn’t really “count” as replication. This used to be the standard of evidence (confirm it for yourself or don’t feel bound to take it seriously) and we think this is a better standard to hold ourselves to.
It’s not that Mitchell is wrong. He’s right that there is a lot of implicit knowledge involved in doing anything worth doing. Sometimes science is really subtle and hard to replicate at home; other times, it isn’t. But arguing over whether a particular study is easy or hard to replicate is a dodge. The argument is a load of crap because the whole reason to do research in the first place is to fight against received wisdom.
The motto of the Royal Society, one of the first scientific societies, was and still is nullius in verba. Roughly translated, this means, “take no one’s word” or “don’t take anyone’s word for it”. We think this is a great motto. It’s a good summary of the kind of spirit you need to investigate the world. You have the right to see for yourself and make up your own mind; you shouldn’t have to take someone’s word. If you can take someone else’s word for it — a king, maybe — then why bother?
In the early 1670s, Antonie van Leeuwenhoek started writing to the Royal Society, talking about all the “little animals” he was seeing in drops of pond water when he examined them under his new microscopes. Long particles with green streaks, wound about like serpents, or the copper tubing in a distillery. Animals fashioned like tiny bells with long tails. Animals spinning like tops, or shooting through the water like pikes. “Little creatures,” he said, “above a thousand times smaller than the smallest ones I have ever yet seen upon the rind of cheese.”
Naturally, the Royal Society found these reports a little hard to believe. They had published some of van Leeuwenhoek’s letters before, so they had some sense of who the guy was, but this was almost too much:
Christiaan Huygens (son of Constanijn), then in Paris, who at that time remained sceptical, as was his wont: ‘I should greatly like to know how much credence our Mr Leeuwenhoek’s observations obtain among you. He resolves everything into little globules; but for my part, after vainly trying to see some of the things which he sees, I much misdoubt me whether they be not illusions of his sight’. The Royal Society tasked Nehemiah Grew, the botanist, to reproduce Leeuwenhoek’s work, but Grew failed; so in 1677, on succeeding Grew as Secretary, Hooke himself turned his mind back to microscopy. Hooke too initially failed, but on his third attempt to reproduce Leeuwenhoek’s findings with pepper-water (and other infusions), Hooke did succeed in seeing the animalcules—‘some of these so exceeding small that millions of millions might be contained in one drop of water’
People were skeptical and didn’t take van Leeuwenhoek at his word alone. They tried to get the same results, to see these little animals for themselves, and for a number of years they failed. They got no further help from van Leeuwenhoek, who refused to share his methods, or the secrets of how he made his superior microscopes. Yet even without a precise recipe, Hooke was eventually able to see the tiny, wonderful creatures for himself. And when he did, van Leeuwenhoek became a scientific celebrity almost overnight.
If something is the truth about how the world works, the truth will come out, even if it takes Robert Hooke a few years to confirm your crazy stories about the little animals you saw in your spit. Yes, research is very exacting, and can demand great care and precision. Yes, there is a lot of implicit knowledge involved. The people who want to see for themselves might have to work for it. But if you think what you found is the real McCoy, then you should expect other people to be able to go out and see it for themselves. And assuming you are more helpful than van Leeuwenhoek, you should be happy to help them do it. If you don’t think people will be able to replicate it at their own bench, are you sure you think you’ve discovered something?
Fast forward to the early 1900s. Famous French physicist Prosper-René Blondlot is studying X-rays, which had first been described by Wilhelm Röntgen in 1895. This was an exciting time for rays of all stripes: several forms of invisible radiation had just been discovered, not only X-rays but ultraviolet light, gamma rays, and cathode rays.
So Blondlot was excited, but not all that surprised, when he discovered yet another new form of radiation. He was firing X-rays through a quartz prism and noticed that a detector was glowing when it shouldn’t be. He performed more experiments and in 1903 he announced the discovery of: N-rays!
Blondlot was a famous physicist at a big university in France, so everyone took this seriously and they were all very excited. Soon other scientists had replicated his work in their own labs and were publishing scores of papers on the subject. They began documenting the many strange properties of N-rays. The new radiation would pass right through many substances that blocked light, like wood and aluminum, but was obstructed by water, clouds, and salt. N-rays were emitted by the sun and by human bodies (especially flexed muscles and certain areas of the brain), as well as rocks that had been left in the sun and been allowed to “soak up” the N-rays from sunlight.
The procedure for detecting these rays wasn’t easy. You had to do everything just right — you had to use phosphorescent screens as detectors, you had to stay in perfect darkness for a half hour so your eyes could acclimate, etc. Fortunately Blondlot was extremely forthcoming and always went out of his way to help provide these implicit details he might not have been able to fit in his reports. And he was vindicated, because with his help, labs all over the place were able to reproduce and extend his findings.
Well, all over France. Some physicists outside France, including some very famous ones, weren’t able to reproduce Blondlot’s findings at all. But as before, Blondlot was very forthcoming and did his best to answer everyone’s questions.
Even so, over time some of the foreigners began to get a little suspicious. Eventually some of them convinced an American physicist, Robert W. Wood, to go visit Blondlot in France to see if he could figure out what was going on.
Blondlot took Wood in and gave him several demonstrations. To make a long story short (you can read Wood’s full account here; it’s pretty interesting), Wood found a number of problems with Blondlot’s experiments. The game was really up when Wood secretly removed a critical prism from one of the experiments, and Blondlot continued reporting the same results as if nothing had happened. Wood concluded that N-rays and all the reports had been the work of self-deception, calling them “purely imaginary”. Within a couple of years, no one believed in N-rays anymore, and today they’re seen as a cautionary tale.
So much for the subtlety and implicit knowledge needed to do cutting-edge work. Maybe your results are hard to get right, but maybe if other people can’t reproduce your findings, they shouldn’t take your word for it.
This is the point of all those chemistry sets your parents (or cool uncle) gave you when you were a kid. This is the point of all those tedious lab classes in high school. They were poorly executed and all but this was the idea. If whatever Röntgen or Pasteur or Millikan or whoever found is for real, you should be able to reproduce the same thing for yourself in your high school with only the stoner kid for a lab assistant (joke’s on you, stoners make great chemists — they’re highly motivated).
Some people will scoff. After all, what kind of teenager can replicate the projects reported in a major scientific journal? Well, as just one example, take Dennis Gabor: “during his childhood in Budapest, Gabor showed an advanced aptitude for science; in their home laboratory, he and his brother would often duplicate the experiments they read about in scientific journals.”
Clearly some studies will be so complicated that Hungarian teenagers won’t be able to replicate them, or will require equipment they don’t have access to. And of course the Gabor brothers were not your average teenagers. But it used to be possible, and it should be made possible wherever it can be. Because otherwise you are asking the majority of people to take your claims on faith. If a scientist is choosing between two lines of work of equal importance, one that requires a nuclear reactor and the other that her neighbor’s kids can do in their basement, she should go with the basement.
It’s good if one big lab can recreate what another big lab claims to have found. But YOU are under no obligation to believe it unless you can replicate it for yourself.
You can of course CHOOSE to trust the big lab, look at their report and decide for yourself. But that’s not really replication. It’s taking someone’s word for something.
There’s nothing wrong with taking someone’s word; you do it all the time. Some things you can’t look into for yourself; and even if you could, you don’t have enough time to look into everything. So we are all practical people and take the word of people we trust for lots of things. But that’s not replication.
Something that you personally can replicate is replication. Watching someone else do it is also pretty close, since you still get to see it for yourself. Something that a big lab would be able to replicate is not really replication. It’s nice to have confirmation from a second lab, but now you’re just taking two people’s word for it instead of one person’s. Something that can in principle be replicated, but isn’t practical for anyone to actually attempt, is not replication at all.
If it cannot be replicated even in principle, then what exactly do you think you’re doing? What exactly do you think you’ve discovered here?
We find it kind of concerning that “does replicate” or “doesn’t replicate” have come to be used as synonyms of “true” and “untrue”. It’s not enough to say that things replicate or not. Blondlot’s N-ray experiments were replicated hundreds of times around France, until all of a sudden they weren’t; van Leeuwenhoek’s observations of tiny critters in pond water weren’t replicated for years, until they were. The modern take on replication (lots of replications from big labs = good) would have gotten both of these wrong.
If knowing the truth about some result is important to you, don’t just take someone’s word for it. Don’t leave it up to the rest of the world to do this work; we’re all bunglers, you should know that. If you can, you should try it for yourself.
So let’s look at some examples of REAL replication. We’ll take our examples from psychology, since, as we saw earlier, psychologists are in the thick of the modern fight over replication.
We also want to take a minute to defend the psychologists, at least on the topic of replication (psychology has other sins, but that’s a subject for another time). Psychology has gotten a lot of heat for being the epicenter of the replication crisis. Lots of psychology studies haven’t replicated under scrutiny. There have been many high-profile disputes and attacks. Lots of famous findings seem to be made out of straw.
Some people have taken this as a sign that psychology is all bunkum. They couldn’t be more wrong — it’s more like this. One family in town gets worried and hires someone to take a look at their house. The specialist shows up and sure enough, their house has termites. Some of the walls are unsafe; parts of the structure are compromised. The family is very worried but they start fumigating and replacing boards that the termites have damaged to keep their house standing. All the other families in town laugh at them and assume that their house is the most likely to fall down. But the opposite is true. No other family has even checked their home for termites; but if termites are in one house in town, they are in other houses for sure. The first family to check is embarrassed, yes, but they’re also the only family who is working to repair the damage.
The same thing is going on in psychology. It’s very embarrassing for the field to have their big mistakes aired in public; but psychology is also distinct for being the first field willing to take a long hard look at themselves and make a serious effort to change for the better. They haven’t done a great job, but they’re one of the only fields that is even trying. We won’t name names but you can bet that other fields have just as many problems with p-hacking — the only difference is that those fields are doing a worse job rooting it out.
The worst thing you can say about psychology is that it is still a very young field. But try looking at physics or chemistry when they were only 100 years old, and see how well they were doing. From this perspective, psychology is doing pretty ok.
Despite setbacks, there has been some real progress in psychology. So here are a few examples of psychological findings that can actually be replicated, by any independent researcher in an afternoon. You don’t have to take our word or anyone else’s word for these findings if you don’t want to. Try it for yourself! Please do try this at home, that’s the point.
Are these the most important psychology findings? Probably not — we picked them because they’re easy to replicate, and you should be able to confirm their results from your sofa (disclaimer: for some of them, you may have to leave your sofa). But all of them are things we didn’t know about 150 years ago, so they represent a real advance in what we know about the mind.
For most of these you will need a small group of people, because most of these are statistically true results, not guaranteed to work in every case. But as long as you have a dozen people or so, they should be pretty reliable.
Draw a Bicycle — Here’s a tricky one you can do all on your own. You’ve seen a bicycle before, right? You know what they look like? Ok, draw one.
Unless you’re a bicycle mechanic, chances are you’ll be really rubbish at this — most people are. While you can recognize a bicycle no problem, you don’t actually know what one looks like. Most people produce drawings that look something like this:
Needless to say, that’s not a good representation of the average bicycle.
Seriously, try this one yourself right now. Don’t look up what a bicycle looks like; draw it as best you can from memory and see what you get. We’ll put a picture of what a bicycle actually looks like at the end of this post.
Then, tweet your bicycle drawings at us at @mold_time on Twitter.
(A similar example: which of the images below shows what a penny looks like?)
Wisdom of the Crowd — Wisdom of the crowd refers to the fact that people tend to make pretty good guesses on average even when their individual guesses aren’t that good.
You can do this by having a group of people guess how many jellybeans are in a jar of jellybeans, or how much an ox weighs. If you average all the guesses together, most of the time it will be pretty close to the right answer. But we’ve found it’s more fun to stand up there and ask everyone to guess your age.
We’ve had some fun doing this one ourselves. It’s a nice trick, though you need a group of people who don’t know you all that well. It works pretty well in a classroom.
This only works if everyone makes their judgments independently. To make sure they don’t influence each other’s guesses, have them all write down their guesses on a piece of paper before anyone says a number out loud.
Individual answers are often comically wrong (sometimes off by up to a decade in either direction), but we’ve been very impressed with the average. In our experience the average of all the guesses is very accurate, often to within a couple of months. But give it a try for yourself.
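If you want to check the result with more than a glance at the numbers, here is a rough sketch in Python of how you might tally it up. The function name and the example guesses are made up for illustration; swap in whatever your own group wrote down.

```python
from statistics import mean, median

def crowd_estimate(guesses, true_value):
    """Compare the crowd's average guess with the individual guesses."""
    crowd_mean = mean(guesses)
    crowd_error = abs(crowd_mean - true_value)
    individual_errors = [abs(g - true_value) for g in guesses]
    # Count the individuals who got strictly closer than the crowd average did.
    beat_crowd = sum(e < crowd_error for e in individual_errors)
    return {
        "crowd_mean": crowd_mean,
        "crowd_error": crowd_error,
        "median_individual_error": median(individual_errors),
        "beat_crowd": beat_crowd,
    }

# Hypothetical age guesses, for illustration only (not real data).
example_guesses = [24, 41, 29, 38, 33, 27, 36, 30, 35, 31]
result = crowd_estimate(example_guesses, true_value=32)
print(result)
```

Comparing the crowd's error with the typical individual error is one simple way to see whether the average really did better than most of the people in the room.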
Emotion in the Face — You look at someone’s face to see how they’re feeling, right? Well, maybe. There’s a neat paper from a few years ago that has an interesting demonstration of how this isn’t always true.
They took photos of tennis players who had just won a point or who had just lost a point, and cut apart their faces and bodies (in the photos; no tennis pros were harmed, etc.). Then they showed people just the bodies or just the faces and asked them to rate how positively or negatively the person was feeling:
They found that people could usually tell that a winning body was someone who was feeling good, and a losing body was someone feeling bad. But with just the faces, they couldn’t tell at all. Just look above – for just the bodies, which guy just won a point? How about for the faces, who won there?
Then they pushed it a step further by putting winning faces on losing bodies, and losing faces on winning bodies, like so:
Again, the faces didn’t seem to matter. People thought chimeras with winning bodies felt better than chimeras with losing bodies, and seemed to ignore the faces.
This one should be pretty easy to test for yourself. Go find some tennis videos on the internet, and take screenshots of the players when they win or lose a point. Cut out the faces and bodies and show them to a couple friends, and ask them to rate how happy/sad each of the bodies and faces seems, or to guess which have just won a point and which have just lost. You could do this one in an afternoon.
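Once you have the ratings, one simple way to look at them is to compare how far apart winners and losers are rated for bodies versus faces. This is only a sketch: the 1-to-9 rating scale, the function name, and the numbers below are our own assumptions, not anything from the original paper.

```python
from statistics import mean

def win_loss_gap(ratings):
    """Mean rating of winning images minus mean rating of losing images."""
    wins = [score for outcome, score in ratings if outcome == "win"]
    losses = [score for outcome, score in ratings if outcome == "loss"]
    if not wins or not losses:
        raise ValueError("need at least one winning and one losing image")
    return mean(wins) - mean(losses)

# Hypothetical ratings on an assumed 1 (very negative) to 9 (very positive) scale.
body_ratings = [("win", 7), ("win", 8), ("loss", 3), ("loss", 4)]
face_ratings = [("win", 5), ("win", 4), ("loss", 5), ("loss", 4)]

body_gap = win_loss_gap(body_ratings)
face_gap = win_loss_gap(face_ratings)
print(f"body gap: {body_gap}, face gap: {face_gap}")
```

If the finding holds up for your friends, the gap for bodies should be clearly bigger than the gap for faces, which should sit near zero.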
Anchoring — This one is a little dicey, and you’ll need a decent-sized group to have a good chance of seeing it.
Ask a room of people to write down some number that will be different for each of them, like the last four digits of their cell phone number, or the last two digits of their student ID. Don’t ask for part of their social security number or anything else that should be kept private.
Let’s assume it’s a classroom. Everyone takes out their student ID and writes down the last two digits of their ID number. If your student ID number is 28568734, you write down “34”.
Now ask everyone to guess how old Mahatma Gandhi was when he died, and write that down too. If this question bores you, you can ask them something else — the average temperature in Antarctica, the average number of floors in buildings in Manhattan, whatever you like.
Then ask everyone to share their answers with you, and write them on the board. You should see that people who have higher numbers as the last two digits of their student ID number (e.g. 78 rather than 22) will guess higher numbers for the second question, even though the two numbers are unrelated. Psychologists call this anchoring. You can plot the student ID digits and the estimates of Gandhi’s age on a scatterplot if you like, or even calculate the correlation. It should come out positive.
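If you do want to calculate the correlation, here is a minimal sketch of the usual Pearson correlation coefficient in plain Python. The data below are invented for illustration; use the numbers from your own board.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / (spread_x * spread_y)

# Hypothetical classroom data, for illustration only (not real results).
id_digits = [12, 85, 40, 67, 3, 91, 55, 28]      # last two digits of student ID
age_guesses = [62, 80, 70, 75, 60, 78, 72, 65]   # guessed age at death

r = pearson_r(id_digits, age_guesses)
print(f"r = {r:.2f}")
```

A value of r above zero means higher ID digits went with higher guesses. With a small group the number will bounce around a lot, so don't read too much into any single classroom.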
Inattentional Blindness — If you’ve taken an intro psych class, then you’re familiar with the “Invisible Gorilla” (for everyone else, sorry for spoiling). In the biz they call this “inattentional blindness” — when you aren’t paying attention, or your attention is focused on one task, you miss a lot of stuff.
Turns out this is super easy to replicate, especially a variant called “change blindness”, where you change something but people don’t notice. You can swap out whole people and about half the time, no one picks up on it.
Because it’s so easy, people love to replicate this effect. Like this replication from NOVA, or this British replication, or this replication from National Geographic. You can probably find a couple more on YouTube if you dig around a bit.
This one isn’t all that easy to do at home, but if you can find a couple accomplices and you’re willing to play a prank on some strangers, you should be able to pull it off.
(Or you can replicate it in yourself by playing I’m on Observation Duty.)
False Memory — For this task you need a small group of people. Have them put away their phones and writing tools; no notes. Tell them you’re doing a memory task — you’ll show them a list of words for 30 seconds, and you want them to remember as many words as possible.
Then, show them the following list of words for 30 seconds or so:
After 30 seconds, hide or take down the list.
Then, wait a while for the second half of the task. If you’re doing this in a classroom, do the first step at the beginning of class, and the second half near the end.
Anyways, after waiting at least 10 minutes, show them these words and ask them which of the words were on the original list.
Most people will incorrectly remember “sleep” as being on the original list, even though, if you go back and check, it’s not. What’s going on here? Well, all of the words on the original list are related to sleep — sleep adjectives, sleep sounds, sleep paraphernalia — and this leads to a false memory that “sleep” was on the list as well.
You can do the same thing for other words if you want: showing people a list of words like “sour”, “candy”, and “sugar” should lead to false memories of the word “sweet”. You can also read the list of words aloud instead of showing it on a screen for 30 seconds; you should get the same result either way.
Draw your own conclusions about what this tells us about memory, but the effect should be pretty easy to reproduce for yourself.
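To put a number on it, you can count what fraction of your group claimed to remember the word that was never shown. Here is a small sketch using the “sweet” example; the responses are made up, and “table” is just a hypothetical unrelated control word we added for comparison.

```python
def false_recognition_rate(responses, word):
    """Fraction of participants who said `word` was on the original list."""
    if not responses:
        raise ValueError("need at least one participant")
    return sum(word in chosen for chosen in responses) / len(responses)

# Each set holds the words one participant said were on the list.
# Hypothetical responses, for illustration only (not real data).
responses = [
    {"sour", "candy", "sweet"},
    {"sugar", "sweet"},
    {"sour", "sugar", "candy"},
    {"candy", "sweet", "sugar"},
]

lure_rate = false_recognition_rate(responses, "sweet")     # related word, never shown
control_rate = false_recognition_rate(responses, "table")  # unrelated word, never shown
print(f"lure: {lure_rate:.0%}, control: {control_rate:.0%}")
```

The comparison with an unrelated word matters: if people falsely "remember" the related word much more often than the unrelated one, that points to a false memory and not just random guessing.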
We don’t think all false memory findings in psychology bear out. We think some of them aren’t true, like the famous Loftus & Palmer (1974) study, which we think is probably bullshit. But we do think it’s clear that it’s easy to create false memories under the right circumstances, and you can do it in the classroom using the approach we describe above.
You can even use something like the inattentional blindness paradigms above to give people false memories about their political opinions. A little on the tricky side but you should also be able to replicate this one if you can get the magic trick right. And if this seems incredible, ridiculous, unbelievable — try it for yourself!
Oh yeah, and here’s that bicycle: