Three Angles on Erik Hoel’s Aristocratic Tutoring

Erik Hoel, concerned that we’re not getting our fair share of geniuses, suggests that aristocratic tutoring is what’s missing:

Let us call this past form aristocratic tutoring, to distinguish it from a tutor you meet in a coffeeshop to go over SAT math problems while the clock ticks down. It’s also different than “tiger parenting,” which is specifically focused around the resume padding that’s needed for kids to meet the impossible requirements for high-tier colleges. Aristocratic tutoring was not focused on measurables. Historically, it usually involved a paid adult tutor, who was an expert in the field, spending significant time with a young child or teenager, instructing them but also engaging them in discussions, often in a live-in capacity, fostering both knowledge but also engagement with intellectual subjects and fields.

“Aristocratic tutoring” is not how we would describe it, but otherwise this sounds about right. We think Erik is right that historical tutoring was better than education today. But we don’t think being aristocratic is what made it better. So here are three other angles on the same idea:


It’s no secret that school sux. It’s not so much that tutoring is good as that mechanized schooling is really bad. If we got rid of formal 20th-century K-12 education, and did homeschooling / unschooling / let kids work at the Costco, we would get most of the benefits of tutoring without all the overhead and inequality.

Our personal educational philosophy is that, for the most part, the most important thing you can do for your students is expose them to things they wouldn’t have encountered otherwise, in the spirit of “you can lead a horse to water, but you can’t make him drink.” K-12 education gums up the works by making bad recommendations, having students spend a lot of time on mediocre stuff, and keeping them so busy they can’t follow up on the better recommendations from friends and family.

From this perspective, mechanized schooling is actually a net negative — it is worse than nothing, and if we just let kids run around hitting each other with sticks or whatever, we would get more geniuses. 

[Image caption: Future Geniuses]

But another possibility is that mechanized schooling is net neutral, and the problem is that we’ve lost some active ingredient that makes tutoring effective.


Education no longer includes moral instruction. Back in the day, a proper education taught you more than “the mitochondria is the powerhouse of the cell” — it taught you to take your character as seriously as your scholarship, to lead and to serve, and to understand your moral responsibilities. Tutoring worked because tutors inspired their pupils. Modern education is a lot of things, but “inspiring” ain’t one of them.

Back when formal education could still be inspiring, it produced brilliant individuals. People have pointed out that a group of strangely brilliant Hungarian scientists played an outsized role in the Manhattan Project. Not only did most of them come from Budapest, many of them went to the same high school, and some of them had the same math teacher, László Rátz. Eugene Wigner, a Nobel Laureate in physics and one of Rátz’s pupils, had this to say:

… there were many superb teachers at the Lutheran gymnasium. But the greatest was my mathematics teacher László Rátz. Rátz was known not only throughout our gymnasium but also by the church and government hierarchy and among many of the teachers in the country schools. I still keep a photograph of Rátz in my workroom because he had every quality of a miraculous teacher: He loved teaching. He knew the subject and how to kindle interest in it. He imparted the very deepest understanding. Many gymnasium teachers had great skill, but no one could evoke the beauty of the subject like Rátz.

Rátz may or may not have been responsible for Wigner’s success, and he didn’t teach everyone involved in the Manhattan Project; our point is just that these Hungarians lived in a time when high school math teachers could still inspire former students to describe them as “miraculous”. This seems to be an aspect of the educational system that we have lost.

If this is right, then we don’t need to worry about tutoring being aristocratic. You shouldn’t need tutors or even miraculous Hungarian math teachers. Other things that are also inspiring / socially encouraging would work just as well. See, for example, the amazing progress of the speedrunning community, a bunch of teenage nerds bootstrapping a scene by inspiring each other to insane degrees of precision.

Erik hints at this by mentioning the social element. “For humans,” he says, “engagement is a social phenomenon; particularly for children, this requires interactions with adults who can not just give them individual attention, but also model for them what serious intellectual engagement looks like.” Individual attention is good, but we also think kids are good at teaching themselves. The active ingredient to us is showing kids “what serious intellectual engagement looks like”, and most kids today don’t see that until college (if ever).


The real problem is segregating children. Tutoring worked because you exposed children to people practicing a real skill (even if it’s only speaking their native language), or working in an actual profession. Modern education exposes them only to teachers.

At the end of your German tutelage you can speak to people you wouldn’t have been able to speak to before, and read books and poems you wouldn’t have been able to read. At the end of your taxidermy tutelage you can take samples and stuff birds, and could theoretically make a living at it. Meanwhile, at the end of high school you can write a five-paragraph essay, a “skill” that you will never use again as long as you live.

So the problem is not the lack of tutoring per se, as much as the lack of giving children any sense of the real world at all. Today, children have to be sent to guidance counselors to be advised on what is out there. Teenagers dream of being YouTubers and influencers. This isn’t their fault: these are some of the only professions where they actually understand what is involved. It’s the fault of adults, for not letting children see any of the many ways they could actually go out and exercise their powers in the world.

But tutoring isn’t the only way to expose children to real skills. Working in the family business did this too, and so did apprenticeships. Writing about why nerds are unpopular, Paul Graham says:

I’m suspicious of this theory that thirteen-year-old kids are intrinsically messed up. If it’s physiological, it should be universal. Are Mongol nomads all nihilists at thirteen? I’ve read a lot of history, and I have not seen a single reference to this supposedly universal fact before the twentieth century. Teenage apprentices in the Renaissance seem to have been cheerful and eager. They got in fights and played tricks on one another of course (Michelangelo had his nose broken by a bully), but they weren’t crazy.

As far as I can tell, the concept of the hormone-crazed teenager is coeval with suburbia. I don’t think this is a coincidence. I think teenagers are driven crazy by the life they’re made to lead. Teenage apprentices in the Renaissance were working dogs. Teenagers now are neurotic lapdogs. Their craziness is the craziness of the idle everywhere.

Paul is right; in many parts of the world, useful apprenticeship was the historical norm. As anthropologist David Graeber writes:  

Feudal society was a vast system of service… the form of service that had the most important and pervasive influence on most people’s lives was not feudal service but what historical sociologists have called “life-cycle” service. Essentially, almost everyone was expected to spend roughly the first seven to fifteen years of his or her working life as a servant in someone else’s household. Most of us are familiar with how this worked itself out within craft guilds, where teenagers would first be assigned to master craftsmen as apprentices, and then become journeymen… In fact, the system was in no sense limited to artisans. Even peasants normally expected to spend their teenage years onward as “servants in husbandry” in another farm household, typically, that of someone just slightly better off. Service was expected equally of girls and boys (that’s what milkmaids were: daughters of peasants during their years of service), and was usually expected even of the elite. The most familiar example here would be pages, who were apprentice knights, but even noblewomen, unless they were at the very top of the hierarchy, were expected to spend their adolescence as ladies-in-waiting—that is, servants who would “wait upon” a married noblewoman of slightly higher rank, attending to her privy chamber, toilette, meals, and so forth, even as they were also “waiting” for such time as they, too, were in a position to marry and become the lady of an aristocratic household themselves.

Service was especially pervasive in England. “Few are born who are exempted from this fate,” wrote a Venetian visitor around 1500, “for everyone, however rich he may be, sends away his children into the houses of others, whilst he, in return, receives those of strangers into his own.”

Even just having your children around adults and being a part of adult conversations will go a long way. For what it’s worth, this is how we were raised, i.e. mostly around adults.

This may be another element common to the cases Erik mentions — most of the geniuses he names seem to have had very little contact with children outside their immediate family. Whether or not this is good for children psychologically is a separate question, but it does seem to lead to very skilled adults.

In fact, the number of children in a family might also be a factor. There was a time when most families were pretty large, so a lot of children had several older siblings. If you have five older brothers, you get both benefits — other children to play with, and a more direct line to adulthood through your older siblings. Erik mentions the example of Bertrand Russell, and we wonder if this might be more representative than he realizes:

When Bertrand Russell’s older brother introduced him to geometry at the age of 11, Russell later wrote in his autobiography that it was: “… one of the great events of my life, as dazzling as first love.” Is that really solely his innate genetic facility, or was mathematics colored by the love of his older brother?

It’s easy to come up with other examples (though of course this is not universal). Charles Darwin was the fifth of six children. The Polgár sisters are all chess prodigies, and were intentionally raised to be geniuses, but the youngest daughter Judit is the best of the three. Jane Austen had five older brothers and an older sister. Her eldest brother James wrote prologues and epilogues for plays the family staged and it seems as though this moved Jane to try her hand at something similar.

So part of the success of tutoring might simply be exposing a child to subjects “before they are ready”, and one way to reliably do that is to have them overhear the lessons of their older siblings, who they are ready to imitate.

This ties neatly into the social/moral element we mention above. Children may be moved by a passionate tutor, or a beloved uncle, or a cousin, or a medical student who lives in the spare room. But they will always be influenced by older siblings, and the more older siblings there are, the more gates to adult influence will be opened. Maybe if we want more geniuses, people need to start having larger families.

The Didactic Novel

Shōgun

James Clavell’s Shōgun is a historical novel about the English pilot John Blackthorne. The Dutch ship he’s piloting crashes in Japan in the year 1600, and Blackthorne has to learn how to survive in what to him is a mad and totally alien culture. 

All historical novels are somewhat educational, but Shōgun teaches you about more than just Japanese society at the beginning of the Tokugawa Shogunate. 

Blackthorne speaks a lot of different languages, and this is a big part of his identity. He speaks English natively and Dutch with his crew, but also Latin and Portuguese and even a little Spanish, which he uses to communicate with the few other Europeans he finds in Japan, mostly Catholic priests. This makes sense in the context of the novel — his ship is Dutch but their allies the English are the best pilots in the world, and they’re using stolen Portuguese documents to navigate strange waters, so he would need to speak that language too. 

So when Blackthorne finds himself stranded in Japan, he starts learning Japanese. At first this is hard because Blackthorne has only ever studied European languages before, and also because people keep trying to kill him. But he has a lot of experience learning foreign languages and little else to do, so he quickly starts picking it up.

What’s more surprising is that soon the reader is picking up some Japanese too. Linguistically, Clavell has put the reader in the very same situation as Blackthorne. The book starts out entirely in English, but suddenly you are confronted with words and phrases in a language you don’t understand. You end up learning many of these words and phrases just to follow along. 

[Image caption: Staged seppuku ritual, 1897]

It seems like Clavell is doing this intentionally. The book is in English, but Blackthorne is the only English-speaking character in the novel. Except in the few cases where he’s talking to himself, all the dialogue is actually being carried on in other languages, but when the dialogue is in Dutch, or Portuguese, or even Latin, Clavell renders it all as English. When Japanese people are speaking Japanese to each other, he translates that into English too. But when Blackthorne encounters Japanese that he doesn’t understand, or just barely understands, it’s usually rendered as romanized Japanese. To follow these snippets you need to learn a little Japanese, so you do. And the interesting thing is, you learn this little bit of Japanese without any conscious effort.

It’s hard to read Shōgun all the way through and not learn at least a few words in Japanese. By the end of the first volume, most readers will know words like onna, kinjiru, wakarimasu, hai, ima, ikimasho, anjin, domo, isogi, and of course the omnipresent neh.

This isn’t a perfect language-learning tool. Shōgun is over 300,000 words long (and the original draft was considerably longer), but most of that is devoted to being a historical novel, an adventure story, and a romance, not teaching you Japanese. We love that there are lots of reasons to read it. But given the limited amount of space devoted to these basic Japanese lessons, it’s a very effective introduction.

Cryptonomicon

Cryptonomicon by Neal Stephenson is a dense novel that alternates between historical fiction and near-future sci-fi. 

There are two storylines. The first is set during World War II, and follows a group of characters pioneering cryptography in an effort to win the war, and inventing the computer — among the characters are a fictionalized version of Alan Turing and his even-more-fictional German boyfriend, Rudolf “Rudy” von Hacklheber. 

The second storyline focuses on the grandchildren of some of the WWII characters in the modern day, several of whom are putting together a startup in Southeast Asia in an attempt to create an anonymous banking system using magic internet money. The novel was published in 1999, so yes, this seemed like an ambitiously futuristic scheme at the time. It may also have helped create that future: Cryptonomicon was required reading during the early days of PayPal.

[Image caption: Unironically the best ad ever created]

But implicitly, and at times explicitly, Cryptonomicon is a textbook on something like information theory. Chapter One includes a long discussion where Alan Turing and Rudy von Hacklheber teach Lawrence Pritchard Waterhouse (sort of the viewpoint character) about Russell and Whitehead, Gödel, the distinctions between mathematics and physics, how logic can be reduced to symbols, etc. If this sounds dry, it isn’t — you’ll probably learn more about philosophy of math in these 4000 words than you did during 4 years of college. Then Alan and Rudy give Lawrence a problem to go off and solve so the two of them can fuck. Sex comes up a lot in Cryptonomicon, possibly because sex itself is about the exchange of deeply encrypted source code, or possibly because Stephenson is just horny.

All that just in Chapter One. This is a book about cryptography, and so pretty much every other chapter has some lesson, implicit or explicit, about topics like symbols, languages, systems, inference, even actual algorithms or code snippets. Chapter 25 ends by walking you through the process of doing encryption and decryption with a one-time pad. There’s even information theory disguised (?) as small-business advice. It’s kind of Gödel, Escher, Bach in novel format, to the point that there are references to GEB hidden in a few places around the book. 
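The one-time pad is one of the few lessons in the book you can check with pencil and paper, so here it is in code. This is a minimal sketch of a generic letter-based (mod 26) one-time pad, the hand-cipher flavor rather than the XOR-on-bytes version computers use. The function names are our own, and we are not claiming it reproduces the exact steps printed in Chapter 25. The whole idea: turn letters into numbers, add one random key letter to each message letter (wrapping past Z), and subtract the same key letters to decrypt.

```python
import secrets
import string

# Hypothetical helper names; a generic mod-26 one-time pad, not a
# transcription of the procedure in Cryptonomicon.
ALPHABET = string.ascii_uppercase  # "A".."Z" map to 0..25


def normalize(text: str) -> str:
    """Uppercase and drop anything that is not A-Z (spaces, punctuation)."""
    return "".join(c for c in text.upper() if c in ALPHABET)


def generate_pad(length: int) -> str:
    """Draw one random key letter per message letter from the OS's secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def _combine(text: str, pad: str, sign: int) -> str:
    """Add (sign=+1) or subtract (sign=-1) pad letters from text letters, mod 26."""
    text = normalize(text)
    pad = normalize(pad)
    if len(pad) < len(text):
        raise ValueError("the pad must be at least as long as the message")
    return "".join(
        ALPHABET[(ALPHABET.index(t) + sign * ALPHABET.index(k)) % 26]
        for t, k in zip(text, pad)
    )


def encrypt(plaintext: str, pad: str) -> str:
    return _combine(plaintext, pad, +1)


def decrypt(ciphertext: str, pad: str) -> str:
    return _combine(ciphertext, pad, -1)


if __name__ == "__main__":
    message = "Attack at dawn"
    pad = generate_pad(len(normalize(message)))
    ciphertext = encrypt(message, pad)
    print(pad, ciphertext, decrypt(ciphertext, pad))  # last value is ATTACKATDAWN
```

All of the security lives in the pad: it has to be truly random, at least as long as the message, and never reused. Use the same pad twice and subtracting one ciphertext from the other cancels the key, leaving only the difference between the two plaintexts.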

For the most part these lessons are subtle and deeply embedded:

One night, Benjamin received a message and spent some time deciphering it. He announced the news to Shaftoe: “The Germans know we’re here.”

“What do you mean, they know we’re here?”

“They know that for at least six months we have had an observation post overlooking the Bay of Naples,” Benjamin said.

“We’ve been here less than two weeks.”

“They’re going to begin searching this area tomorrow.”

“Well, then let’s get the fuck out of here,” Shaftoe said.

“Colonel Chattan orders you to wait,” Benjamin said, “until you know that the Germans know that we are here.”

“But I do know that the Germans know that we are here,” Shaftoe said, “you just told me.”

“No, no no no no,” Benjamin said, “wait until you would know that the Germans knew even if you didn’t know from being told by Colonel Chattan over the radio.”

“Are you fucking with me?”

“Orders,” Benjamin said, and handed Shaftoe the deciphered message as proof.

But in a few places he does come out and state the idea plainly:

It all comes to him, explosively, during the Battle of Midway, while he and his comrades are spending twenty-four hours a day down among those ETC machines, decrypting Yamamoto’s messages, telling Nimitz exactly where to find the Nip fleet.

What are the chances of Nimitz finding that fleet by accident? That’s what Yamamoto must be asking himself.

It is all a question (oddly enough!) of information theory.

If the action is one that could never have happened unless the Americans were breaking Indigo, then it will constitute proof, to the Nipponese, that the Americans have broken it. The existence of the source—the machine that Commander Schoen built—will be revealed.

Waterhouse trusts that no Americans will be that stupid. But what if it isn’t that clear-cut? What if the action is one that would merely be really improbable unless the Americans were breaking the code? What if the Americans, in the long run, are just too damn lucky?

And how closely can you play that game? A pair of loaded dice that comes up sevens every time is detected in a few throws. A pair that comes up sevens only one percent more frequently than a straight pair is harder to detect—you have to throw the dice many more times in order for your opponent to prove anything.

If the Nips keep getting ambushed—if they keep finding their own ambushes spoiled—if their merchant ships happen to cross paths with American subs more often than pure probability would suggest—how long until they figure it out?
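Waterhouse’s loaded-dice comparison is a statistics question, and a rough calculation shows how lopsided it is. The sketch below is ours, not Stephenson’s. It assumes a 5% false-alarm rate and an 80% chance of catching the cheat, and it reads “one percent more frequently” two ways: one extra percentage point, or 1% more in relative terms. Dice that always roll seven are handled with an exact calculation; slightly loaded dice use the standard normal-approximation sample-size formula for a single proportion.

```python
import math
from statistics import NormalDist

# Probability that two fair dice sum to seven: 6 of the 36 outcomes.
P_FAIR = 6 / 36


def throws_if_always_seven(alpha: float = 0.05) -> int:
    """Dice that always roll seven: smallest n such that n sevens in a row
    would happen with fair dice less than `alpha` of the time."""
    n = 1
    while P_FAIR ** n >= alpha:
        n += 1
    return n


def throws_to_detect(p_loaded: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate throws for a one-sided test to tell P(seven) = p_loaded apart
    from P_FAIR, via the normal-approximation sample-size formula (reasonable
    when p_loaded is close to P_FAIR)."""
    if not P_FAIR < p_loaded < 1:
        raise ValueError("p_loaded must be between P_FAIR and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # false-alarm threshold
    z_beta = NormalDist().inv_cdf(power)       # chance of catching the cheat
    spread = (z_alpha * math.sqrt(P_FAIR * (1 - P_FAIR))
              + z_beta * math.sqrt(p_loaded * (1 - p_loaded)))
    return math.ceil((spread / (p_loaded - P_FAIR)) ** 2)


if __name__ == "__main__":
    print(throws_if_always_seven())          # 2
    print(throws_to_detect(P_FAIR + 0.01))   # several thousand
    print(throws_to_detect(P_FAIR * 1.01))   # hundreds of thousands
```

Under these assumptions the crude cheat is caught in two throws, since two sevens in a row happen only 1 time in 36 with fair dice. The extra-percentage-point cheat takes several thousand throws, and the relative-1% cheat takes on the order of 300,000. That is Waterhouse’s worry in miniature: the smaller the edge you take each time, the longer your opponent has to watch before they can prove anything.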

The whole book is backwards and out of order, not only because the chapters set in 1942 are intermixed with the chapters set in 1997, but because internal storylines are deliberately disjointed. Effects come before causes, explanations come many chapters before or after the thing they are meant to explain, and critical hints are brief and easily missed. The whole book is a giant combination lock, the final exercise left for the reader, and deciphering it is part of the reading experience and part of the lesson.

In any case, it’s hard to read Cryptonomicon all the way through and not learn something about information theory. You won’t be an expert, but it’s a damn fine introductory textbook. And because Stephenson is such a master, the book is designed to give up more mysteries every time you re-read it. Each time you revisit, you’re struck with stuff you missed the last time around. 

Writing novels that are secretly textbooks kind of seems to be Stephenson’s MO. Cryptonomicon has a prequel series called The Baroque Cycle. Just as Cryptonomicon deals with the invention of computing and information theory, these books deal with the invention of the scientific method, following historical characters like Sir Isaac Newton and Gottfried Wilhelm (von) Leibniz. They are also about the invention of banking and modern currency, and it’s heavily implied that the two are connected. It’s a true historical fact that, in addition to his work in physics, Isaac Newton spent about thirty years at the Royal Mint, first as Warden and then as Master of the Mint, in charge of England’s coinage. He even went out to taverns in disguise to personally catch counterfeiters.

[Image caption: The perfect disguise]

Stephenson also seems to be aware that this is what he’s doing. Maybe this is not surprising given his other novel The Diamond Age, a book about a book that teaches you things. The Diamond Age follows a similar model and tries to implicitly teach the reader about the basics of computer science and macroeconomics.

Harry Potter and the Methods of Rationality

Harry Potter and the Methods of Rationality (HPMoR) is a 660,000-word Harry Potter fanfic by Eliezer Yudkowsky.

Explicitly, HPMoR asks the question: what if Harry Potter were raised by an Oxford professor and were intensively homeschooled, instead of being raised in a closet by the Dursleys? Also explicitly, HPMoR is Yudkowsky’s attempt to teach the scientific method and “the methods of rationality” to a general audience.

Clavell and Stephenson seem somewhat aware that their novels are educational, but Yudkowsky is the only one of the three who comes right out and says that this is his goal, at least as far as we’ve seen. In a post on why he wrote the fanfic, he says:

But to answer your question, nonfiction writing conveys facts; fiction writing conveys *experiences*. I’m worried that my previous two years of nonfiction blogging haven’t produced nearly enough transfer of real cognitive skills. The hope is that writing about the inner experience of someone trying to be rational will convey things that I can’t easily convey with nonfiction blog posts.

By “nonfiction blogging,” Yudkowsky means his other attempt to teach these skills, the series of blog posts now collected on LessWrong as “The Sequences.” Elsewhere he says that these two attempts, fiction and nonfiction, don’t even communicate the same thought. But to editorialize a bit, it seems like HPMoR was more successful than the Sequences. It’s certainly reached a broad audience; among other things, it’s been reviewed in venues like Vice, Who Magazine, and The Hindustan Times.

(To editorialize a bit more, Yudkowsky’s writing on writing might be more interesting than either the Sequences or HPMoR. But of course we’re very interested in writing so we’re kind of biased.)

Yudkowsky describes his goal as teaching “real cognitive skills”, and he’s on the money with this one. Many skills are better taught through experience than presented as a block of facts — you’ll learn more Japanese from getting lost in Tokyo than you will from skimming a Japanese grammar. So for skills like these, a didactic novel is better than an explicit textbook, or at least a good complement.

HPMoR is spread a little thin — unlike Japanese or information theory, “rationality” is not really a single subject, so it’s a little less cohesive. But Yudkowsky does still have a lot of specific points he’s trying to make, and it’s hard to read HPMoR all the way through and not learn something about genetics, psychology, heuristics, game theory, tactics, and the scientific method.

The Didactic Novel

All three of these novels were extremely successful. All of them try to teach you something more concrete than the average novel does. And all of them at least partly succeed at it.

Some skills, like oil painting or bicycle repair, are hard to learn just by reading about them; you actually have to go out and try them for yourself. But for many skills, the basics can be picked up vicariously. You won’t be a great codebreaker after reading Cryptonomicon, but it gives you a very firm foundation to start from.

Novels are powerful teaching tools because they’re more fun than textbooks, and fun is good. Educational and entertaining are treated like foils, but they’re actually complementary. If something is entertaining, it holds your attention; if it holds your attention, you will be able to engage; if you engage, you can learn something. If something is boring or tedious, you will go look at Twitter or pick your nose instead. Shōgun doesn’t teach you quite as much Japanese as you would get from a Japanese 101 course at the local university, but we guarantee it’s twice as fun and two hundred times easier to read Shōgun than it is to take all those quizzes. Japanese for Busy People is a pretty good textbook, but you don’t want to cuddle up with it on a snowy afternoon.

And frankly, fun sticks in your brain more easily.

Fiction is great. It engages. It inspires. Fiction led thousands of people to develop an intricate understanding of the history and politics of Westeros, including hundreds of characters and thousands of events and relationships. It led people to create detailed models of fictional castles in SketchUp. Fiction inspires people to scholarly discourse on the details of medieval sieges, or painstaking Minecraft replicas of entire continents. Fiction leads people to totally overthink why an empire might destroy a province in a show of military might, or speculate in depth about the project management that it would require. And yes, the power of fiction led to millions of words’ worth of Harry Potter fanfic from literally thousands of authors. Imagine if we harnessed even a little of that power.

[Image caption: Do you have strong opinions about which of these people you would invite to your birthday party? Which of them you would have an ale with? Which of them you would let look after your child? You do? FICTION]

Language Learning

We think there should be lots more didactic novels — novels that try to teach you something concrete, like a skill. And we actually think that James Clavell got it right with Shōgun, that the best subject for a didactic novel is language learning. 

Shōgun is distracted by having many other priorities, but a novel that put language learning first could be an engine of unimaginable education. As in Shōgun, you would start the story entirely in English, and introduce words in the new language one by one. Eventually you would start introducing basic grammar. The bits in the target language would start out on the level of “See Spot run,” but would gradually become as complicated as the sections in English. As you move through the novel, the text would transition slowly from all-English to all-target-language. By the end, you would just be reading a novel in Swedish or Arabic or Cantonese or whatever.

This transition would have to be very slow for this to work, so the novel would have to be really long. But if you do it slowly enough, it won’t feel difficult for the reader at any point.
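The pacing argument can be made concrete with a toy model. Suppose, purely as an assumption, that each chapter has a fixed share of target-language text and that the share follows a smooth S-curve (the “smoothstep” polynomial) from 0 in the first chapter to 1 in the last. Then the only knob is the number of chapters, and the biggest single jump the reader ever faces shrinks as the book gets longer. The function names and the curve are hypothetical illustrations, not a recipe anyone has tested on readers.

```python
# Hypothetical pacing model for a language-learning novel; the smoothstep
# curve and the chapter counts are illustrative assumptions.


def target_share(chapter: int, total_chapters: int) -> float:
    """Fraction of a chapter written in the target language (chapter is 0-indexed).

    Smoothstep ramp: 0.0 in the first chapter, 1.0 in the last, gentle at the
    start when the reader knows almost nothing, and steepest in the middle.
    """
    if total_chapters < 2:
        raise ValueError("need at least two chapters")
    if not 0 <= chapter < total_chapters:
        raise ValueError("chapter out of range")
    x = chapter / (total_chapters - 1)
    return x * x * (3 - 2 * x)


def largest_jump(total_chapters: int) -> float:
    """Biggest chapter-to-chapter increase in the target-language share."""
    shares = [target_share(c, total_chapters) for c in range(total_chapters)]
    return max(b - a for a, b in zip(shares, shares[1:]))


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, "chapters -> largest jump", round(largest_jump(n), 3))
```

In this model the largest jump is roughly 1.5 divided by the number of chapter transitions, so stretching the same ramp from 10 chapters to 100 cuts the biggest single step from about 17% of the text to about 1.5%.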

You might be worried that people won’t be willing to read such a massive story, but we don’t think it’s a problem. People already spend a lot of time on language-learning apps. Language-learning is a big market, and people are plenty happy to invest their time and money. As just one example, Duolingo is now worth more than $6 billion. And Duolingo isn’t even that great — it’s kind of bad. 

And while there’s a stereotype that people don’t like to read, or don’t like long books, the rumors of the death of our attention spans are greatly exaggerated. Shōgun itself is on Wikipedia’s list of the longest novels of all time, at over 300,000 words, and it sold six million copies in the first five years of publication. Jonathan Strange & Mr Norrell by Susanna Clarke, also about 300,000 words, was a smash hit and won a slate of awards. The entire Lord of the Rings series (minus The Hobbit) is about 500,000 words. Infinite Jest is about 550,000 words, all of them dense.

The entire Harry Potter series is more than 1,000,000 words long, and millions of pre-teens have wolfed it down without stopping for breath. If a school story with magic wands could inspire kids to do that, imagine how they would respond to a book that actually teaches them German, or any other language their parents don’t understand. Half the fun of any YA series is all the weird shibboleths you develop that adults can’t pierce. On this note, the web epic Homestuck was arguably even longer, and captured the minds of a generation, for good or for ill.

[Image caption: You really can engage 13-year-olds with 1,000,000+ words of arcane bullshit]

A Game of Thrones, the first book alone, is about 300,000 words long, and the whole A Song of Ice and Fire series is about 1,700,000 words so far. While most people have not read all the books, you can’t deny their impact. And it’s not like the sales have been lackluster or something; Martin is one of the highest-earning authors in the world.

You could make a pretty good case that Dune, almost 200,000 words long and with five sequels, is already a didactic novel about ecology, or maybe political science, or maybe the intersection of ecology and political science. I’m at the ecology. I’m at the political science. I’m at the intersection of ecology and political science. 

A Case Study

Since we think Clavell has done the best job so far, it’s worth taking a bit of a look at how he does it.

(Minor spoilers for Shōgun from here on.)

The prologue has no Japanese at all, since it’s set on a Dutch ship in immediate danger of going down with all hands. But in Chapter 1, things are immediately different. Blackthorne wakes up in a strange room. A woman comes in and says something to him in Japanese — “Goshujinsama, gokibun wa ikaga desu ka?” It’s the very first page, and already we get a full sentence in Japanese.

A few pages later, we learn our first word. Blackthorne points at the woman to ask her name. She says, “Onna”. But this is a misunderstanding: “onna” is just the Japanese word for “woman”. This will come back to bite Blackthorne in the ass, but not for a while.

A few pages later we learn the words “daimyo” (a type of Japanese noble) and “samurai” when Blackthorne talks to one of the local Catholic priests, who challenges him in Portuguese.

Then a samurai appears and says, “Nanigoto da,” a phrase we don’t understand, three times. Then we get our second full sentence. The samurai, whose name is Omi, asks Blackthorne, “Onushi ittai doko kara kitanoda? Doko no kuni no monoda?” which the Portuguese priest translates as “Where do you come from and what’s your nationality?” He also explains that the Japanese use the suffix “-san” after a name as an honorific, like we use “Mr.” or “Dr.” before ours, so he should call the samurai Omi-san.

Clavell doesn’t give us the rest of the conversation in Japanese, but at the end Omi asks him, “Wakarimasu ka?” which the priest translates as “Do you understand?” Blackthorne is already itching to learn the language for himself, and asks how to say “yes” in Japanese. The priest tells him to say, “wakarimasu,” which is sort of correct. He also sees Omi behead a man and shout “Ikinasai!” twice. Most of what we hear at this point isn’t translated, but we’re already getting exposed to a lot of Japanese.

[Image: from the 1980 miniseries]

Blackthorne talks to a few more samurai on his ship, and hears the phrases “Hotté oké!”, “Nan no yoda?”, and “Wakarimasen”, which astute readers might already notice is similar to “Wakarimasu ka?” and “wakarimasu” from before. When he uses signs to ask to go to his cabin, they say, “Ah, so desu! Kinjiru.” Based on how they threaten him when he tries to go inside, he correctly infers that “Kinjiru” means “forbidden”.

After spending a lot of time with his crew, he goes back to the house he woke up in. He hears “konbanwa” from the gardener, and while it’s not defined, context makes it clear that this is a greeting — in fact, it’s Japanese for “good evening”. 

Then he asks to see “Onna” and the joke set up at the start of the chapter comes full circle. He hears “hai” and “ikimasho” and “nanda”, not understanding, and then one of the women tries to get into bed with him, until the village headman, who speaks a little Portuguese, explains that “onna” means “woman”. We also see our first “neh”s.

And that’s all the Japanese in Chapter 1. Blackthorne is taught the words onna, daimyo, and samurai, and is taught to use the suffix -san. He is sort of taught the word wakarimasu, and he correctly infers the meaning of kinjiru. He, along with the reader, is also exposed to several words that are not yet defined explicitly, and a few complete phrases, some of which get approximate translations.

In Chapter 2, and forever onwards, daimyo and samurai are used as normal vocab, since these terms don’t have equivalents in English, and we see the suffix -san where appropriate. We also see one other full sentence in Japanese — “Ano mono wa nani o moshité oru?”, which isn’t translated — but that’s it. 

In Chapter 3, we learn the suffix -sama, meaning “lord”. We also learn that ronin are “landless or masterless peasant-soldiers or samurai.” But this chapter is also short, and we barely see Blackthorne at all, so both of these translations are provided by the narration.

In Chapter 4, we hear the word “isogi”, which is translated as “hurry up!” Then we hear it again. We also see “kinjiru” twice, with only the reminder that it’s “the word from the ship”, but context and the hint help recall the meaning. 

In Chapter 5, Blackthorne starts using Japanese himself, saying “kinjiru” twice to talk to a samurai.

In Chapter 6, the local priest tells him that the Japanese word for “yes” is “hai”. Blackthorne uses the word four times. We see the phrase, “wakarimasu ka” twice, which the priest translates the first time, but not the second time. We encounter the word “okiro” for the first time, translated as “you will get up.” We also learn the word “anjin”, which means “pilot”, when Omi tells Blackthorne that the Japanese can’t pronounce his name and will call him “Mr. Pilot”, or “Anjin-san”.

In Chapter 7, we learn the phrase “konnichi wa”, which they translate as “good day”. Blackthorne then uses the phrase six times to greet people, and we hear it once from someone else. We see the word “Anjin” at least a dozen times — Clavell wants us to get used to it, because it’s Blackthorne’s new name. We see “hai” twice, and “wakarimasu” and “wakarimasu ka” and “isogi” and “kinjiru” once each. 

During this chapter, Blackthorne also meets a Portuguese pilot, Rodrigues, who tells him that “ima” means “now”. Rodrigues also uses “ikimasho”, a term we saw once in Chapter 1, without defining it, along with “ichi ban”, which he doesn’t explain either, and he throws around a lot of “wakarimasu ka”, “kinjiru”, and “sama”. When he argues with some samurai, they say “gomen nasai”, which is translated as “so sorry”, and “iyé”, which isn’t translated but clearly means “no”.

In Chapter 8, Blackthorne and the Portuguese pilot Rodrigues use “wakarimasu ka” and “hai” with one another, just as part of normal conversation. Blackthorne hears him use “isogi” again, asks what it means, and Rodrigues tells him it means “hurry up”. Blackthorne uses the word not long after when he takes control of the ship in a storm. We see “wakarimasu” twice and “hai” four times. We see a new term, “arigato goziemashita” (not the common spelling), which isn’t defined but is clearly in the context of someone thanking him. We also see “iyé” again, in a context where it clearly means “no”, confirming its meaning.

In Chapter 9, we see “hai” twice, and “isogi” once. We also see “iyé”, and again Clavell refuses to define it explicitly. But by now, the reader has seen it three times in contexts that all clearly mean “no”, and is probably starting to pick up on that. 

In Chapter 10, we see “konnichi wa”, “isogi”, and “wakarimasu ka” once each, and “hai” five times. None of them are translated, and the chapter doesn’t miss a beat. These are all just normal vocabulary in the novel at this point; the reader is expected to know what they mean.

At this point the novel takes a break from language education to spend a few chapters mostly focusing on plot, so we’ll stop here too. But already, you can see the pattern. 

Clavell mixes it up a lot, but the general formula goes like this:

  1. The first time you encounter a word, it isn’t defined and no one explains what it means, but there are often context clues.
  2. Soon after that, the word is used again and someone either tells you what it means, or Blackthorne guesses. 
  3. The next time you see the word, you get a little reminder either of the definition, or of the last time you saw the word.
  4. After a few more uses with clear context, the word becomes part of the general vocabulary. From then on, you are expected to know what it means!
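If you’re planning a didactic novel and want to keep track of where each word sits in this progression, a little bookkeeping goes a long way. Here is a minimal sketch in Python. It assumes a word graduates to the vocabulary after three uses with clear context (our stand-in for “a few more uses”), and the names `Stage` and `WordTracker` are made up for illustration. The sample appearances are only loosely modeled on how “hai” moves through Chapters 1 to 9, not an exact count.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """The four steps of the formula, in order."""
    EXPOSED = 1     # seen, not defined; maybe some context clues
    DEFINED = 2     # a character explains it, or the protagonist guesses
    REMINDED = 3    # reused with a small reminder of the meaning
    VOCABULARY = 4  # fair game anywhere; the reader is expected to know it


# Assumption: "a few more uses" means three uses with clear context.
USES_BEFORE_VOCAB = 3


@dataclass
class WordTracker:
    """Hypothetical helper that tracks one word through the four stages."""
    word: str
    stage: Stage = Stage.EXPOSED
    context_uses: int = 0
    history: list = field(default_factory=list)

    def record(self, chapter: int, defined: bool = False) -> Stage:
        """Log one appearance of the word and advance its stage if earned."""
        if self.stage is Stage.EXPOSED and defined:
            self.stage = Stage.DEFINED
        elif self.stage is Stage.DEFINED:
            self.stage = Stage.REMINDED
        elif self.stage is Stage.REMINDED:
            self.context_uses += 1
            if self.context_uses >= USES_BEFORE_VOCAB:
                self.stage = Stage.VOCABULARY
        self.history.append((chapter, self.stage))
        return self.stage


# Loosely modeled on "hai": heard in Ch. 1, explained in Ch. 6, then reused.
hai = WordTracker("hai")
for chapter, defined in [(1, False), (6, True), (7, False),
                         (7, False), (8, False), (9, False)]:
    hai.record(chapter, defined)
print(hai.stage)  # Stage.VOCABULARY
```

Once a word reaches `VOCABULARY`, it goes on the “approved” list and can show up anywhere without a gloss.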

This is essentially how you learn words as a child, or how you would learn Japanese if you had to use it as part of your daily life. The first time you hear a word, you have no idea what it means. Eventually someone tells you what it means or it becomes clear from context. The next time you see or hear the word, you might need a reminder. But once you’ve used it a bit, it gets locked in. 


Let’s look at some examples. The word “hai” means “yes”. You hear it first in Chapter 1, with a little context that suggests what it might mean. We don’t see it again until Chapter 6, when the local priest tells us what it means. It’s then used a couple of times in Chapter 7. In Chapters 8-10, it’s just a normal word, fully integrated into the story, with no further reminders. 

The word “kinjiru” means “forbidden”. Blackthorne hears it first in Chapter 1, and guesses what it means from context. We see it again in Chapter 4 with a simple reminder (just “the word from the ship”), and Blackthorne uses it in Chapter 5, where context makes it clear what it means. From then on, it’s in the vocab.

We first encounter the word “isogi” in Chapter 4, where the narrator translates it for the reader as “Hurry up!” But Blackthorne doesn’t get the benefit of this translation. When it reappears in Chapter 7, he still doesn’t know what it means. It comes back in Chapter 8, Blackthorne asks what it means, and Rodrigues tells him. Later that chapter, Blackthorne is using the word himself. The same principles are at work, just slightly mixed up.

The approach Clavell is using is called spaced repetition, a memory technique that works by introducing new content and then bringing it back after a bit of a delay. This works because of something called the forgetting curve. When you’ve just learned something, it’s strong in your memory, but that trace gets weaker and weaker over time. If you’re asked to remember the thing right away, it’s still fresh in your mind and takes no effort — but if you wait too long, you’ve forgotten entirely. So the thing to do is wait until the memory has decayed just a bit, and then bring it back. This stresses the memory and reinforces it, sort of like how stressing a muscle builds strength.
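To make the forgetting-curve idea concrete, here is a toy model in Python. It assumes a simple exponential curve, R = exp(-t/S), where t is days since the last review and S is how “stable” the memory is. It also assumes (our numbers, not measured data and not anything from Clavell) that each review lands when retention has decayed to 80%, and that each review multiplies stability by 2.5. The point is only the shape: each gap between reviews comes out longer than the one before.

```python
import math


def retention(days: float, stability: float) -> float:
    """Toy forgetting curve: fraction remembered after `days` without review."""
    return math.exp(-days / stability)


def review_days(n_reviews: int, stability: float = 1.0,
                threshold: float = 0.8, growth: float = 2.5) -> list:
    """Days after first exposure on which to bring a word back.

    Each review is timed for when predicted retention has decayed to
    `threshold` (faded a bit, not gone). Assumption: every review
    multiplies stability by `growth`, so the memory fades more slowly.
    """
    day, schedule = 0.0, []
    for _ in range(n_reviews):
        # Solve exp(-t / S) = threshold for t, giving t = -S * ln(threshold).
        day += -stability * math.log(threshold)
        schedule.append(round(day, 2))
        stability *= growth  # the stressed memory comes back stronger
    return schedule


print(review_days(4))  # [0.22, 0.78, 2.18, 5.66]
```

A reader who picks the book up every day or two is, roughly, running this schedule without knowing it.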

Clavell is taking advantage of the fact that most people will not chug this 300,000-word novel in one sitting; most people will read it a few chapters at a time. This gives them time to partially forget many of these words between chapters, so that when they return to the book in a day or two and the words come up again, they have to dredge them back up from memory, and the meaning of each word is reinforced.

(Stephenson uses the same approach as a storytelling technique. Something called “Van Eck phreaking” is an important plot point near the end of Cryptonomicon, so Stephenson makes sure that it’s explained before it becomes important, and that it comes up a few times before it’s explained.)

This is how you should write your didactic novel too. Start with a character who doesn’t know the language at all, who is in the same position as the reader. Words and concepts are introduced in the background first, without any explanation. After the reader has seen the word a few times, a character comes out and tells the reader what it means, or else they guess what it means, or it’s used in a context that makes the meaning clear. Shortly afterwards, the word is used again, either in a context that helps reinforce the meaning, or with a gentle reminder. 

Use the word a few more times in situations where context helps make the meaning clear. After that, add the word to your “approved vocabulary” list, and use it wherever it’s appropriate in the novel; the reader is now expected to know what it means. If you teach people a couple of words each chapter, you can outstrip the average language 101 class in a decent-length novel.

All you need to do is go harder than Clavell, and make language-learning your secondary focus. We say secondary and not primary because your primary focus is to make sure it’s an enjoyable read. The book won’t teach anything if no one gets through it!

Naturally, you can use all the same techniques if you’re writing a didactic novel about calculus or music theory. All the same ideas still apply — language learning just offers an exceptionally clear-cut example. 

A Narrative Addition

Clavell’s technique is similar to the hero’s journey. This is a template for writing and describing stories, where a person starts out in their comfort zone, is forced out by circumstance, confronts trials, gains knowledge, and returns to their comfort zone, but stronger than they were before.

Clavell doesn’t exactly use this technique, but you could easily combine the hero’s journey with his approach.

The hero’s journey can be as epic as a series of fantasy novels, or as unassuming as a man changing a tire in the rain:

Fade in on a meek-looking man driving a car. It’s raining. Boom. Flat tire. He struggles to keep the car from ditching. He pulls it to the side of the road and stops. He’s got fear on his face. He looks out his car window at the pounding rain… It doesn’t matter how small or large the scope of your story is, what matters is the amount of contrast between these worlds. In our story about the man changing his tire in the rain, up until now, he wasn’t changing a tire. He was inside a dry car. Now, he opens his car door and steps into the pouring rain. … Our stranded, rain soaked driver has finished emptying the contents of his trunk on the side of the road. He sees the spare tire and he lets out a very slight, very fast sound of relief. That’s all. This is a story about a man changing a tire. … When you realize that something is important, really important, to the point where it’s more important than YOU, you gain full control over your destiny. … You have become that which makes things happen. You have become a living God. Depending on the scope of your story, a “living God” might be a guy that can finish changing a tire in the rain. 

This is such an engrossing story format because it mirrors the process of self-improvement in the real world, which the reader can enjoy vicariously. You learn something unfamiliar, use it, and master it. But in the didactic novel, we can put the reader in nearly the same situation as the character, and have them go through the journey together.

This approach would work well with genres like adventure novels, police procedurals, sitcoms, detective dramas, or Monster of the Week shows, which lend themselves well to stories with explicit cycles. Anything super-pulpy should fit the bill, anything episodic or serialized. 

The American spy stranded in Russia needs to get home, but to survive for the moment, he needs to learn some Russian. He finds an old run-down garage where two old farts, who speak a little English, let him hide out. Each cycle goes like this: During the intro, Spy encounters some Russian that he doesn’t know, on the radio or in the newspaper or something. This is foreshadowing: these are phrases that will come up later in the cycle, and they are there just to embed themselves in the reader’s subconscious. Then he has a conversation with one of the old guys, who teaches him some vocabulary or explains some part of Russian grammar to him.

After this, the spy goes out on a mission or a job or something — get some supplies, meet a contact, follow up on a lead, normal spy shit. During the climax he is in a real pinch, but he remembers the words the old guy taught him that morning, and he manages to fix things. He uses those words a few more times to really embed them in the reader’s mind, and then he goes back to his hideout. The words he learned today go in the vocab box, and the author will use them freely from now on, maybe making sure to give them a guest appearance next episode so they stay in the reader’s memory.

For obvious reasons, novels that want to teach a language will have an easier time if the novel is set in the past, because there were more places you could go where you’d have to learn the language to get by. For similar reasons, setting your story in a time before cell phones and the internet will generally help a didactic novel on any subject, since it lets you isolate your characters from textbooks and dictionaries. Post-apocalyptic, fantasy, and far-future settings would also work.

So if you decide to write a didactic novel (or other didactic fiction), give us a holler.

Cheating at School is a Better Idea Than Ever

With absolutely no apologies to The Wall Street Journal.

A year of absolute, unprecedented bullshit has spurred an eruption of cheating among students, from grade school to college. With many students isolated at home over the past year—and with the pointless grind of school revealed for what it truly is—academic dishonesty has never been such an obviously reasonable choice.

Some pedants fear the new generation of cheaters will be loath to stop even after the pandemic recedes. “Students have finally found a way to avoid my bullshit, and worse, they know it works,” said Phineas Whateley, senior teaching fellow in soup calculation at Royal University College in London, who has studied academic integrity issues for more than two decades, though apparently without learning anything of value. He said cheating sites number in the thousands, from individuals to large-scale operations.

Concerned about his West Carolina State University students cheating in a statistics class, Richard Penistone launched a plan.

Rather than writing a more reasonable exam, or spending time helping students master the material, Mr. Penistone, a course coordinator, wasted countless hours writing a computer program that generated a unique set of questions for each student. Those questions quickly showed up on a for-profit homework website that helped him to identify who posted them. 

About 200 students were caught cheating—one-fourth of the class. Yet somehow Mr. Penistone was more concerned about punishing these 19-year-olds than he was that he had created a class that 25% of his students decided was such bullshit that they couldn’t be arsed to even attempt the final exam. 

We note that Mr. Penistone is a course coordinator, not faculty. We assume he’s not tenured; he doesn’t even have an advanced degree. What is the source of his deep-seated loyalty to West Carolina State University, exactly? Do they pay him so especially well that he is roused to stay up late writing code to generate and distribute 800 totally unique exams?

Overall, cases of academic dishonesty more than doubled in the 2019-20 academic year at WC State, with the biggest uptick as students were forced into the absurdity that is the Zoom classroom, according to the school.

Educators say stress and pressure, possibly related to the global pandemic maybe???, are a big reason why students cheat. “Especially in a time of stress, they realize that there are more important things than the rote memorization and regurgitation we force them to do on exams,” said Myra Capwell, president of the International Global Center for Academic Honor, Security, and Integrity, and director of the Kansas-Nebraska-Indiana Interstate University Dignified Honor and Integrity System.

Lucien Hoyt, an 18-year-old freshman at Mamimi University in Cambridge, Ohio, said he knows students who have used homework help sites for studying—and (brace yourself, dear reader) for cheating. He said he hasn’t cheated himself, but then again, he knows we’re narcs, so he would say that. 

He said students, including himself, are frustrated with virtual learning because it throws into stark relief how artificial these courses are, and how none of it matters. “I haven’t struggled this way with learning material, ever,” he said. “In the classroom I had the vague sense I might actually be getting an education. But with the trappings stripped away, it just becomes so clear that what they’re asking us to do is total busywork.”

At the K-12 level, schools are free to indulge their whimsy to become miniature police states, and many block a range of homework help websites from district computers to prevent cheating. Ultimately even this exercise in authoritarianism is pointless, however, since this doesn’t stop a student from visiting the site from a different device. 

Middle-school teacher Aurora Zimmer in Lake, California, has put less emphasis on testing during online learning because it is also dawning on her, if somewhat slowly, that this is an exercise in futility. “We have no control of what is going on when you’re on a computer,” she said. “We can’t even force you to ask us to go to the bathroom. It really makes you think.”

Measures taken in the name of stopping online cheating have spawned a new kind of comforting-and-not-at-all-draconian industry: surveillance-type companies that hire online randos to actually watch students take tests from home. I don’t know about you, but I find the idea of a faceless company hiring an online stranger to watch my 19-year-old child take a test in their bedroom very reassuring.

The internet strangers hired by these companies look for suspicious behavior (this is good because they are presumably experts in suspicious behavior), such as a student disappearing from camera view (going to the bathroom) or being slipped answers (eating chips). Some use “facial-detection” “software” to automatically penalize students who glance, however briefly, out of frame, or make “unusual movements”. This allows universities to not only be pedantic at heretofore unimaginable speeds, it allows them to outsource it as well.

Proctorio, based in Scottsdale, Arizona, said it monitored 21 million exams in 2020 world-wide, up from 6 million exams in 2019.

ProctorU, based in Hoover, Alabama, notes worrying displays of basic desires for respect, freedom, and privacy. “Some of these students must have accidentally been paying attention in their American History classes,” says one ProctorU drone with a sneer, “But we have a leg up on King George III. These latter-day George Washingtons and Patrick Henrys don’t stand a chance.”

Here are some “funny stories” about cheating to make it sound amusing and disguise the human cost and egregious civil rights violations inherent to this kind of in-home surveillance: Some of the busts include a student suspected of trying to use a drone’s camera to take images of a test to possibly share with others; another who was trying to cheat by using information on sticky notes on his dog; and a female student who sneezed and disappeared from view, to suddenly be replaced by a male wearing a blond wig, impersonating her. Dogs and crossdressing! Isn’t that funny? Now go back to bed, America. 

Among the newer ways to cheat are homework auction sites, which give students a say in who does their work and at what price. Students post their assignment on a website, along with a deadline; the website acts as a marketplace for bidders who offer to do the assignment.

The bidders, who often refer to themselves as tutors, can tout degrees and other credentials. Some companies allow students to rate their work and post reviews online.

Stella Walker, a blogger and content strategist for, a site where students can auction out writing assignments, said the site’s terms prohibit academic fraud and plagiarism. She said she supposes cheating can happen, but it would be on a student’s conscience. “I know you’re a snitch, bonehead,” she told us.

One self-described independent tutor listed as Seymour Butz in a Craigslist ad said in an interview by text message that business was booming during the pandemic. The Craigslist ad noted services such as doing students’ math work. The tutor disavowed the label of a cheater for students, and said that the tutor helps students learn by providing written tutorials and explanations for math problems.

“No way would any student use my cheating service to avoid doing their work,” said Mr. Butz. “Boy do I ever disavow that label, you can put THAT in your article.”

Mr. Butz touted bachelor’s and master’s degrees from Pinto University in the ad, but the university said it was unable to locate such information in its records. We at The Wall Street Journal are beginning to suspect that “Seymour Butz” may be an alias.

Other popular websites that students use to get help—by submitting a question for an expert to quickly answer, or by searching a database of previous answers—include Chegg and Brainly, which said they have seen a big increase in users during the pandemic.

Chegg, a publicly held company based in Santa Clara, Calif., prides itself on a willingness to be a big squealer, and help institutions determine the identities of those who cheat. “We really like to play both sides. It gives us a deep, almost visceral pleasure, to serve as a sort of giant honeypot sting operation for entrapping helpless students,” they told us by greeting card, despite the fact that we specifically did not ask them for comment, and don’t know how they found our home address. On the basis of such scummy practices, Chegg saw total net revenue of $644.3 million in 2020, a 57% increase year over year. Subscribers hit a record 6.6 million, up 67%, and students are charged between $9.95 and $19.95 per month for the privilege of letting Chegg stab them in the back.

Mr. Penistone at WC State said Chegg helped identify the 200 students that used its website to avoid taking his exhausting final exam. Some students posted exam questions to get answers while others accessed the information, all traceable through users’ email addresses, IP addresses and the time of the access.

Another website that students were suspected of using to cheat on the exam to a lesser extent showed actual moral fiber and didn’t cooperate with the university, Mr. Penistone said bitterly.

The students were given three options: meekly accept their punishment, join Mr. Penistone in what we can only imagine must be an excruciatingly awkward Zoom call to “review the evidence”, or dispute the accusation with the Office of Student Conduct. This office, designed and staffed by the university and tasked with enforcing its rules, is certain to give them all a fair hearing, we are sure.

“A lot of the students responsible said, ‘It’s unfair to put us through this, because we’re going through a pandemic,’ ” Mr. Penistone said. “Fortunately these complaints fell on deaf ears. I had no choice because there was a zero-tolerance policy. I mean, I’m the one who designed the class, and the exams, and the zero-tolerance policy. But really, I had no choice.”

Even after the bust, the cheating didn’t stop. This is unsurprising, because the issue is not cheating, but unrealistic expectations in texts and exams. A close analogy might be, “Even after the floggings, the attempts to mutiny didn’t stop.” I wonder why.

“In the fall semester, of 1,000 students, I still had attitude problems, I mean academic integrity issues, with 70 or 80,” he said. “I still don’t understand the basic issues at play here. Probably they have just gotten better at cheating, but fortunately I am blissfully unaware of all things happening in my classroom.”

The real tragedy of course is how this all contributes to greater societal alienation — that cheating is now being outsourced to faceless corporations, rather than being a way to build community with fellow classmates. What kind of America are we building for our children?

Hindsight is Stats 2020, Part III: Final-First Exams

[This is Part III of a retrospective on teaching statistics over summer 2020. Part I and Part II.]

Exams were my white whale for this course.

My design goals were clear. Someone who knows their stuff should be able to prove what they know and walk out of the class. Students should be encouraged to learn as fast as they can, and they should be rewarded for getting ahead of the class if they want to. And there should be almost no consequences for failure, so that students can experiment without torpedoing their grade.

But exams are famously plagued with problems. Rescheduling exams for students who are sick or have to miss a day. Deciding who gets to do make-up exams. The endless questions about exam format — “professor, will this be on the final?” Somehow, we complain about all this but take it for granted. Why not come up with a way to make these problems a thing of the past?

1. Final-First Exams

These days, professors have gotten more comfortable experimenting with exam formats. Lots of exams are open notes, open book, or even take-home. Some classes let you drop your lowest exam score. I’ve even heard of professors giving five exams and dropping your worst two.

Dropping tests is cool, because it fixes some of the classic problems. Have to miss an exam? No problem, just drop that one. No need for make-up exams. If you bomb an exam, just drop it.

This is the right direction, but we can do better. What else can we tinker with, to make exams even better?

I thought back to the cumulative format, and why it doesn’t work for teaching. Why have cumulative exams, then? Doesn’t it just serve to obscure your expectations? My class format was fractal, so that students could see what’s coming, know what’s expected of them. Why not use this approach with exams, too?

Dropping one exam isn’t cool. You know what’s cool? Dropping ALL the exams.

I call the format Final-First, because your first exam is a final exam. In fact, every exam is a final exam, meaning every exam covers all of the material covered in the whole course. The exams have nearly identical formats, differing only in the particulars. I swap out the numbers and some of the details on the questions, but once you’ve seen one final, you have a pretty good sense of all of them.

This course was six weeks long, and I gave them a final exam at the end of every week. This means they had a final exam at the end of Week 1, at the end of Week 2, at the end of Week 3, and so on…

Since these were all final exams, I didn’t expect most of them would do very well on the first exam. But that’s ok, because we dropped all their exam scores except for the best one. The exam grade, as it contributed to their grade for the class as a whole, was entirely based on their best exam. Other exam grades didn’t contribute at all.

If a student gets a 90% on the third final, it doesn’t matter how they did on the first two. Why should a student suffer if they get a 10% on the first exam but manage to nail it with a 90% later on? Clearly that student has done a great job and learned all the material we wanted them to, even though they struggled at first. In fact, isn’t that more impressive?
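The grading rule itself is almost a one-liner. Here is a minimal sketch in Python, with missed exams recorded as `None` (our convention) and an exam grade of 0 if a student never sits an exam at all (also our assumption). The helper `grade_floor` shows the safety net: the lowest your exam grade can be after each exam.

```python
from itertools import accumulate
from typing import Optional, Sequence


def exam_grade(scores: Sequence[Optional[float]]) -> float:
    """Final-First rule: the exam grade is the single best exam score.

    Missed exams (None) are ignored, so skipping or bombing an exam
    can never pull the grade down.
    """
    taken = [s for s in scores if s is not None]
    return max(taken, default=0.0)  # assumption: no exams taken -> 0


def grade_floor(scores: Sequence[Optional[float]]) -> list:
    """The safety net: the lowest the exam grade can be after each exam."""
    filled = [s if s is not None else 0.0 for s in scores]
    return list(accumulate(filled, max))


# 10% on the first final, 90% on the next one: only the 90 counts.
print(exam_grade([10, 90]))  # 90
# 84% on Exam 1, 75.5% on Exam 5, every other exam skipped.
print(exam_grade([84, None, None, None, 75.5, None]))   # 84
print(grade_floor([84, None, None, None, 75.5, None]))  # [84, 84, 84, 84, 84, 84]
```

Taking the maximum rather than the latest score is what makes it safe to show up to a later exam and do worse.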

This format has some great features, which are beautifully in line with my design goals:

  • Good Incentives: If you understand the material quickly, you should be rewarded. Students who succeed are rewarded with more freedom. No one who has mastered the material should be forced to go through the motions. If you get a grade you’re happy with, you can choose to skip the rest of the exams with no downside.
  • Safety Net: Each exam offers a new chance to set a minimum threshold for your grade. Once you get an 85 on one exam, you can rest easy that your grade won’t go any lower. With this design there are no consequences for failure. You can bomb (or miss) as many exams as you want without any risk to your final grade.
  • Low Anxiety: Students who are able to get a good grade on one of the early exams will be able to worry about things other than cramming for the next exam. Maybe they’ll use it to study more, or maybe they’ll just go to the beach. I don’t care. If you can get an 80 on the final exam in week two of a six-week class, you deserve to go to the beach.
  • Transparency: With this format, there’s no more need for, “what will be on the test?” Once you have taken the first final, you will know (approximately) the format of all the other finals. This has the added benefit of:
  • Context: Seeing all the material at once will allow you to begin building a tapestry of ideas in your head. You will never be blindsided by new material, things you didn’t realize were expected of you. Once you’ve seen one final exam, you’ve seen them all, and being exposed to all the material early on will help you learn it better.
  • Feedback: You will be able to tell what skills you have mastered and which you need to work on. This will allow you to spend your study time wisely. Previous exams become a great tool for review. You can go over your performance with the TA or professor and be able to see exactly what you need to work on for the next exam, because the next exam is so similar.

I was really happy with this design. It hit all of my design goals, and it resolves a lot of the classic problems with exams.

Other people liked the idea too. I was on a date with a PhD student and we were talking about teaching, so I told her about this design. She said, “that sounds a bit insane upfront, but not so much when you think about it.”

Now there was nothing to do but try it out. For this class, I made the exam 50% of the final grade. Normally, making a single evaluation such a huge chunk of the grade is unfair. But with this format, the exam grade is the best of six evaluations, and besides, the exams test what I really want students to know.
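Here is the arithmetic behind weighting exams at 50%, sketched in Python. The other half of the grade isn’t broken down here, so `other_work` is a hypothetical placeholder score, and the exam scores are invented to show how a student who improves every week fares under best-exam grading versus a conventional average.

```python
from statistics import mean
from typing import Callable, Iterable, Optional

EXAM_WEIGHT = 0.5  # from the text: exams were 50% of the final grade


def course_grade(exam_scores: Iterable[Optional[float]], other_work: float,
                 combine: Callable = max) -> float:
    """Combine exam scores (None = skipped) with the rest of the coursework.

    `other_work` is a hypothetical 0-100 score standing in for the other 50%.
    `combine=max` is the Final-First rule; `combine=mean` shows what a
    conventional averaged exam grade would give instead.
    """
    taken = [s for s in exam_scores if s is not None]
    exam_part = combine(taken) if taken else 0.0
    return EXAM_WEIGHT * exam_part + (1 - EXAM_WEIGHT) * other_work


scores = [40, 55, 70, 85]  # hypothetical student who improves every week
print(course_grade(scores, other_work=90))                # 87.5
print(course_grade(scores, other_work=90, combine=mean))  # 76.25
```

Under averaging, the early struggles drag the grade down even though the student ended up knowing the material; under the best-exam rule, they don’t.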

1.1 The Results

Final-First exams worked really, really well.

I was worried that students would be confused by the format, or would be terrified when they failed the first exam, but I actually got very few questions about it. Students seemed to understand what I was trying to do.

It really did solve all the usual exam problems. No one ever asked me for a makeup exam. Only once did I have to clarify what would be on the exam. When students wanted to meet to go over their answers, we were able to make real progress, because it was immediately clear to me what parts of the material they had mastered and what they were still struggling with. In many cases we could look back over two or three different exams and see the same thing tripping them up every time over multiple weeks.

Most people improved steadily over time. The average grade went from 60% on Exam 1 (this was by design; see below) to 85% on Exam 6. Students took the exams pretty freely. Some of them took every exam, but on average they took only 4 of the 6 exams.

A few students actually got their best grade quite early on. On the first final, at the end of the first week of class, the highest grade was an incredible 88% (!!!). This student kept taking exams, though, and was able to eventually beat her record with a 92.5% on Exam 5.

The student who got the second-highest score on Exam 1 got an 84%, again very high for having taken only three classes. This student chose to skip most of the other exams. He did take Exam 5, but only got a 75.5%, so in the end his final grade was actually based on his exam score from the first week of class!

I was a little surprised that more students didn’t try to get a great grade early on. When I think about this format, one of the most exciting things to me is the idea that you can teach yourself all the material, get ahead of the class, get a great exam grade halfway through, and not have to show up to class anymore. But while a few students got great scores on Exams 3 and 4, that was the exception. It might be different in a semester-long class. Six weeks is just not much time to teach yourself, even if you really commit to it!

These are extreme cases of the safety net working as intended, but the design worked equally well for students with less extreme grades. To my surprise, only 26 of the 39 students took Exam 6, the final final exam. I think this means that by the end of the class, many of them were satisfied enough with their exam grade that they chose not to take this last final. Of those who did take Exam 6, only 18 got a better grade on the final final than on any previous final, which means that 8 people didn’t improve their grade at all on the final final.

The best exam grade in the entire course, a 97.5%, was actually earned on Exam 5. Perhaps unsurprisingly, that student chose not to take Exam 6.

These grades are really impressive, because the exams were not easy. I came in with specific expectations of what a student should know by the end of intro stats. These expectations were reasonable, but they were also pretty high. We expect too little of undergrads, and we underestimate what they are capable of doing and understanding.

I didn’t change my expectations at all during this course. Every student who earned a 90% on an exam met my expectations, and every student who did better than that exceeded my expectations. In my opinion, a good grade means that they mastered the material.

1.2 Student Opinion

Students really liked the exams. Some of the most positive feedback was about this part of the class. Take a look:

“This was one of my favorite aspects of the course because it genuinely did relieve a lot of stress. My biggest fears for this course revolved around completing it and not only doing poorly, but also learning nothing. I think the weekly exams allowed me to continually refresh and apply what we had reviewed without the anxiety of failing the course.”

“I thought the idea of getting graded based on the best exam was exceptional since we learn more as we continue taking the class.”

“To be honest, this is the best [exam] format I’ve ever taken! It really gives me the motivation to study harder each time without getting too stressed out.”

Other comments were much the same. As you’ll notice, the experience students had with the format was exactly the experience I was aiming for. A few other notes of interest were:

“I found myself studying ahead of time to supplement the material I have not learned yet”

“Towards the end it was fine, but the first few were pretty stressful for me.”

The one complaint, which I did see a few times, was that the exams tested them on questions they didn’t recognize and hadn’t seen before. But of course, this was by design, because I wanted to see if they really understood the concepts.

Some students seemed to understand this, with one noting, “[Jeff] helped us prepare as best as we could without actually giving us the answers.” And once again I’ll point to their excellent exam grades as proof that the difference in format wasn’t actually a problem.

2. Exam Design

This format is certainly the most interesting part of the exams. But the design of the exams and the exam questions is worth discussing as well.

The Final-First exam format doesn’t work if you don’t pay close attention to the design of the exams. Exams need to be nearly identical, so that students always know what’s coming on the next one. But they can’t be too similar, or else students will memorize them by rote. You need to keep mixing it up.

I had a plan for the exams going in. As I argued in What You Want from Tests, exams should be used to test the knowledge that students carry around in their heads, the bits that an expert will internalize. That’s what I was aiming for in this class. Research reports would cover their ability to actually do stats, and exams would cover their memory and intuition for the most important concepts.

Then, of course, the whole course was forced online. Immediately I knew that this meant that exams would de facto be open book, open notes, and really, open Google. So I knew that I would have to pivot away from my original plans. I couldn’t just focus on internalized knowledge.

(I never explicitly told students that the exams were open notes, but I never told them not to look things up either.)

I actually think this ended up improving the exams. I stand by what I said in What You Want from Tests, but it can be more complicated than I imply in that essay.

2.1 Exam Structure

The structure of the exams mirrored the structure of the course — after all, every exam was a final. Each exam was 50 points in total. Of that, 15 points had to do with basic data skills, 15 points went to descriptive statistics, and 15 points were on the use and interpretation of inferential statistics. Just like the course, the exams were divided into these three sub-topics.

The remaining 5 points went to what I called “advanced topics”. These were questions about things we mentioned in lecture but were slightly outside the scope of the class, more complex questions about the use of core concepts, or questions that tested their intuitions in ways that we had hinted at, but hadn’t explicitly discussed.

An interesting feature of this is that a student who mastered all the core material, but hadn’t yet achieved that deeper understanding, would only get a 90% on the exam, because the advanced section was the last 10% of the exam grade. A grade of higher than 90% means that a student understood not only all of the material at the expected level, but was making progress into understanding it more completely.
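The arithmetic behind that 90% ceiling can be written out as a small Python sketch. The section keys are illustrative labels chosen here, not names taken from the actual exams; the point values are the ones described above.

```python
# Point allocation per section, as described above (50 points total).
EXAM_BLUEPRINT = {
    "data": 15,
    "descriptive": 15,
    "inferential": 15,
    "advanced": 5,
}


def max_percent(sections_mastered):
    """Highest possible exam percentage for a student who earns full
    credit on the given sections and nothing anywhere else."""
    total = sum(EXAM_BLUEPRINT.values())
    earned = sum(EXAM_BLUEPRINT[s] for s in sections_mastered)
    return 100 * earned / total


core = ["data", "descriptive", "inferential"]
print(max_percent(core))  # 90.0
```

Only the advanced section can move a student past 90%, which is why a score above that line signals more than core mastery.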

This is why I am so confident that the students who got above a 90% on their exam grade not only met my standards, they exceeded them. That last ten percent came from questions that were, by design, more difficult than an intro stats student should be able to answer.

2.2 Exam Difficulty

Maybe other teachers already know this, but something I had never realized before was that a teacher has a lot of control over the difficulty curve of an exam. I knew that a professor could make an exam more or less difficult, but I didn’t understand that you have a lot of control over the distribution of scores.

This was particularly important for a class using the Final-First exam format. In this system, most students take a final exam in Week 1, and of course most of them will bomb it. There’s a big difference in morale, however, between bombing an exam with 50% and bombing it with 5%!

I wanted to encourage students to do well. I wanted to make sure they felt like they could succeed from the very beginning. To make this happen, I designed the exam so that it was easy to get a decent score, but hard to get a great score. (For those of you who are statistically inclined, compare item response theory.)

(This is also how I asked Liz to grade the research reports. Make it easy to get a decent grade but hard to get a perfect grade, I said.)

I had already decided that 15 points, or 30% of the exam, was devoted to data skills. This stuff is pretty easy, and so I knew that most students would be getting a good chunk of points from this section right from the start. In the other two sections, I made sure to include a couple of easy questions, to keep the baseline grade relatively high.

The fact that the average score on Exam 1 was 60% shows that I was successful. In fact, even in Week 1, the lowest exam grade was a 40%. That doesn’t sound like much, but considering that we were only 17% of the way through the class, I think it’s pretty good.

I used some other tricks for this as well. One was that the exam was almost entirely multiple-choice. A classic problem with multiple-choice questions is that students always have a decent chance of getting the right answer just by guessing. For example, a student guessing on a multiple-choice question with four answers will get the right answer 25% of the time. An exam with nothing but 4-answer multiple-choice questions has a baseline grade of 25%. It’s even worse for an exam that’s all true/false, which has a baseline of 50%. This is why, until 2016, the SAT took off 1/4 of a point for each wrong answer. Its multiple-choice questions had five options, so a random guess gained 1/5 of a point on average and lost 4/5 × 1/4 = 1/5 of a point on average. Statistically, this meant that a student who did nothing but guess would get a score of about zero.
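The guessing arithmetic can be checked with a short Python sketch. It assumes one point per question, a uniformly random guess, and, for the SAT line, five answer choices with a 1/4-point penalty per wrong answer. Exact fractions are used so the zero comes out exactly.

```python
from fractions import Fraction


def expected_guess_score(n_options, penalty=Fraction(0)):
    """Expected points per question for a student guessing uniformly at
    random: +1 point if right, minus `penalty` points if wrong."""
    p_right = Fraction(1, n_options)
    return p_right * 1 - (1 - p_right) * penalty


print(expected_guess_score(4))  # prints 1/4 (a 25% baseline)
print(expected_guess_score(2))  # prints 1/2 (true/false, a 50% baseline)
print(expected_guess_score(5, penalty=Fraction(1, 4)))  # prints 0 (old SAT-style scoring)
```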

But we can turn this same force to our advantage. To adjust the baseline score, I can change the number of answers I include for my multiple choice questions. This is exactly what I did. For the Data section, which I wanted to be a score-booster, all the multiple choice questions had only a few answers each. For the Advanced section, where I wanted students to earn points only if they really knew their stuff, most of the multiple choice questions had 8 or more response options! And for the other sections, which I wanted to land somewhere in between, I included a mix.
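As a sketch of how the option count sets each section’s floor, the function below averages the guessing baseline over a section’s questions. The option counts in the example are hypothetical, not the real exam’s, and the sketch assumes every question is worth the same number of points.

```python
from fractions import Fraction


def section_baseline(option_counts):
    """Expected fraction of a section's points earned by pure guessing,
    assuming equally weighted questions and a uniform guess on each."""
    return sum(Fraction(1, n) for n in option_counts) / len(option_counts)


# Hypothetical option counts, chosen only for illustration.
data_section = [3, 3, 4, 3, 3]   # few options per question: a score-booster
advanced_section = [8, 10, 8]    # many options: guessing barely helps

print(float(section_baseline(data_section)))      # roughly 0.32
print(float(section_baseline(advanced_section)))  # roughly 0.12
```

Mixing question types within a section simply pulls its baseline somewhere between these two extremes.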

Of course, there are limits to how lenient we want to be. In particular, true/false questions seem too easy — a baseline of 50% just from guessing is way too high. One idea that I really like is True / False / Can’t Tell questions. At a shallow level, these are just true/false questions with three options instead of two. But at a deeper level, this encourages students to engage with the question in a new way. Instead of just determining which answer is right, they have to think about whether they even have enough information to make that call. It literally adds another dimension to the question. This is especially well-suited to statistics, which is all about making informed guesses based on limited information.

I used a similar approach in some of my short answer questions. I’ve noticed that in class, students are often much more comfortable telling you why something is wrong than trying to give you the right answer themselves. I translated this into “What’s wrong with…” questions. Students would be given a short paragraph that described some statistics. In each case I had inserted an error into the paragraph. For example, sometimes I would say that a variable wasn’t skewed, but I would report a mean and median that were strikingly different. Students would have to pick out the mistake and tell me why it was wrong.

This is a really important skill in real life. A big part of the practice of using stats as a scientist is noticing when something is wrong in an analysis, whether you’re checking your own analysis or looking over someone else’s work.

I included one of these questions in the Data section for almost every exam, since they are a good way to ask about data features like skew and range without just asking students to regurgitate the definitions. I also included a few in the Descriptive Statistics sections, and I think that added some nice variety. You know a student doesn’t understand correlation when you report r = 1.2 and they don’t catch it.
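The two planted errors mentioned above can be expressed as mechanical checks. This is a hypothetical sketch, not a tool used in the course: the dictionary keys and the skewness cutoff are assumptions. It uses Pearson’s median skewness, 3 × (mean − median) / sd, which always lies between −3 and 3.

```python
# Hypothetical cutoff: Pearson's median skewness beyond +/-1 is treated
# here as clearly skewed. Real judgment calls depend on context.
SKEW_CUTOFF = 1.0


def find_reporting_errors(report):
    """Return a list of problems in a dict of reported statistics.

    Checks only two error types: an impossible correlation, and a
    "not skewed" claim contradicted by a large mean-median gap."""
    problems = []
    r = report.get("r")
    if r is not None and not -1 <= r <= 1:
        problems.append(f"correlation r = {r} is outside [-1, 1]")
    if report.get("claims_not_skewed"):
        gap = report["mean"] - report["median"]
        pearson_skew = 3 * gap / report["sd"]
        if abs(pearson_skew) > SKEW_CUTOFF:
            problems.append(
                f"claimed not skewed, but mean {report['mean']} and "
                f"median {report['median']} differ by {abs(gap)}"
            )
    return problems


print(find_reporting_errors({"r": 1.2}))  # ['correlation r = 1.2 is outside [-1, 1]']
print(find_reporting_errors(
    {"claims_not_skewed": True, "mean": 52.0, "median": 40.0, "sd": 15.0}
))
```

A student answering a “What’s wrong with…” question is doing this kind of check in their head, plus explaining why the value is impossible.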

I realize now that I never included any of these questions about inferential statistics. This was a mistake, since catching errors in the reporting of tests is something that comes up all the time. If I taught this class again, I would put “What’s wrong with…” questions in all three sections of the exam.

Another way to control exam difficulty is with paired questions. You include two questions about the same topic, but one is easy, and one is harder. For example, in my descriptive statistics sections, I always included two questions where I described some data and asked students what plot or chart they should use to represent that data. By design, the first of these was always pretty easy, and the second was, while not exactly hard, a more sincere test of their understanding.

This has some great features. First, it helps raise their baseline score. A student who understands the idea even a little will usually get the first question right, and this will boost their grade. They essentially get partial credit on that concept, even though the question is multiple choice. (They say you can’t give partial credit on multiple choice questions, but what do they know?) But a student only gets full credit if they can answer the more challenging question. Again we see that the design makes it easy to get a decent grade, but hard to get a perfect grade.

Second, it helps with feedback. For any topic on the exam, if a student gets neither question right, they clearly do not understand the topic at all. If they get the easy one right but not the harder one, they understand the basics but haven’t quite got the whole idea. And if they get both right, it’s clear they understand it at the level I want them to. If they somehow get the hard question right and the easy question wrong, this tells you that they were probably guessing. You can look at the exam and see exactly how students are doing with each of the core skills.
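The four cases above amount to a small lookup from a pair of answers to a diagnosis. A minimal Python sketch follows; the topic names in the example are hypothetical.

```python
def diagnose_pair(easy_correct, hard_correct):
    """Interpret one easy/hard question pair on the same topic,
    following the four cases described above."""
    if easy_correct and hard_correct:
        return "understands it at the target level"
    if easy_correct:
        return "has the basics, not the whole idea"
    if hard_correct:
        return "probably guessing"
    return "does not understand it yet"


def diagnose_exam(pairs):
    """Map each topic to a diagnosis. `pairs` maps topic -> (easy, hard)."""
    return {topic: diagnose_pair(easy, hard) for topic, (easy, hard) in pairs.items()}


# Hypothetical topics and answers for one student.
results = diagnose_exam({
    "plot choice": (True, False),
    "correlation": (True, True),
    "sampling": (False, True),
})
for topic, verdict in results.items():
    print(f"{topic}: {verdict}")
```

Running this over several exams for the same student is one way to spot the same topic tripping them up week after week.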

2.3 Difficulty Over the Course of the Class

As important as the difficulty curve within an exam is, it’s also worth thinking about difficulty curves across exams. Part of the reason to make an exam easy to pass but hard to ace is that this is good for student morale, while still being an accurate measure of their ability. With a Final-First exam, the order in which you give the exams affects morale too.

Students shouldn’t get a good grade on the first final unless they really know their stuff. Early on, exam grades should be pretty low. But if exam grades go down with every exam, or even if they fail to go up, that’s bad for morale. It tells the students that they aren’t learning anything from the class. That shouldn’t be true, and even if it is, you shouldn’t be telling them that!

My recommendation is that your hardest exam should go first, and your easiest exam (still staying true to what you want them to get out of the class) should go last, with the other exams in order of difficulty in between. And of course, for the reasons described above, your hardest exam should still be designed so that on average students do decently on it. If the average score on the first final is less than 50%, you’ve probably done something wrong.

One thing that I would like to do someday is create a way to generate exams automatically. These exams are formulaic by design, so it would be relatively easy to write a script that would mix & match components and spit out as many exams as you want. Not only could this make the exams more fair and regular, you could do things like share multiple practice exams with your students.
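A minimal sketch of such a script, assuming a hypothetical blueprint in which each section is a list of slots and each slot lists interchangeable variants of the same kind of question. Every question ID below is made up, and the point splits inside sections are illustrative; only the section totals (15/15/15/5) come from the exam structure described earlier.

```python
import random

# Hypothetical blueprint: each slot is (points, [interchangeable variants]).
BLUEPRINT = {
    "data": [
        (5, ["data-skew-A", "data-skew-B", "data-skew-C"]),
        (5, ["data-range-A", "data-range-B"]),
        (5, ["data-whats-wrong-A", "data-whats-wrong-B"]),
    ],
    "descriptive": [
        (5, ["plot-choice-easy-A", "plot-choice-easy-B"]),
        (5, ["plot-choice-hard-A", "plot-choice-hard-B"]),
        (5, ["correlation-A", "correlation-B"]),
    ],
    "inferential": [
        (8, ["t-test-A", "t-test-B"]),
        (7, ["p-value-A", "p-value-B", "p-value-C"]),
    ],
    "advanced": [
        (5, ["advanced-A", "advanced-B", "advanced-C"]),
    ],
}


def generate_exam(blueprint, seed):
    """Pick one variant per slot. The same seed always yields the same
    exam, so a set of exams (or practice exams) can be regenerated."""
    rng = random.Random(seed)
    return {
        section: [(points, rng.choice(variants)) for points, variants in slots]
        for section, slots in blueprint.items()
    }


def section_totals(exam):
    """Points per section, to confirm every exam keeps the same structure."""
    return {section: sum(p for p, _ in items) for section, items in exam.items()}


exams = [generate_exam(BLUEPRINT, seed) for seed in range(6)]
for exam in exams:
    print(section_totals(exam))
```

Because variation happens only within slots, every generated exam keeps the same shape while the specific questions change.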

3. Exams Online

As with everything else, I was worried about exams being online. There were the concerns around cheating, as I mentioned above, and also just around giving an exam remotely.

I was wrong. Holding exams online is one of the best things I’ve ever done for a class. It was so easy that I am seriously considering using online exams for in-person classes in the future.

I ended up running all my exams through Qualtrics, a survey software I use in my research. Qualtrics is flexible and it has a lot of nice features that are helpful for exams, but I suspect you could run online exams with other survey platforms.

Exams were run every week. Since my students were located all around the world, and since many of them had jobs or other responsibilities, I opened the exam for a full 24 hours. Lectures were Monday / Tuesday / Wednesday, and every week the exam was open from 5:00pm EST Thursday to 5:00pm EST Friday. Using the survey software, it was easy to have it open all day and let them drop in whenever they wanted. I also liked how this didn’t cut into class time.

Qualtrics automatically records the time when a session is opened and when it is submitted, so I used that to time their exams. The exam would begin as soon as a student clicked on the link, since that prompted Qualtrics to record the session start. I recommended that they time themselves to ensure that they didn’t go over. We compared their start and their submit times to see if they followed directions. Some of them did go over by a little, but we were lenient, and graded those exams too. To my surprise, no one tried to sneak in a much longer exam session.

After some pilot testing with my sister, I ended up making the exam only 45 minutes long. This isn’t much time, but I figured it would be easy to add time later if I had to. I was worried that students would complain, and fully expected that I would have to bump it up to 60 minutes after the first few exams. But this ended up being unfounded too. I didn’t get any complaints about the exam length — students never mentioned it! — and so I kept it 45 minutes long for the whole course.
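The start/submit comparison described above is easy to automate. A minimal sketch, assuming the exported timestamps look like `2020-07-16 18:02:10` (the real export format may differ) and a hypothetical five-minute grace period standing in for “lenient”:

```python
from datetime import datetime, timedelta

TIME_LIMIT = timedelta(minutes=45)
# Hypothetical grace period; the text only says small overruns were accepted.
GRACE = timedelta(minutes=5)
# Assumed timestamp format for the exported start and submit times.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_session(started, submitted, limit=TIME_LIMIT, grace=GRACE):
    """Classify one exam session from its start and submit timestamps."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(submitted, TIMESTAMP_FORMAT)
    elapsed = end - start
    if elapsed <= limit:
        status = "on time"
    elif elapsed <= limit + grace:
        status = "slightly over"
    else:
        status = "well over"
    return status, elapsed


status, elapsed = check_session("2020-07-16 18:02:10", "2020-07-16 18:49:40")
print(status, elapsed)  # slightly over 0:47:30
```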

Short exams also fit my design goals. There’s no need to belabor an examination. As long as it’s accurate, it should be as short as possible. Once again, I imagined how it would be if, through some horrible clerical error, I was forced to take the class myself. I knew I would be able to ace the exam in about 15 minutes, so I wouldn’t be forced to waste more than a tiny amount of time. That’s how it should be.

Running exams online also gave us huge benefits on the backend. Exams were incredibly simple to grade. Once all the scores were in, I would take the exam myself, putting in all the right answers and writing ANSWER KEY in the name field at the end. Then, when Liz downloaded all the responses for grading, she could just use Excel functions to compare each of their answers to the responses I put for the answer key, and automatically assign points that way. There were always a few short-answer questions to grade by hand, but the majority of the grading, for every single student, could be accomplished in just a few minutes.
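The Excel comparison can be sketched the same way in Python. The column names (`Name`, `Q1`, …) and the sample rows are hypothetical; the sketch only assumes the export is a CSV with one row per respondent, including the ANSWER KEY row. Questions left blank in the key, such as short answers, are skipped so they can be graded by hand.

```python
import csv
import io


def grade_against_key(csv_text, name_col="Name", key_label="ANSWER KEY", points=None):
    """Score every response row against the row named `key_label`.

    Answers must match the key exactly. `points` optionally maps a
    question column to its point value (default 1 point each)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    key = next(row for row in rows if row[name_col] == key_label)
    auto_graded = [q for q, answer in key.items() if q != name_col and answer]
    points = points or {}
    scores = {}
    for row in rows:
        if row[name_col] == key_label:
            continue
        scores[row[name_col]] = sum(
            points.get(q, 1) for q in auto_graded if row[q] == key[q]
        )
    return scores


# Hypothetical export: Q3 is a short answer, left blank in the key.
sample = """Name,Q1,Q2,Q3
Student A,B,True,some text
Student B,B,Can't Tell,other text
ANSWER KEY,B,Can't Tell,
"""
print(grade_against_key(sample))  # {'Student A': 1, 'Student B': 2}
```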

And unlike working with scantron or paper forms, there is no headache when it comes to digitizing the results. Answers and scores were in a spreadsheet from the beginning.

It was easy to make answer keys for the same reason. Admittedly I didn’t know this at first — all the credit goes to Liz. It turns out that you can make Qualtrics generate a PDF of all the answers given by a specific person, so all we had to do was get it to spit out the ANSWER KEY responses and, surprise, there was the answer key. Again your mileage may vary, but online systems can be very powerful.

The online format does offer students the opportunity to cheat. But as I already mentioned, I don’t think they did, and I don’t think it would matter either way. There are things you could do to help prevent this, if you were worried, like giving a narrower exam window or putting out multiple versions of the exam to prevent crosstalk, the sorts of things we already do in the classroom. You could make projects a bigger part of their grade. But I think it’s to everyone’s advantage to trust the students.

With a well-designed exam, it will be easier to learn the material than it will be to cheat. The same goes for open notes. If you make a good exam, it will actually be quicker for students to leave their notes closed.

5. What I Didn’t Get To

I managed to fit almost everything I wanted into this course, but there were a few things I missed.

I’ve always wanted teams to play a bigger role, but the teams in this class didn’t work very well. It seems like there should be ways to encourage students to help one another and to reward them for working together. But all the ideas that come to mind, like giving students bonus points for helping their teammates, have obvious problems. So while I want to incentivize teamwork and peer support, I haven’t found a way to make it happen yet.

Students would also really benefit from giving and watching presentations. I was able to do this for my RA, and it’s clear to me that she gained a lot from making the presentations and from getting feedback. Criticizing presentations and giving feedback is also good practice for statistical literacy, and it might be less intimidating for the average student.

But it would be difficult to have every student give a presentation. It’s probably impossible for large class sizes, and it doesn’t seem like it would work well online. During the semester, you might be able to do it in recitation, either for extra credit, or in small teams.

But the real problem is that giving a single presentation is like answering a single math problem. It’s just not that much practice. Unless the class size were very small, you probably couldn’t set it up so that every student got to present multiple times. This might be better suited to an advanced course. The breakout room activities, given that they include small and regular “presentations”, might be the best we can do here.

6. Concluding Remarks

I’ve heard a lot about the things you can and can’t do when teaching stats. I’ve heard that you can’t get students to pay attention. That you can’t make them care about the subject. That they’re all cheating on their assignments. That they aren’t smart enough to learn how to use statistical software on their own.

Things are bad in education today, but they’re not bad because of lack of funding, or because students are unmotivated. Things are bad because educators lack vision.

What else do you call it when everyone knows what the problems are, but no one manages to dream up solutions? We have the ability to make education work for us, and nothing special is required, just careful thought and patient experimentation.

In particular, there are huge gains to be had in developing approaches that let students and teachers stress less over the material and waste less time. This may free them to spend more time learning, but it may also free them to have a life outside the classroom. A class with more hours of homework, longer tests, and more fiendish questions is not a better class. In most cases it is a worse one.

What could be better than learning more, with less effort, and in less time? Let us celebrate academic laziness. Perfection comes not when there are no more assignments to add, but when there are no more assignments to take away.

Students have almost no control over this, of course, but it’s baffling that teachers continue to design classes with backbreaking grading loads for themselves. Just give fewer assignments, shorter assignments, assignments that are easier to grade. You can do this without making your class worse. In fact, you can do it while making your class better.

So many teachers teach classes that they themselves would hate. If you wouldn’t want to take your class, if you wouldn’t find it easy, then what are you doing? It seems unnecessarily cruel to me. Make your classes enjoyable. If you can’t make them enjoyable, at least make them easy. If you can’t make them easy, at least make sure they’re not a huge pain.

So many teachers are paranoid about students cheating, collaborating, or doing too well on tests. Are you a teacher, or a mall cop? When classes are fair, students don’t cheat. Even when classes are rigged, most students still refuse to cheat. Taking this approach creates a system where the most honest students are the ones who have the most to lose. I have seen too many honest students fail what should have been an easy class.

It’s August as I’m writing this, and online I have seen many examples of college professors sharing heavy-handed “how to be ok pages” or “COVID pages” that they plan to attach to their syllabi for the fall semester. These pages contain assurances that you can come to the professor with anything, that you can get extra time when you need it, and so on. Professors love these pages because they make them feel like they’re doing something to make a difference. But these promises are hot air and all your students know it. If the structure of your class is cruel, this kind of statement becomes a sick joke. And if the structure of your class is kind, then you don’t need a page at the front of your syllabus trumpeting it. It’s the fundamental rule of communication: show, don’t tell. Put your good intentions in the structure of your class or not at all.

Just make a class that doesn’t suck.

Hindsight is Stats 2020, Part II: Design Goals & Grades

[This is Part II of a retrospective on teaching statistics over summer 2020. Part I is here.]

Grades are stupid. But at the end of the day, my university forces me to give everyone a final grade. And you do want to evaluate your students based on something, so they can know what they mastered and how they can still improve.

1. Design Goals

To begin with, I tried to work out my design goals. I started by thinking about the ways that classes normally fail and decided to work backwards from there.

One of the most blatant failures in the education system is when students are forced to take a class that they’ve already taken, or on a subject they already know. So my first goal was that someone who really knows the topic should be able to get a 100 with very little effort. There’s an easy way to check if this works: the course should be designed so that if, as the professor, I were to take it, I would ace it easily.

And not just ace it. Someone who really knows the material should, after demonstrating their knowledge, be able to walk out of the course entirely and never have to come back. Once you know the material, you shouldn’t be forced to waste your time regurgitating it.

A related problem is forcing students to waste time on concepts they already understand; or, conversely, moving on to new material before a student is ready. This is tricky because students really do learn things at different speeds. We can’t tailor the lectures to every student, but we can do things to help. Students should be given freedom to focus on the problems they find challenging. Once a student has mastered something, we should try not to bother them about it.

Similarly, most classes don’t incentivize students to learn things on their own. There’s no point getting ahead of the rest of the class. You’ll just be bored, and it might even hurt you, since it will be taking away from the time you could be using to cram the old material. This is a perverse incentive. If a student is ready to go further on their own, we should let them.

Basically, if a student wants to speedrun my class, who am I to complain? Let them do it.

Another classic way that classes screw up is by making students afraid of failure. With traditional grading, students have no room to experiment with different ways of learning, understanding, and studying. The class format requires them to obsess about every evaluation, and encourages them to do the minimum amount required to get the grade, to take no risks. If they try something interesting and fail, their GPA plummets. This leads students to obsess over pointless minutiae like what precisely is on the test and exactly how to word their answers.

I wanted to save them the time they spend thinking about this nonsense. If they choose to spend that saved time studying, so much the better. If they don’t, then all we are losing is their anxiety. Either way, we should reward students for taking risks and attempting to go deeper with the material, not punish them.

In the end I came up with three ways to evaluate student progress.

First, I had a system to replace class participation and attendance, based on small team activities, which counted for 30% of the final grade.

Second, I had students independently analyze two simple datasets of their choice, and write up a report about each. Together the two reports counted for 20% of the final grade.

Third, I invented a new exam format (covered in the next post), which counted for 50% of the final grade.
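Putting the three weights together, with only the best exam counting, gives a final grade like the one sketched below. The assumptions are labeled: the two reports are weighted equally, and the team component is expressed as a single percentage. In the example, the exam scores are those of the student described earlier who took Exam 1 (84%) and Exam 5 (75.5%); the team and report numbers are made up.

```python
# Component weights from the grading scheme above, in percent.
WEIGHTS = {"teams": 30, "reports": 20, "exam": 50}


def final_grade(team_percent, report_scores, exam_scores):
    """Combine components into a final percentage.

    team_percent: team-activity credit as a percentage (assumed input form).
    report_scores: the report grades, assumed equally weighted.
    exam_scores: every exam the student chose to take; only the best counts."""
    if not exam_scores:
        raise ValueError("student took no exams")
    reports = sum(report_scores) / len(report_scores)
    best_exam = max(exam_scores)
    weighted = (WEIGHTS["teams"] * team_percent
                + WEIGHTS["reports"] * reports
                + WEIGHTS["exam"] * best_exam)
    return weighted / 100


# Hypothetical team and report scores; exam scores from the text.
print(final_grade(100, [90, 80], [84.0, 75.5]))  # 89.0
```

Taking the maximum is what makes skipped or bombed exams harmless: a lower later score never replaces a higher earlier one.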

2. Teams & Breakout Rooms

I really hate attendance.

Taking attendance is undignified. It’s disrespectful of students, who are assumed to be incapable of making informed decisions about their education, and of the professor, who is implicitly supporting that assumption. If students are sick, have a family emergency, or need to go to the dentist, they should be able to do so without worrying about their grade. They shouldn’t have to send me an email with a doctor’s note. I don’t like getting those emails—just stay home if you’re sick—and I’m sure students don’t like sending them.

All of this is doubly true of online teaching. All the lectures are recorded. Students can watch and re-watch my presentations as many times as they want. Why should any of us care about them being “in class” when that means almost nothing in a virtual classroom?

When I taught Introduction to Psychology last summer, I tried using a participation-based system. Rather than taking attendance, I had my TA mark down when students spoke in class. The idea was that this would encourage them not just to show up, but to participate in class discussions. I also hoped it would encourage them to do the assigned reading, which we discussed each day.

This didn’t work. Students would speak up even when they had nothing to add, just to get the grade. The quality of discussion suffered for it. Some very shy students didn’t speak at all, and lost points despite the fact that they were doing great in the class otherwise. It was a huge pain for my TA to keep track of it all. This system didn’t do anything I hoped it would, and I think it was a failure.

We could just chuck attendance altogether. But on the other hand, it’s good to have some kind of incentive for students to show up to class. Recorded lectures are about as good as live ones, but if students show up to class most of the time, they can ask questions and I can get a sense of what they do and don’t understand. It would be good to encourage most of them to be there most of the time. Can we come up with a way to make this happen?

2.1 Enter the Zoom Room

One of the things that everyone learned early on in the pandemic is that video calls suck. Jumping onto a Zoom call is excruciating, and afterwards you feel drained of all will to live. Turning off your camera helps, but not by much.

At first this seemed universal. People speculated that it was something inherent to the Zoom platform. There were theories that even subtle video latency was unnatural and jarring. But over time, I noticed two exceptions. The first was direct calls, with smaller groups. Hanging out with one or two friends over Zoom, while not as much fun as hanging out in person, didn’t make me want to tear my eyes out the way a Zoom call with several people did.

The other exception was playing virtual trivia. Early on in the pandemic, my friend Liz from my PhD cohort set up a virtual trivia night for students in our program. In virtual trivia, we would all gather in one Zoom room to start off. For each round, teams would be sent off into individual breakout rooms for 10-15 minutes to answer questions. Then we would all come back to the main room for scoring. We’d do this process for each round, with a couple of trivia rounds each night.

This was infinitely better than every other group call I had been on, and it wasn’t just that we were a group of PhD students drinking late at night. The breakout rooms were just as relaxed as being on a small call, and they broke up the evening in a way that made the main room much more fun, even though the full group was pretty large.

When I started thinking about how to run an online class, I knew I would have to include something like this.

(Liz also happened to be my TA for the stats course!)

I had been wanting to incorporate something about teams for a while, and this seemed like the perfect way to do it. Instead of sending teams off for rounds of trivia, I would send them off to do breakout room activities, and call them back to discuss the answers.

These activities took different formats depending on the topic we were covering each day, but most of them worked something like this. I put up a question or a task on the slides, and then sent the students into breakout rooms for about 10 or 15 minutes. When they came back, I randomly chose a couple teams to share their answers.

Getting the correct answer wasn’t the point. If the group provided an answer that seriously engaged with the activity, the group got credit for that activity, even if their answer was incorrect. The only way to get no credit was to not engage with the question or to give no answer at all. If I didn’t call on a team, that activity didn’t affect their grade.
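The credit rule above is simple enough to write down. Here is a minimal Python sketch of it, not the gradebook actually used in the course: the function names `pick_teams_to_call` and `team_grade` are hypothetical, calling on two teams per activity is an assumption standing in for "a couple", and so is the convention that a team never called on keeps full marks.

```python
import random

def pick_teams_to_call(teams, k=2, rng=random):
    """Randomly choose k distinct teams to share their answers (k=2 is an assumption)."""
    return rng.sample(teams, k)

def team_grade(records):
    """records: one (was_called, made_attempt) pair per activity.

    Only activities where the team was called on affect the grade, and a
    called team earns credit for any serious attempt, right or wrong.
    """
    called = [attempted for was_called, attempted in records if was_called]
    if not called:
        return 100.0  # assumption: a team that is never called keeps full marks
    return 100.0 * sum(called) / len(called)

teams = [f"Team {i}" for i in range(1, 9)]
print(pick_teams_to_call(teams, rng=random.Random(0)))

# Called three times, attempted twice; the uncalled activity doesn't count.
records = [(True, True), (False, False), (True, True), (True, False)]
print(round(team_grade(records), 1))  # 66.7
```

Note that the second record, a day the team skipped but wasn't called on, has no effect at all, which is exactly what let individual members come and go.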

This seemed to be the perfect replacement for attendance. At least one member of every group would need to be there every day, while individual members could come and go if they needed to. But part of their individual success would come from helping to make sure that the whole team was successful, so it was still in their interest to show up and help out whenever possible. I didn’t need to keep track of who was there, I just needed to give activities and ask them for their answers. And I didn’t even need to grade their responses, just record if they made an attempt.

I also hoped that this would give them some level of social support for the class — the kind of friendship they would normally get from the students sitting next to them, and people to go to if they needed help or support.

Another benefit was that this broke up the huge lectures into smaller chunks. Intermissions had already broken the 2.75-hour classes into two sessions of about 1 hour 15 minutes. With breakout room activities, days could end up being four sessions of about 30 minutes each, with activities and an intermission in between. That’s a lot better.

This was also meant to be a grade boost. A whopping 30% of their final grade came from their team grade, and because all you had to do was show up and try to answer the questions, I expected most teams to get 100%. I included this grade boost because I didn’t want them to worry about their final grade too much. This way, they would still have to work to get an excellent grade, but a student who did a decent job wouldn’t have to worry about failure. (As I mentioned earlier, I think that grades are kind of a joke.)

I shared a brief stats experience survey with my students the week before class, and I assigned them to teams based on their responses. I wanted to make sure that each team had a diverse collection of skills — that there was at least one student in every group who was comfortable with public speaking, at least one with decent math skills, and so on. The idea was that every team would have the skills they needed to succeed, and they would all have someone to turn to for help on any subject. I ended up with eight teams of five students each.
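The post doesn't say exactly how the survey answers were turned into teams. One plausible approach is a greedy one: for each skill, hand one student who reported it to every team that still lacks it, then deal everyone else out to whichever team is smallest. The Python below is a sketch under that assumption; `assign_teams` and the skill labels are hypothetical, and it assumes there are enough students with each skill to go around (if not, some teams are simply left without that skill).

```python
def assign_teams(students, skills, n_teams):
    """students: dict mapping name -> set of skills the student reported.

    Greedy sketch: cover each skill on every team first, then fill up
    the teams evenly with everyone who is left.
    """
    teams = [[] for _ in range(n_teams)]
    unassigned = dict(students)
    for skill in skills:
        for team in teams:
            if any(skill in students[name] for name in team):
                continue  # this team already has someone with the skill
            candidates = sorted(n for n, s in unassigned.items() if skill in s)
            if not candidates:
                break  # too few students with this skill to cover every team
            pick = candidates[0]
            team.append(pick)
            del unassigned[pick]
    # Deal out the remaining students, always to the currently smallest team.
    for name in sorted(unassigned):
        min(teams, key=len).append(name)
    return teams

students = {
    "Ana": {"speaking"}, "Ben": {"math"}, "Cy": {"speaking", "math"},
    "Dee": set(), "Eli": set(), "Fay": {"math"},
}
print(assign_teams(students, ["speaking", "math"], n_teams=2))
```

With 40 survey responses and `n_teams=8`, the same function would produce eight teams of five, as long as the skill-covering step doesn't put more than five people on any one team.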

2.2 How did Breakout Rooms Work?

The grading worked just as planned. Seven of the eight teams got perfect marks on their breakout room activities. The other group missed one day (none of them showed up) and got about 90% on the team grade. But in general this provided exactly the padding I intended.

Or, almost. In retrospect, 30% was way too much. Students got really good grades anyway, and it wasn’t all thanks to the team grade — remember, more than 50% got an A! Making the team grade only 20% or even only 10% wouldn’t have changed their grades by very much, because they were all doing so well on other parts of the class. Mostly, I think it should have counted for less than 30% because it’s a shame that so much of their grade came from something unrelated to their understanding of the material. I am very happy so many of them got a 95 — I just think it would be better for them to get a 95 from nailing the assignments and exams than from showing up and participating! It’s something I would do differently next time.
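To see why a smaller weight would barely have moved final grades, here is the arithmetic for a hypothetical student with a perfect team grade who averaged 90 on everything else. The 90, and lumping all the other course components into a single number, are illustrative assumptions rather than the course's actual breakdown.

```python
def final_grade(team, rest, team_weight):
    """Weighted final grade; `rest` stands in for all other components, averaged."""
    return team_weight * team + (1 - team_weight) * rest

for w in (0.30, 0.20, 0.10):
    print(f"team weight {w:.0%}: {final_grade(100, 90, w):.1f}")
# team weight 30%: 93.0
# team weight 20%: 92.0
# team weight 10%: 91.0
```

Each 10 points of weight moves this student's grade by only one point, because the team grade and the rest of their work are already close together.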

The activities worked really well. Lectures can be, let’s face it, pretty boring, and I think having these class exercises helped keep students from falling asleep. There’s also no better way to learn something than doing it yourself, and so following each lesson with an exercise was a good idea. And it was nice on my end to take a quick break, wait a few minutes, and see how they had done when they came back.

You do have to be careful with the activities, though. Activities work well if they are a simple problem, something the students couldn’t do when they signed on, but can do now that they’ve seen the day’s lecture. This helps the lesson stick in memory, and demonstrates why what they just learned is actually useful. Activities can also take a “don’t take my word for it, see for yourself” approach, and I liked this when I was able to use it.

No matter what, though, the activities have to be easy. They aren’t a challenge or an exam; they exist to round out the lecture and serve as a teaching aid. It’s ok if students struggle with the details; it can be good for them to get a sense of their own limitations. But if they get stuck, can’t do the activity, or reach a dead end, then they don’t learn anything. The implicit message is that they can’t handle it, and that’s not the right message to send them. They can handle anything you’ve prepared them for; just don’t give them assignments you haven’t.

Students had mixed opinions of the teams. I got feedback like, “there was zero accountability for the breakout rooms … Most of the time, my teammates wouldn’t show up” and “as the days progressed, my group became unresponsive to the point where I was simply doing the work and presenting it on my own.” A few of them did have positive things to say about the teams, but clearly that was the minority opinion.

Most students liked the breakout room activities, though. “I was able to apply the material and then receive feedback (if called on) instantly. The breakout rooms presented a great opportunity to work through what was being discussed,” one student said. Another wrote, “Breakout rooms really allowed me to understand the application of concepts. I don’t think I would have been able to work through the research reports (or the finals) with as much ease had we not gone through related work individually and then as a class.”

The only complaint I saw about these activities was that I gave students too much time to work on them. I find this confusing, because I assumed students would be happy to have an extra 5-minute break to go and make a sandwich or something. Either way, I mark this idea as another success. It does seem like it helped the concepts and skills really stick with them.

Some students suggested that the activities be designed to more directly prepare them for the exams — basically, to have the activities be examples of the kind of questions that appeared on the exams. I can see why they proposed this, but I don’t like it. The exams are designed to try to see if students can generalize stats concepts to new situations. (And from their grades, it’s clear that by the end they could!) If I give them practice with questions of a similar format, I think that would defeat the purpose.

Obviously then, the problem is the teams, and it’s not clear to me what the solution is. Students suggested that I could have them do the work as a team but then call on individual students for the answers. That’s a little too invasive for my taste. One reason to have teams is to help less confident students — you know, the kind who would hate being called on.

I could imagine making the teams larger, maybe groups of 7-10. With more students, it’s more likely that at least some of them would show up. I could also make the teams smaller, maybe just 2 or 3 people per team, which would mean less diffusion of responsibility. In either case, I’m sure there would still be slackers. Students don’t like having slackers on their team, but if everyone is getting 100% on their team grades anyway, I don’t mind if there are a couple of freeloaders. Maybe teaching this in person, if that ever happens, would change the dynamic and solve the whole problem.

If I were to teach this in a classroom rather than online, I would have them do more class activities, but make each activity smaller and shorter. Sending people to breakout rooms on Zoom is a bit of a commitment. It takes a minute to send them out and to re-orient when they come back, so you want them to get their money’s worth. But in person, it would be better to just give them more, and more diverse, tasks. Rather than handing out a 10-minute worksheet, I would do something like throw three histograms up on the board and give them 3 minutes to tell me which values they could and could not reject based on each.

3. Research Reports

About a year ago, I wrote an essay called What You Want from Tests, where I outline two kinds of knowledge that you need to master a skill. The first is the sort of thing that every expert carries around in their head, and this is what I argue you should try to examine with exams and quizzes. The other kind of knowledge is the ability to actually use the skill. Without the ability to use the skill, any knowledge is just trivia. You’re not an expert, you’re just a fan.

Statistics is a skill-based course, so the second kind of knowledge is really important. I didn’t just want my students to memorize a bunch of facts about statistics, I wanted them to learn how to actually use statistics.

A few years ago I was working with an undergraduate who had volunteered to be my research assistant. She was an exceptionally bright and curious student, who always asked remarkably insightful questions. She was also very diligent, and had already taken several stats classes before she started working with me. She had even taken some MA-level stats courses, which is unusual and impressive for an undergrad.

Despite all this, I discovered that she did not really understand stats. She had a hard time conducting even basic analyses. She didn’t understand many of the concepts. Despite her excellent grades, almost nothing from the classes had stuck with her.

I already knew that she was gifted, and I was aware of the shortcomings of the usual stats education approaches, so I reassured her that it was not her fault, and I offered to help her do something about it.

At this point I had already done a lot of thinking about how to do a better job teaching stats, and I realized that people always forget to teach this practical side of the skill, even though the practical side is what actually matters. Now, there’s no mystery about how to teach skills. I learned stats by struggling through real analyses for projects that were actually important to me, and everyone agrees that working on a project you genuinely care about is the best way to pick up a new skill.

But this doesn’t work in every situation. Even for me, it was a struggle, and this sink-or-swim approach is too harsh for the classroom. It’s also inefficient for beginners, because real data is messy and confusing. If students bring in a real problem, the correct approach might be too advanced for an intro class. And scale makes it impossible. Do we expect every student in an intro course to be able to bring in a project they’re thrilled about? They don’t know anything about the topic yet, so they don’t know what a good project would be.

I realized that all these problems could be fixed by using fake datasets. It’s easy enough to generate data, and you can make it look however you want. And unlike a real project, you can introduce concepts one at a time so that the student is always ready for them.

So that summer, I made a bunch of practice datasets for my RA to work with. I wrote a set of R functions that would automatically generate datasets to my specifications. At the start of each day, I would give my RA a short lesson on a stats concept, and then send her a couple datasets. Naturally, most of the datasets would be in some way related to that day’s lesson. She would work on them all morning, prepare some slides, and at noon, before we broke for lunch, she would give us a presentation on what she found out. I let my other RAs give feedback first (giving critique is great training as well), and then I would ask questions and give her feedback.

The first datasets were extremely simple, and they gave her no trouble at all. Once she was comfortable with conducting simple analyses on her own, I introduced complications, the sort of wrinkles one would expect to find in a real dataset. First I introduced the concept of statistical power, and gave her some critically underpowered studies, so she could learn to interpret those null results as inconclusive. Then we had a discussion of outliers, when and when not to exclude them, and the datasets for that day included different kinds of outliers. We covered causal inference, interactions, p-hacking, and many other concepts in the same way. The concepts in these lessons were cumulative. Once we had covered outliers, for example, I would sometimes put outliers in the datasets later on.
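Why a null result from a tiny study is inconclusive comes down to power. Here is a quick Python sketch using the normal approximation for a two-sided, two-group comparison; this is a swapped-in simplification, and an exact t-test power calculation (what a stats package would give you) comes out a bit lower at small sample sizes. The function name `approx_power` is hypothetical, and d = 0.5 is just an illustrative "medium" standardized effect.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    d is the true standardized mean difference; n_per_group is the
    number of observations in each of the two groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d / (2 / n_per_group) ** 0.5  # d divided by the SE of the difference
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (10, 64, 200):
    print(n, round(approx_power(0.5, n), 2))
```

With 10 people per group, power comes out at roughly 0.2: even though the effect is real, a study like this misses it about four times out of five, so "not significant" tells you very little.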

The datasets at the start of the semester were really easy. The datasets by the end were almost as tricky as real-world data. But at no point did my RA work on anything that was too hard for her. Each new complication was just one step up from something she had already mastered, so she was always prepared to tackle it.

3.1 Class Projects

I knew I wanted to do something similar for my class, to give them the same kind of practice with the practical side of things. In particular, I like this approach because for each dataset, you have to figure out what statistical test to run on the data. This is one of the stats skills you use most often in the real world, and it’s often the first question you ask when thinking about an analysis. Yet somehow, intro stats classes almost never teach this skill. At best, students get handed an extremely confusing flowchart. I knew I could do better.

Unfortunately the approach I used with my RA doesn’t exactly scale. I couldn’t give them the same kind of step-by-step training. I couldn’t have them all give a presentation on every dataset, and of course, many students are terrified of presenting to begin with.

Still, I figured I could come up with something that captured most of the benefits. I took several of the simpler datasets that I had made for my RA and I put them in a folder on the class website. Rather than having to analyze all of them, students were required to pick two of these datasets and write a research report about each of them. They could do these two reports at any point during the class, but since they weren’t taught how to do most analyses until about halfway through, I expected most of them to do these assignments during the second half of the course.

Students are taught to write long. This is a bad habit, especially when working with such simple datasets. I limited research reports to one page, including any graphs and/or tables. Students should learn to be concise, and besides, I didn’t want Liz to have to sift through dozens of extra pages when grading.

Each research report was 10% of the final grade, so these assignments were 20% of their grade in total. They were free to analyze the data however they wanted, but in particular we thought that R, SPSS, and Excel/Google Sheets were good choices, so I included one session for each of those approaches in the lectures. This wasn’t much training, to be sure. A lot of people might have seen this as a big risk — you’re expecting them to use R or SPSS with barely more than an hour of training each? But I wasn’t worried about it. Somehow I knew that they were up to the task.

Fig. 1: “Burgers Have Cheese???.png”, an example created in the course of instruction on the use of Google Sheets.

Originally, I was planning to let students do up to two additional research reports for extra credit. But in the week before class, one of the students suggested that instead of doing research reports for extra credit, we could let them re-do research reports that they weren’t satisfied with. This basically translated to “do 4 research reports, get your grade from the best two”.

I liked this for a couple of reasons. First, it let them make mistakes on early research reports without huge consequences, which was one of my design goals for the class. Second, students who were struggling would be encouraged to do additional reports, which would give them the extra practice they need, while students who didn’t need additional help wouldn’t be bothered.

I implemented this change, with the requirement that the do-overs would have to be on new datasets. Students would get feedback from Liz about how to do better, but they would have to apply those lessons in a new context. I limited them to two of these do-overs at most. I wanted them to be able to learn from their mistakes, but also I didn’t want each of them doing 10 reports.
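The final rule (two required reports, up to two do-overs, each on a new dataset, best two scores count) can be sketched as a small Python function. The name `research_report_grade`, the dataset names, and averaging the best two (each report was worth 10%, so they count equally) are illustrative assumptions.

```python
def research_report_grade(reports, required=2, max_redos=2):
    """reports: list of (dataset_id, score) pairs in submission order.

    Two reports are required, up to two do-overs are allowed, every
    report must use a different dataset, and the best two scores count.
    """
    if len(reports) < required:
        raise ValueError(f"need at least {required} reports")
    if len(reports) > required + max_redos:
        raise ValueError(f"at most {max_redos} do-overs allowed")
    datasets = [dataset for dataset, _ in reports]
    if len(set(datasets)) != len(datasets):
        raise ValueError("each report must use a different dataset")
    best = sorted((score for _, score in reports), reverse=True)[:required]
    return sum(best) / required

# Hypothetical dataset names: two early B/C-range reports, then two strong do-overs.
print(research_report_grade([("sleep", 78), ("coffee", 85), ("pets", 94), ("music", 96)]))  # 95.0
```

A student who is happy with their first two scores simply stops, and their grade is just the average of those two.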

The research reports were not really about the grades. They weren’t so much intended as evaluations. Really, they were more like practice, or lessons. What I really wanted them to get out of the research reports was, “I can do this and it’s not scary”, because I think it will help set them up to be confident when using these skills in real life (and on the Exams). It wasn’t about challenging or testing them, it was about giving them the opportunity to try things for themselves.

About halfway through the course, one student emailed me to ask for more guidance on how to format the reports. At the very least, she said, I should give them an example of what one would look like. I told her:

This assignment is designed to mimic what doing analysis is like in the real world. Data is emailed to you in a confusing format, and the file is poorly organized. The people who have hired you to conduct the analysis don’t know exactly what they want and can’t tell you what kind of test to conduct; after all, that’s what they hired you for. I’m trying to give you a controlled version of this experience — not nearly so confusing as real life, but where you are asked to exercise your judgment and the knowledge we’ve covered in class. Giving you any more guidance on how to conduct the analysis or write the report would defeat the purpose of the assignment.

To this student’s credit, she totally understood my point and ended up getting a 98 on both research reports.

A final reason to like the research reports is that they capture my “walk out of class once you’ve mastered the material” goal. If you already took stats but you were for some reason forced to take my class, or if you decide to teach yourself all the material in the first week, then you can just throw together two one-page reports, get an A+ on both of them, and forget about this part of the class entirely.

3.2 How did they do?

Students really surprised me on the research reports. When I first looked at the grades, I thought that maybe Liz had been too lenient. Almost all of them had gotten A’s! But when I looked closer, I saw that the students had earned them. The reports weren’t perfect, but they showed serious critical thinking and really creative engagement with the datasets. All very impressive for a subject they had been studying for less than six weeks!

When I looked back, I saw that on their first submissions, many students had gotten B’s and C’s. Liz wasn’t being too lenient at all. In fact, her feedback was intensely detailed! But this helped the students enormously. It’s clear that the students took that feedback and turned it around for their do-overs, and that’s what ended up earning them those A’s.

Some students, I was happy to see, didn’t need the do-overs. One student did her first two, got a 98 and a 99, and unsurprisingly, chose not to submit any more. Another student, who had said in class that she was terrible at math, gave it a shot and to her great surprise earned a 93 and a 90. She decided that was good enough for her, and didn’t send in another. The system works.

I especially liked how diverse the reports were. Students used all sorts of weird charts and phrased their results in all sorts of unusual ways. Not wrong per se, just the sort of thing an expert would never do. I think this demonstrates real understanding. Rather than just copying someone else’s approach, they had come up with their own, often slightly bizarre perspective, and then applied it. That’s what mastery looks like, folks.

How about the software? Some of them came to me or to Liz for help, but honestly, not as many as you might expect. For the most part they seem to have taught themselves.

When I was looking through the reports, I saw that most of them chose to use R for their research reports, and almost all of them did a solid job of it. This was a big surprise, but it’s very encouraging.

In conversations about how to teach stats, I’ve often heard, “It would be great if we could teach the students R or Python. But you just can’t teach the average student a programming language in only one semester. It would take up too much of the lecture, and there would be too many questions for the TAs to handle. We should stick to SPSS worksheets and formulas for now, that’s the sort of thing that students can deal with.” I’m happy to have evidence that, in my opinion, proves this entirely false. Apparently students can learn the basics of R with almost no instruction, in less than six weeks, as long as you give them the right environment for it.

I’m pretty happy with the research reports. Is there anything I would do differently next time? Well, one thing Liz pointed out to me is that while I gave them 24 different datasets, most of the reports were on the same 4 or 5 options. These were some of the most straightforward datasets, and most of them were analyses of correlation between two variables.

Now, as I said before, the research reports are not really about challenging students. I’m fine with them doing two easy reports, since doing any independent report at all is great for intro stats. But conducting correlation tests both times does slightly defeat the purpose of doing two reports.

A better system would be to break the research reports up into different bundles. Bundle A could be the easy ones and Bundle B could be more challenging. Bundle A could include one set of tests and Bundle B could include the others, so that every student would have to use at least two different tests. You could maybe include a Bundle C of advanced datasets. These could either give you extra points just for attempting them, or they could be strictly for extra credit. In any case, adding some more structure to the research reports would probably improve them.
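Here is a Python sketch of how the bundle rule could be checked. Everything in it is hypothetical: the dataset names, the tests assigned to them, and the helper names. The key design choice is that Bundle A and Bundle B share no tests, so any valid pick of one dataset from each bundle automatically means two different analyses.

```python
BUNDLES = {  # hypothetical datasets, each tagged with the test it calls for
    "A": {"sleep_and_grades": "correlation", "coffee_and_focus": "correlation"},
    "B": {"pets_and_mood": "independent t-test", "music_by_major": "one-way ANOVA"},
}

def bundle_of(name, bundles=BUNDLES):
    """Return the label of the bundle that contains dataset `name`."""
    for label, datasets in bundles.items():
        if name in datasets:
            return label
    raise KeyError(f"unknown dataset: {name}")

def valid_selection(picks, bundles=BUNDLES):
    """A student's two picks are valid if one comes from A and one from B."""
    if len(picks) != 2:
        return False
    return {bundle_of(p, bundles) for p in picks} == {"A", "B"}

def bundles_disjoint(bundles=BUNDLES):
    """True if no test appears in both Bundle A and Bundle B."""
    return set(bundles["A"].values()).isdisjoint(bundles["B"].values())

print(valid_selection(["sleep_and_grades", "pets_and_mood"]))     # True
print(valid_selection(["sleep_and_grades", "coffee_and_focus"]))  # False
```

A Bundle C of advanced datasets could be added as another key and left out of `valid_selection`, so it only ever counts as extra credit.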

Hindsight is Stats 2020, Part I: Fractal Course Design

This summer (2020) I taught Statistics for the Behavioral Sciences.

The course was unusual for a number of reasons. I’ve wanted to teach stats for a long time, so I came into this class with a collection of unorthodox ideas that I’ve been sitting on for a few years.

Things went really well. I had high expectations, but more than half (!) of my students got an A or higher. I didn’t shift my expectations, or make the class easier halfway through. These grades mean that most of the students either mastered the material to my satisfaction or came very close to doing so. This approach worked and I would definitely recommend it.

1. Course Format

1.1 Being Online

The big curveball for this class was the pandemic, which made it necessary to teach the class online. I had never taken a course online, and had never expected to teach one. Going into this, I had almost no experience with online classes. When we transitioned to online instruction in March, I was TA’ing for a class, so I got to see how that went, but that was about it.

I’m confident in my skills, but there were a few things in particular that I was worried about.

One of the really rewarding parts of teaching is getting to know your students. But Zoom isn’t that great, so I was worried that there might be no personal connection. Partly I was worried that the class would be less enjoyable. People like making friends and knowing that the instructor cares about them. But part of it was also practical. Without that sense of the classroom and knowledge of the students, I was concerned that I wouldn’t be able to tell when students didn’t understand the material. Maybe I wouldn’t be able to explain things as well when they had questions.

The other major concern I had was cheating. I knew that during the pandemic-driven switch to online classes, many schools had forced students to install unsettling exam-monitoring software on their personal devices. This sort of thing is pretty evil. While I would never consider spying on my students, it did make me worry about cheating on exams, which seemed like it could be a real problem online. But I also knew from being a TA that students cheat a lot less than professors think they do. In the end I took no special steps to prevent cheating. I don’t really care about or believe in grades, and I decided to trust the students.

1.2 Personal Connection

It turns out that both of these concerns were unfounded.

Admittedly, there was very little personal connection. I didn’t get to know most of my students. I would recognize their names, but I never even saw most of their faces.

But no one seemed to suffer for it. In the end we still developed the rapport that you need for good teaching. In their evaluations, students said things like:

“Jeff was a great teacher! He clearly loved the subject, and wanted to try and teach it in a more accessible way”

“Jeff specifically explained things very well and was so real. It was nice hearing examples in ‘layman’s terms’ that were more approachable”

“I really felt as if this teacher wanted us to do well, and helped us learn as much as possible in the clearest way possible. … Great great teacher!”


“Jeff is cool”

This experience has changed my mind about classroom engagement, and makes me doubt some of the common wisdom about teaching.

Is getting to know your students a reasonable expectation? Certainly we can get to know our students. But is it appropriate? Students aren’t in your class to be your friends, and you’re not there to be their pal. People are in the classroom to, hopefully, learn something.

“Personal connection” often seems to be used as a proxy for respecting your students and treating them like human beings. But — surprise! — you can respect your students and treat them like human beings without necessarily having a friendship with them, or even knowing their names. Students are sensitive to this difference. They care about being treated with respect, but don’t seem to care about the other stuff.

A cynical take would be that professors use the excuse of “getting to know their students” to push students into having an unnecessarily friendly relationship. But pretending to be equals when you are in a position of power over someone is at best dishonest, and at worst is a way of denying that you have a responsibility to them.

I do think there are things you can do to drive engagement. But I don’t know if it really matters. My students got really good grades and displayed surprisingly deep understanding of the material, so it didn’t hurt their education. And many of them told me that this was one of the most enjoyable courses they have ever taken, so it didn’t seem to make learning any less fun.

1.3 Cheating

I was even more wrong about cheating. I didn’t see any evidence of cheating on exams or assignments, and there was plenty of evidence that they weren’t cheating. Students made lots of simple mistakes, which they could have avoided if they were cheating. Exam scores improved incrementally over time, just as you would expect from honest learning. Their assignments and answers on the tests were idiosyncratic, not the carbon copies you might expect if they were sharing answers. If students were cheating, they didn’t leave any trace of it, and so I’m inclined to believe that they didn’t.

The lack of cheating is a little weird. When I was a TA, I would catch students cheating all the time. They usually do a bad job of it — they forget that I was a student not too long ago, and so they don’t realize that I know most of the tricks. So the fact that we didn’t see any of the classic signs is strong evidence that there wasn’t any cheating.

So why didn’t they cheat in my class, when they do cheat during the semester? I think it has to do with trust. In the exit survey for the class, one student wrote, “no feeling of being ‘cheated’ by the prof”. Another student wrote, “My biggest fears for this course revolved around completing it and not only doing poorly, but also learning nothing.”

Students tend to stoop to cheating when they think, often correctly, that there is no other way to do well in the course. When professors are unclear about expectations, or make examinations needlessly difficult, the students feel cheated by the professor, and some will cheat in return. When you see an exam filled with trick questions, it’s hard not to feel like the game is rigged. But to their credit, even in this situation, most students still won’t cheat.

Teachers have a lot to learn about cheating. If you don’t cheat your students, most of them won’t cheat on your assignments. It’s about trust. Not your trusting that they won’t cheat on assignments — their trusting that you won’t cheat them in their education.

This all makes it especially disappointing that, during this pandemic, so many schools are engaging in unethical surveillance of their students in the name of academic honesty. Students just don’t cheat all that much, even when they definitely could get away with it.

2. Course Content

So much for the course format. What was I actually teaching?

2.1 What’s Wrong with Stats?

Statistics education is pretty terrible, and everyone knows it. All the professors who teach stats agree: students come into class, usually manage to pass, and retain almost nothing.

Everyone is looking for the magic bullet. But even so, no one thinks it’s a great mystery. Professors and TAs will all tell you the same thing: the problem is motivational. The majority of students, they say, simply aren’t interested in learning this esoteric form of math. As a result, most of the proposed solutions are motivational as well: find a way to make it fun and interesting, or at least find the right set of rewards and punishments.

But when I was a TA for intro stats, I noticed that this didn’t match what I saw at all. The students in my recitations were engaged, and really wanted to understand stats. They asked insightful and sophisticated questions, and were always pestering me for more detail. Yet somehow they seemed to come back every week having forgotten everything we discussed the week before. This isn’t the behavior of students who are checked out — this is the behavior of students who are trying, and repeatedly failing, to build a model of what is going on around them.

Even if I had been wrong about most students, there were a few of them who were clearly both able and motivated. These students got perfect scores on multiple tests and assignments, regularly came to my office hours, and discussed many of the concepts in great detail. They showed me the extensive, meticulous notes they had taken in lecture. But when it came to answering simple questions about the material in a new context, they always came up blank.

These students weren’t lacking in motivation or intelligence, so the problem had to be external: something about the class was failing them. Even if everyone in the class were as motivated as these high-achievers, we would still be having trouble with comprehension and retention.

2.2 Driver’s Ed

I think the motivation story is all wrong. The problem is that the subject is taught at the wrong level.

Imagine you are taking a driver’s ed course, and have just shown up to the first day of class. The professor gets up and says, “Hi everyone, in this class you’re going to learn all about cars. Cars are really amazing. Some people use cars to get to work. Some people use them to get to school. Some people use them to go on vacation! There are a lot of kinds of cars. The big ones are called trucks. Those ones carry things like fruit and gravel. In this course you’ll learn all the different kinds and their uses, and we’ll talk a bit about the history of cars.”

You raise your hand, “Excuse me, professor. I’m here because I want to learn how to drive. I didn’t come here to learn about the types or history of automobiles. I’m sure that knowledge will come in handy in some ways, but it’s really not my focus. How do you actually drive?”

“Worry not,” he says. “To drive, move the wheel back and forth.”

So you leave that course and you sign up for a different one. You show up to the new class, and the professor gets up and says, “Hi everyone, in this class we’re going to learn all about cars. We’re going to be starting with the drivetrain. It’s important that you be able to describe and identify all the parts. Look at this diagram. Here’s the gearbox (which you can see is constant-mesh), clutch mechanism, the flywheel, the differential…” You get up and walk out of the room.

Neither of these classes will teach you how to drive. And sadly, this is a pretty good metaphor for how statistics is usually taught. Some statistics courses give students an overview of probability theory and a brief sense of the history, without teaching them how to actually conduct an analysis. Others throw the equations right on the board and start discussing the terms without any context. All too often, a single class will try to include both of these approaches. This is probably worse than either of them alone.

Students don’t want to learn a list of tests, the life history of Ronald Fisher, or the exact meanings of the terms in the formula for the pooled standard deviation. All these are things one naturally picks up over time, but none of it is useful without the core knowledge. Students want to learn what statistics is and how we actually use it. But somehow they seem to come away from our courses without having been taught either of these things.

Driver’s ed focuses on the point of contact: how to use the car. Similarly, the main goal of this class was teaching statistical skills and how to use them.

I wanted students to become statistically literate. Most students won’t end up being researchers or statisticians in the same way that most people who take driver’s ed won’t end up being auto mechanics or engineers for GM. We still benefit from knowing what a car is and how to operate it. Similarly, students benefit from knowing what statistics is and how to use it. For those students who do want to go on to use statistics professionally, this will still give them a strong foundation. Auto mechanics don’t suffer from having taken driver’s ed in high school.

The focus was limited and practical. Students were taught how to recognize different kinds of variables and data, interpret standard plots and graphs, read and understand statistical reports, and conduct basic analyses using statistical software. I alluded to other subjects of interest in lectures, but in the lessons and the evaluations, I focused on these basic skills.

We can also talk a little bit about what I didn’t want to cover. The history of stats is interesting, but most of the time it doesn’t help you be a better statistician. The most important thing to know about the history is that these tests and concepts were just invented by a few guys not all that different from you and me. Anyone can make up a concept or design a new test. You assign it a Greek letter and suddenly it sounds official, but for all we know, Fisher came up with it while sitting in the bathtub. Besides that, most of the details don’t matter. Aside from a couple of helpful examples, I didn’t teach them anything about the history of statistics.

You do need to know a few symbols to be able to interpret tests, but I didn’t want to cover much in the way of formatting. I don’t care if students report a number as 0.02 or .02 or 0.0212; I don’t care whether they italicize the p in “p-value”. Time is limited, and I don’t want to waste their time or my time going over this nonsense. If by the end of the class, they know the concepts but not the formatting, then I have succeeded. If they know the formatting but not the concepts, I have definitely failed. So I decided to focus on the concepts and, as much as possible, ignore the formatting.

2.3 Fractal

So that’s what I wanted to teach. How do you actually teach something like this?

Most courses take a cumulative approach. You start with the basics, and the material slowly becomes more and more complex. Each lesson builds on all the previous lessons. At the end you finally tackle the most advanced material. Then you take the final.

In my experience, this falls apart by the second week of class. Students who miss even a single lecture are cut adrift, left to founder or drown. Even if you make it to every class, your safety isn’t guaranteed. If you don’t understand the explanation they give in lecture, you’re out of luck, because the class is never going to come back to that topic again.

Rather than being cumulative, my course approach was fractal. A fractal is a figure or function where every part has the same character as the whole. Every part contains copies of the whole thing. That’s how I structured the course: every part of the course was nested within other parts of the course.

A photo of my stats course from space. jk it’s fractal broccoli

You could be the best teacher who ever lived, with the most beautiful slides imaginable. It doesn’t matter — students just can’t learn something in one go. This is especially true in statistics. The classic learning pattern for the subject is brief flashes of insight, a feeling of sudden understanding, and then losing your hold on it and slipping back into confusion. This is normal.

For some reason, people don’t understand this. Everyone thinks there is going to be a shortcut explanation for these ideas, but we don’t think that way about other skills. We don’t think that painters will master three-point perspective in a single session, and we don’t expect programming students to master for loops in a single day. Maybe you can get the gist after the first introduction, but really understanding these topics takes time. Somehow we see stats differently. In particular, there is a whole genre of articles and blog posts all about how to explain p-values. These assume that the concept can be distilled into a single statement, or a single lesson. But that’s crazy. You can’t understand p-values in one hour, no matter how good the explanation is.

I think of statistics as really being three closely-related topics: a language for talking about data in general, descriptive statistics for talking about individual variables, and inferential statistics for making educated guesses about the world on the basis of limited samples.

The structure was built around these topics. The first day of class was an overview of the entire course, introducing all three topics in very general terms. Day 2 and Day 3 were another microcosm: again we covered the whole course, this time in slightly more detail.

Week 2 covered data in more detail. Week 3 covered descriptive statistics. Weeks 4 and 5 covered inferential statistics. Finally, in week 6, we went even deeper into inferential statistics, exposing exactly how the math behind the tests works.

This means that students see every single topic many times before the end of the course. For example, the two-sample t-test appears a total of six times in the lectures. It first appears on day one, during the complete overview, again in the lectures for day three, and then again in weeks three, four, and six.
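For readers who want a concrete picture of what that recurring example computes, here is a minimal Python sketch of the pooled (equal-variance) two-sample t statistic, the version built on the pooled standard deviation. It is an illustration only, not the course’s own material: the function name `pooled_t_test` and the sample scores are hypothetical, and it stops at the t statistic and degrees of freedom, since turning those into a p-value needs a t-distribution CDF (for example from SciPy), which the standard library does not provide.

```python
import math
from statistics import mean, variance

def pooled_t_test(a, b):
    """Return (t, df) for Student's two-sample t-test, assuming equal variances."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two group means
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (mean(a) - mean(b)) / standard_error
    df = n1 + n2 - 2
    return t, df

# Hypothetical quiz scores for two recitation sections
section_a = [5, 6, 7]
section_b = [1, 2, 3]
t, df = pooled_t_test(section_a, section_b)
print(round(t, 3), df)  # prints: 4.899 4
```

The whole test reduces to one idea: the gap between the group means, measured in units of how noisy the data are.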

It doesn’t matter if you don’t understand the two-sample t-test the first time, or the second time, or even the third time you see it. It doesn’t matter if you miss a few classes. It doesn’t matter if one of the examples I use doesn’t make sense to you. We will come back to this concept again, in a new context, with new examples. By the end of the class, you will get to see it from every angle.

These things take time. Mastery of a subject comes only when you return to an idea over and over, seeing it in new situations and becoming more familiar with it, building your own understanding. The structure of the class needs to support this, or students won’t be able to learn a damn thing.

2.4 Context

My influences here were the Snowflake Method and the idea of Progressive Rendering from It’s Time For An Intuition-First Calculus Course. Both of these perspectives emphasize understanding the gist of an idea before getting stuck in the details. To quote the reasoning from It’s Time For An Intuition-First Calculus Course:

The “start-to-finish” approach seems official. Orderly. Rigorous. And it doesn’t work.

What, exactly, do you know when you’ve seen the first 20% of a portrait in full resolution? A forehead? Do you even know the gender? The age? The teacher has forgotten that you’ve never seen the full picture and likely can’t appreciate that you’re even seeing a forehead!

Progressive rendering (blurry-to-sharp) gives a full overview, a rough approximation of what the expert sees, and gets you curious about more. After the overview, we start filling in the details. And because you have an idea of where you’re going, you’re excited to learn. What’s better: “Let’s download the next 10% of the forehead”, or “Let’s sharpen the picture”?

Let’s admit it: we forget the details of most classes. If we’ll have a hazy memory anyway, shouldn’t it be of the entire picture? That has the best shot of enticing us to sharpen the details later on.

Sometimes I think of this course as Intuition-First Statistics. “Intuition-first” doesn’t mean our goal is to teach good statistical intuitions, though hopefully students do get some of that. It means that we should start by working with intuitions, and that everything else will follow from that. Because, although it may sound surprising, students actually have pretty strong statistical intuitions.

The problem is context. The cumulative or start-to-finish approach makes perfect sense to the instructor, but only because they already know what is coming. They can see the context: how everything is connected.

The students don’t have any of that. They just get hit in the face with new material that they never saw coming. Every day it’s some new bullshit. They have no idea what is up next, what it means, or how it all is related. They’re always being knocked off-balance by new topics you didn’t prepare them for, and they never have time to figure out how it’s all connected.

Your Students

This is a huge problem, because context really matters for comprehension and memory. A great example comes from research by Bransford & Johnson (1972). In their studies, participants heard a paragraph like the one below. Take a look at this passage and see if you can figure out what it is all about:

The procedure is actually quite simple. First you arrange things into different groups. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities that is the next step, otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many. In the short run this may not seem important but complications can easily arise. A mistake can be expensive as well. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. It is difficult to foresee any end to the necessity for this task in the immediate future, but then one never can tell. After the procedure is completed one arranges the materials into different groups again. Then they can be put into their appropriate places. Eventually they will be used once more and the whole cycle will then have to be repeated. However, that is part of life.

One third of the participants heard the paragraph without any context. It didn’t make much sense to them, and they had trouble recalling what they had heard.

The next third of the participants, before hearing the paragraph, were told that it was about doing laundry. To these participants, the paragraph made perfect sense, and they had very little trouble recalling the details.

The final third learned the topic only after they’d heard the entire paragraph. These participants also found the paragraph confusing, and even having been given the context, weren’t able to recall much about it. Context alone isn’t enough; you need to see the context up front.

Something similar happens in class. Without context, even the most motivated students have trouble remembering the material. They have a hard time memorizing tests or equations because they don’t understand what a test is used for, let alone how it works. I don’t have trouble with the equations, but only because I understand what the tests were created to do. It’s easy to put things into their proper categories if you have a good grasp of the system; it’s impossible if you don’t even know what categories there are.

The fractal approach solves this problem. The first two or three times I went over the material, I didn’t expect students to remember any of it. We covered all the material early on, because being introduced to everything at a shallow level prepares students to understand the material in depth once it comes back around again.

What You Want from Tests

[I originally wrote this around December 2019, and I’m reposting it here for reference.]

I used to be in favor of open-notes tests. But after seeing them in action for a while, I realized that I don’t think they’re a very good idea.

It’s true that traditional tests don’t do a good job accomplishing what they are designed for. It’s good to see people exploring different ideas about what tests can be. But an open-notes approach doesn’t fit very well with the strengths of test taking.

Settling for this approach keeps tests from becoming all they can be. Tests have some natural strengths and some obvious weaknesses, but if we understand this, we can design tests that will help us do what we want. I say this as a person who went to a college that had no tests at all!


The traditional argument in favor of open-notes tests is that having access to your notes is more true to life. In the real world, the argument goes, you aren’t locked in a room with no resources and forced to answer questions under a time limit. You have access to whatever resources you need, and can look things up as you go.

Einstein famously could not recall the speed of sound when given the Edison Test. Why memorize such facts, he remarked, when one could easily look them up in a textbook?

This perspective is entirely correct. Skill involves the use of more than just what one carries around in one’s head. An expert makes use of many tools and will refer to a variety of sources when solving a problem. In many ways, skill in a domain is skill at using the reference works of that domain. Hence the old joke that programming should be renamed “Googling StackOverflow.”

Take this view too far, however, and you end up with absurdity. It’s clear that experts don’t carry around everything in their head. But it’s also not true that they carry around nothing in their head.

A physicist may not be able to tell you the speed of sound without looking it up. But every physicist will be able to tell you who Maxwell and Newton were, and a little bit about their contributions. If someone doesn’t know what F = ma means, they’re probably not a physicist.

A programmer won’t be able to recall from memory the exact workings of every function they’ve ever used. But every programmer will be able to tell you the syntax for writing a for loop in their favorite languages. If someone can’t tell you the syntax of an if statement, they’re probably not a programmer.

An expert is someone who is able to do both. Some things they will know by heart, and some things they will be able to accomplish only given time and resources. You need both to have mastery of a skill. We might call these two forms of knowledge what you carry around in your head and what you can accomplish.


We don’t expect students to leave a class as experts in their field, but we do expect them to have mastery of the material.

What does mastery mean? I think that mastery involves both of these skills.

Someone who can accomplish a task but doesn’t carry any of that knowledge around with them is following a guide, or a set of instructions, without any understanding. Someone who can tell you important facts about a field but can’t accomplish anything is a fan, not an expert.

Students shouldn’t be expected to memorize everything. We should understand that they will do their best work when they can use their notes, look things up, and take time to consider multiple angles on a problem. But we should expect them to carry certain very important facts around in their head wherever they go.

I don’t care if a student leaves my statistics class without memorizing the equation for a t-test. But if they can’t explain what a p-value is, or can’t read a scatterplot, that’s a problem.
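Since explaining a p-value is singled out here as a must-have, one hedged way to make the idea concrete is a permutation (shuffle) test in Python. This is a swapped-in technique, not necessarily how the course taught p-values: it estimates the p-value as the share of random relabelings of the pooled data that produce a difference in means at least as large as the one actually observed. The function name `permutation_p_value`, the shuffle count, and the data are all hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # If group labels were meaningless, any re-split would be equally likely
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point ties
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical scores: only 2 of the 20 ways to split these six values into
# two groups of three give a gap of 4, so the estimate should land near 0.1.
p = permutation_p_value([5, 6, 7], [1, 2, 3])
print(p)
```

Read this way, a p-value is simply the answer to: “if the groups didn’t really differ, how often would chance alone produce a gap this big?”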

If we want to evaluate a student’s mastery of a class, then, we should measure both of these kinds of knowledge. We should give them the chance to demonstrate real skill in the field, but we should also require them to show that they have internalized some of the most important facts and concepts.

Luckily we already have good ways of doing both.

Tests isolate the student from their resources and have the potential to measure the information that the student actually carries around in their head.

Class projects and papers allow students to use whatever they want in the solving of an actual (if usually artificial) problem, and have the potential to measure the student’s ability to accomplish practical work in the field.

If tests and projects are designed with this in mind, the class can run smoothly. If they are not, the result is disaster.


What are the important features of a test? Well, they happen in a controlled environment. You can’t choose what you’re working on; the questions have been decided for you. You have a limited amount of time. You’re not allowed to collaborate with other people. And you’re not allowed to look anything up.

Open-notes tests relax this last criterion. Some of them relax it in a small way; often students are given a formula sheet, or are allowed to bring a page or a note card as a cheat sheet. Sometimes these tests are truly open notes, and students are allowed to refer to whatever they like. Sometimes students can even bring their laptops, and make use of the entire internet. [1]

Trying to evaluate a student’s skill at solving problems without restrictions is good. Trying to do it with a test is bad.

Tests aren’t a good way to evaluate this kind of knowledge because they still unnaturally restrict the student in other ways. The student isn’t given the kind of time they would have if they were solving a real problem. They don’t get any choice of what problem to work on. They can’t collaborate with others, or go to peers to discuss some aspect of the problem that’s troubling them, something that is a huge part of solving problems in the real world. The format of a test hamstrings them.

This is especially tragic because tests are so naturally suited to evaluating the knowledge and skills that a student has internalized. Why not use the tests to see if the things you want them to carry around in their heads have actually ended up there?

When designing a test like this, you should figure out what you want your students to walk away with, and only include questions about those facts and skills. Anything that they would be better off just looking up (dates, exact values, trivia, etc.) shouldn’t appear on the test in the first place.

A simple way to evaluate this kind of test is to give it to your peers and to other experts, and make sure that they can answer all the questions easily without looking up the answers. If experts in the field can’t casually ace your test, then it isn’t a good test of what experts should be expected to carry around in their heads.

This standard may even be slightly too harsh; you probably don’t need your students to walk out of the class on the same level as an expert. Another way to figure out if a test like this is fair is to pick a student whom you know reasonably well and who seems to have mastered the subject, and see how they do on your test.

A test made on these principles should be simple and easy, something that an expert would be able to breeze through.

Projects & Papers

Depending on the subject, class projects or papers are the right way to test the other skill. Rather than shoehorning open notes into a test format, which doesn’t suit it, just have them do a project.

Projects are inherently open-notes; who ever heard of limiting the resources that can be brought to bear on a class project?

No course can really be like the real world, but giving students a facsimile is a good idea. Projects provide a better environment for this because they don’t hamper the student unrealistically, as even the most liberal open-notes test will. Students have some level of control over what project they choose, how they approach it, what techniques they use, and who they call on for help.

Is this true for all classes? I don’t think so. Foreign language courses are all about internalization. If you need to look anything up, you haven’t really learned the language. Testing makes a lot of sense in a language course, but I’m not sure if there’s any place for projects, at least not at introductory levels. Once you get to a composition course in a foreign language, projects start making more sense again.

There may be other reasons to include projects in one of these courses. In this essay I’m talking about projects being used as a form of evaluation, but projects can be an important teaching tool as well. Having students complete a project as an alternative to readings or lecture is a good idea, but a different use case.

There are also probably some subjects where tests make no sense at all. For many hands-on skills, like writing or sculpture, you could conceivably make a test, but the real proof will be in creation.

Testing is a good way to examine internalized knowledge, but there are some kinds of internalized knowledge that aren’t easily measured by a test. Just how to hold your hammer and chisel, just what the dough looks like when it’s ready — these are things that an expert will have internalized, but which would be difficult to put on a test. So there are some kinds of internalized knowledge that are better measured by projects.

It seems like this is especially true for crafts, and for courses beyond the beginner level, as the student begins to pick up these hard-to-measure intuitions.

Generally, the more advanced the course, the less of a role there is for testing. While every subject has a core base of knowledge that all experts will know by heart, specialists will internalize knowledge that sets them apart even from other specialists. People already seem to understand this at some level, and most advanced courses go light on the tests.


[1] Sky Zhang points out that in certain cases, formula sheets can make a lot of sense. A programmer may not remember the syntax for all the basic operations of the language they’re learning, and the professor shouldn’t care. Giving them a sheet that provides that syntax won’t help them if they don’t understand the concepts, but it is forgiving towards students who have deep conceptual understanding but can’t be bothered to remember the exact notation for every operation. We can trust that if they choose to continue, they will eventually know the basics by heart. I think this is another case where professors should think about what they really want students to get out of the course (in this case, the concepts) and what they couldn’t care less about (hopefully, the syntax).

Thanks to Amy Ludwin and Sky Zhang for reading drafts of this.