Cheating at School is a Better Idea Than Ever

With absolutely no apologies to The Wall Street Journal.

A year of absolute, unprecedented bullshit has spurred an eruption of cheating among students, from grade school to college. With many students isolated at home over the past year—and with the pointless grind of school revealed for what it truly is—academic dishonesty has never been such an obviously reasonable choice.

Some pedants fear the new generation of cheaters will be loath to stop even after the pandemic recedes. “Students have finally found a way to avoid my bullshit, and worse, they know it works,” said Phineas Whateley, senior teaching fellow in soup calculation at Royal University College in London, who has studied academic integrity issues for more than two decades, though apparently without learning anything of value. He said cheating sites number in the thousands, from individuals to large-scale operations.

Concerned about his West Carolina State University students cheating in a statistics class, Richard Penistone launched a plan.

Rather than writing a more reasonable exam, or spending time helping students master the material, Mr. Penistone, a course coordinator, wasted countless hours writing a computer program that generated a unique set of questions for each student. Those questions quickly showed up on a for-profit homework website that helped him to identify who posted them. 

About 200 students were caught cheating—one-fourth of the class. Yet somehow Mr. Penistone was more concerned about punishing these 19-year-olds than he was that he had created a class that 25% of his students decided was such bullshit that they couldn’t be arsed to even attempt the final exam. 

We note that Mr. Penistone is a course coordinator, not faculty. We assume he’s not tenured; he doesn’t even have an advanced degree. What is his deep-seated loyalty to this West Carolina State University, exactly? Do they pay him so especially well, that he is roused to stay up late writing code to generate and distribute 800 totally unique exams?

Overall, cases of academic dishonesty more than doubled in the 2019-20 academic year at WC State, with the biggest uptick as students were forced into the absurdity that is the Zoom classroom, according to the school.

Educators say stress and pressure, possibly related to the global pandemic maybe???, are a big reason why students cheat. “Especially in a time of stress, they realize that there are more important things than the rote memorization and regurgitation we force them to do on exams,” said Myra Capwell, president of the International Global Center for Academic Honor, Security, and Integrity, and director of the Kansas-Nebraska-Indiana Interstate University Dignified Honor and Integrity System.

Lucien Hoyt, an 18-year-old freshman at Mamimi University in Cambridge, Ohio, said he knows students who have used homework help sites for studying—and (brace yourself, dear reader) for cheating. He said he hasn’t cheated himself, but then again, he knows we’re narcs, so he would say that. 

He said students, including himself, are frustrated with virtual learning because it throws into stark relief how artificial these courses are, and how none of it matters. “I haven’t struggled this way with learning material, ever,” he said. “In the classroom I had the vague sense I might actually be getting an education. But with the trappings stripped away, it just becomes so clear that what they’re asking us to do is total busywork.”

At the K-12 level, schools are free to indulge their whimsy to become miniature police states, and many block a range of homework help websites from district computers to prevent cheating. Ultimately even this exercise in authoritarianism is pointless, however, since this doesn’t stop a student from visiting the site from a different device. 

Middle-school teacher Aurora Zimmer in Lake, California, has put less emphasis on testing during online learning because it is also dawning on her, if somewhat slowly, that this is an exercise in futility. “We have no control of what is going on when you’re on a computer,” she said. “We can’t even force you to ask us to go to the bathroom. It really makes you think.”

Measures taken in the name of online cheating have spawned a new kind of comforting-and-not-at-all-draconian industry: surveillance-type companies that hire online randos to actually watch students take tests from home. I don’t know about you, but I find the idea of a faceless company hiring an online stranger to watch my 19-year-old child take a test in their bedroom very reassuring.

The internet strangers hired by these companies look for suspicious behavior (this is good because they are presumably experts in suspicious behavior), such as a student disappearing from camera view (going to the bathroom) or being slipped answers (eating chips). Some use “facial-detection” “software” to automatically penalize students who glance, however briefly, out of frame, or make “unusual movements”. This not only allows universities to be pedantic at heretofore unimaginable speeds, it allows them to outsource the pedantry as well.

Proctorio, based in Scottsdale, Arizona, said it monitored 21 million exams in 2020 world-wide, up from 6 million exams in 2019.

ProctorU, based in Hoover, Alaska, notes worrying displays of basic desires for respect, freedom, and privacy. “Some of these students must have accidentally been paying attention in their American History classes,” says one ProctorU drone with a sneer, “But we have a leg up on King George III. These latter-day George Washingtons and Patrick Henrys don’t stand a chance.” 

Here are some “funny stories” about cheating to make it sound amusing and disguise the human cost and egregious civil rights violations inherent to this kind of in-home surveillance: Some of the busts include a student suspected of trying to use a drone’s camera to take images of a test to possibly share with others; another who was trying to cheat by using information on sticky notes on his dog; and a female student who sneezed and disappeared from view, to suddenly be replaced by a male wearing a blond wig, impersonating her. Dogs and crossdressing! Isn’t that funny? Now go back to bed, America. 

Among the newer ways to cheat are homework auction sites, which give students a say in who does their work and at what price. Students post their assignment on a website, along with a deadline; the website acts as a marketplace for bidders who offer to do the assignment.

The bidders, who often refer to themselves as tutors, can tout degrees and other credentials. Some companies allow students to rate their work and post reviews online.

Stella Walker, a blogger and content strategist for, a site where students can auction out writing assignments, said the site’s terms prohibit academic fraud and plagiarism. She said she supposes cheating can happen, but it would be on a student’s conscience. “I know you’re a snitch, bonehead,” she told us.

One self-described independent tutor listed as Seymour Butz in a Craigslist ad said in an interview by text message that business was booming during the pandemic. The Craigslist ad noted services such as doing students’ math work. The tutor disavowed the label of a cheater for students, and said that the tutor helps students learn by providing written tutorials and explanations for math problems.

“No way would any student use my cheating service to avoid doing their work,” said Mr. Butz. “Boy do I ever disavow that label, you can put THAT in your article.”

Mr. Butz touted bachelor’s and master’s degrees from Pinto University in the ad, but the university said it was unable to locate such information in its records. We at The Wall Street Journal are beginning to suspect that “Seymour Butz” may be an alias.

Other popular websites that students use to get help—by submitting a question for an expert to quickly answer, or by searching a database of previous answers—include Chegg and Brainly, which said they have seen a big increase in users during the pandemic.

Chegg, a publicly held company based in Santa Clara, Calif., prides itself on a willingness to be a big squealer, and help institutions determine the identities of those who cheat. “We really like to play both sides. It gives us a deep, almost visceral pleasure, to serve as a sort of giant honeypot sting operation for entrapping helpless students,” they told us by greeting card, despite the fact that we specifically did not ask them for comment, and don’t know how they found our home address. On the basis of such scummy practices, Chegg saw total net revenue of $644.3 million in 2020, a 57% increase year over year. Subscribers hit a record 6.6 million, up 67%, and students are charged between $9.95 and $19.95 per month for the privilege of letting Chegg stab them in the back.

Mr. Penistone at WC State said Chegg helped identify the 200 students that used its website to avoid taking his exhausting final exam. Some students posted exam questions to get answers while others accessed the information, all traceable through users’ email addresses, IP addresses and the time of the access.

Another website that students were suspected of using to cheat on the exam to a lesser extent showed actual moral fiber and didn’t cooperate with the university, Mr. Penistone said bitterly.

The students were given three options: meekly accept their punishment, join Mr. Penistone in what we can only imagine must be an excruciatingly awkward Zoom call to “review the evidence”, or dispute the accusation with the Office of Student Conduct. This office, designed and staffed by the university and tasked with enforcing its rules, is certain to give them all a fair hearing, we are sure.

“A lot of the students responsible said, ‘It’s unfair to put us through this, because we’re going through a pandemic,’ ” Mr. Penistone said. “Fortunately these complaints fell on deaf ears. I had no choice because there was a zero-tolerance policy. I mean, I’m the one who designed the class, and the exams, and the zero-tolerance policy. But really, I had no choice.”

Even after the bust, the cheating didn’t stop. This is unsurprising, because the issue is not cheating, but unrealistic expectations in tests and exams. A close analogy might be, “Even after the floggings, the attempts to mutiny didn’t stop.” I wonder why.

“In the fall semester, of 1,000 students, I still had attitude problems academic integrity issues with 70 or 80,” he said. “I still don’t understand the basic issues at play here — probably they have just gotten better at cheating, but fortunately I am blissfully unaware of all things happening in my classroom.”

The real tragedy of course is how this all contributes to greater societal alienation — that cheating is now being outsourced to faceless corporations, rather than being a way to build community with fellow classmates. What kind of America are we building for our children?

Hindsight is Stats 2020, Part III: Final-First Exams

[This is Part III of a retrospective on teaching statistics over summer 2020. Part I and Part II.]

Exams were my white whale for this course.

My design goals were clear. Someone who knows their stuff should be able to prove what they know and walk out of the class. Students should be encouraged to learn as fast as they can, and they should be rewarded for getting ahead of the class if they want to. And there should be almost no consequences for failure, so that students can experiment without torpedoing their grade.

But exams are famously plagued with problems. Rescheduling exams for students who are sick or have to miss a day. Deciding who gets to do make-up exams. The endless questions about exam format — “professor, will this be on the final?” Somehow, we complain about all this but take it for granted. Why not come up with a way to make these problems a thing of the past?

1. Final-First Exams

These days, professors have gotten more comfortable experimenting with exam formats. Lots of exams are open notes, open book, or even take-home. Some classes let you drop your lowest exam score. I’ve even heard of professors giving five exams and dropping your worst two.

Dropping tests is cool, because it fixes some of the classic problems. Have to miss an exam? No problem, just drop that one. No need for make-up exams. If you bomb an exam, just drop it.

This is the right direction, but we can do better. What else can we tinker with, to make exams even better?

I thought back to the cumulative format, and why it doesn’t work for teaching. Why have cumulative exams, then? Doesn’t it just serve to obscure your expectations? My class format was fractal, so that students could see what’s coming, know what’s expected of them. Why not use this approach with exams, too?

Dropping one exam isn’t cool. You know what’s cool? Dropping ALL the exams.

I call the format Final-First, because your first exam is a final exam. In fact, every exam is a final exam, meaning every exam covers all of the material covered in the whole course. The exams have nearly identical formats, differing only in the particulars. I swap out the numbers and some of the details on the questions, but once you’ve seen one final, you have a pretty good sense of all of them.

This course was six weeks long, and I gave them a final exam at the end of every week. This means they had a final exam at the end of Week 1, at the end of Week 2, at the end of Week 3, and so on…

Since these were all final exams, I didn’t expect most of them would do very well on the first exam. But that’s ok, because we dropped all their exam scores except for the best one. The exam grade, as it contributed to their grade for the class as a whole, was entirely based on their best exam. Other exam grades didn’t contribute at all.

If a student gets a 90% on the third final, it doesn’t matter how they did on the first two. Why should a student suffer if they get a 10% on the first exam but manage to nail it with a 90% later on? Clearly that student has done a great job and learned all the material we wanted them to, even though they struggled at first. In fact, isn’t that more impressive?
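The best-score rule is simple enough to write down in a few lines. The sketch below is illustrative only: the function name `exam_grade`, the convention of recording a skipped exam as `None`, and the 45% middle score are assumptions made for the example, not a record of how grades were actually computed.

```python
from typing import Optional, Sequence


def exam_grade(scores: Sequence[Optional[float]]) -> float:
    """Return the best score among the exams a student actually took.

    A skipped exam is recorded as None (an assumed convention) and is
    ignored. A student who took no exams at all gets 0.
    """
    taken = [s for s in scores if s is not None]
    return max(taken, default=0.0)


# A student who gets 10% on the first final and 90% on the third.
# The 45% on the second final is a made-up value; the last three
# exams are skipped.
print(exam_grade([10.0, 45.0, 90.0, None, None, None]))  # 90.0
```

Because only the maximum matters, a missed or bombed exam never needs special handling.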

This format has some great features, which are beautifully in line with my design goals:

  • Good Incentives: If you understand the material quickly, you should be rewarded. Students who succeed are rewarded with more freedom. No one who has mastered the material should be forced to go through the motions. If you get a grade you’re happy with, you can choose to skip the rest of the exams with no downside.
  • Safety Net: Each exam offers a new chance to set a minimum threshold for your grade. Once you get an 85 on one exam, you can rest easy that your grade won’t go any lower. With this design there are no consequences for failure. You can bomb (or miss) as many exams as you want without any risk to your final grade.
  • Low Anxiety: Students who are able to get a good grade on one of the early exams will be able to worry about things other than cramming for the next exam. Maybe they’ll use it to study more, or maybe they’ll just go to the beach. I don’t care. If you can get an 80 on the final exam in week two of a six-week class, you deserve to go to the beach.
  • Transparency: With this format, there’s no more need for, “what will be on the test?” Once you have taken the first final, you will know (approximately) the format of all the other finals. This has the added benefit of:
  • Context: Seeing all the material at once will allow you to begin building a tapestry of ideas in your head. You will never be blindsided by new material, things you didn’t realize were expected of you. Once you’ve seen one final exam, you’ve seen them all, and being exposed to all the material early on will help you learn it better.
  • Feedback: You will be able to tell what skills you have mastered and which you need to work on. This will allow you to spend your study time wisely. Previous exams become a great tool for review. You can go over your performance with the TA or professor and be able to see exactly what you need to work on for the next exam, because the next exam is so similar.

I was really happy with this design. It hit all of my design goals, and it resolves a lot of the classic problems with exams.

Other people liked the idea too. I was on a date with a PhD student and we were talking about teaching, so I told her about this design. She said, “that sounds a bit insane upfront, but not so much when you think about it.”

Now there was nothing to do but try it out. For this class, I made the exam 50% of the final grade. Normally, making a single evaluation a huge chunk of the grade is unfair. But with this format, the exam grade is the best of six evaluations, and besides, the exams test what I really want them to know.

1.1 The Results

Final-First exams worked really, really well.

I was worried that students would be confused by the format, or would be terrified when they failed the first exam, but I actually got very few questions about it. Students seemed to understand what I was trying to do.

It really did solve all the usual exam problems. No one ever asked me for a makeup exam. Only once did I have to clarify what would be on the exam. When students wanted to meet to go over their answers, we were able to make real progress, because it was immediately clear to me what parts of the material they had mastered and what they were still struggling with. In many cases we could look back over two or three different exams and see the same thing tripping them up every time over multiple weeks.

Most people improved steadily over time. The average grade went from 60% on Exam 1 (this was by design; see below) to 85% on Exam 6. Students took the exams pretty freely. Some of them took every exam, but on average they took only 4 of the 6 exams.

A few students actually got their best grade quite early on. On the first final, at the end of the first week of class, the highest grade was an incredible 88% (!!!). This student kept taking exams, though, and was able to eventually beat her record with a 92.5% on Exam 5.

The student who got the second-highest score on Exam 1 got an 84%, again very high for having taken only three classes. This student chose to skip most of the other exams. He did take Exam 5, but only got a 75.5%, so in the end his final grade was actually based on his exam score from the first week of class!

I was a little surprised that more students didn’t try to get a great grade early on. When I think about this format, one of the most exciting things to me is the idea that you can teach yourself all the material, get ahead of the class, get a great exam grade halfway through, and not have to show up to class anymore. But while a few students got great scores on Exams 3 and 4, that was the exception. It might be different in a semester-long class. Six weeks is just not much time to teach yourself, even if you really commit to it!

These are extreme cases of the safety net working as intended, but the design worked equally well for students with less extreme grades. To my surprise, only 26 of the 39 students took Exam 6, the final final exam. I think this means that by the end of the class, many of them were satisfied enough with their exam grade that they chose not to take this last final. Of those who did take Exam 6, only 18 got a better grade on the final final than on any previous final, which means that 8 people didn’t improve their grade at all on the final final.

The best exam grade in the entire course, a 97.5%, was actually earned on Exam 5. Perhaps unsurprisingly, that student chose not to take Exam 6.

These grades are really impressive, because the exams were not easy. I came in with specific expectations of what a student should know by the end of intro stats. These expectations were reasonable, but they were also pretty high. We expect too little of undergrads, and we underestimate what they are capable of doing and understanding.

I didn’t change my expectations at all during this course. Every student who earned a 90% on an exam met my expectations, and every student who did better than that exceeded my expectations. In my opinion, a good grade means that they mastered the material.

1.2 Student Opinion

Students really liked the exams. Some of the most positive feedback was about this part of the class. Take a look:

“This was one of my favorite aspects of the course because it genuinely did relieve a lot of stress. My biggest fears for this course revolved around completing it and not only doing poorly, but also learning nothing. I think the weekly exams allowed me to continually refresh and apply what we had reviewed without the anxiety of failing the course.”

“I thought the idea of getting graded based on the best exam was exceptional since we learn more as we continue taking the class.”

“To be honest, this is the best [exam] format I’ve ever taken! It really gives me the motivation to study harder each time without getting too stressed out.”

Other comments were much the same. As you’ll notice, the experience students had with the format was exactly the experience I was aiming for. A few other notes of interest were:

“I found myself studying ahead of time to supplement the material I have not learned yet”

“Towards the end it was fine, but the first few were pretty stressful for me.”

The one complaint, which I did see a few times, was that the exams tested them on questions they didn’t recognize and hadn’t seen before. But of course, this was by design, because I wanted to see if they really understood the concepts.

Some students seemed to understand this, with one noting, “[Ethan] helped us prepare as best as we could without actually giving us the answers.” And once again I’ll point to their excellent exam grades as proof that the difference in format wasn’t actually a problem.

2. Exam Design

This format is certainly the most interesting part of the exams. But the design of the exams and the exam questions is worth discussing as well.

The Final-First exam format doesn’t work if you don’t pay close attention to the design of the exams. Exams need to be nearly identical, so that students always know what’s coming on the next one. But they can’t be too similar, or else students will memorize them by rote. You need to keep mixing it up.

I had a plan for the exams going in. As I argued in What You Want from Tests, exams should be used to test the knowledge that students carry around in their heads, the bits that an expert will internalize. That’s what I was aiming for in this class. Research reports would cover their ability to actually do stats, and exams would cover their memory and intuition for the most important concepts.

Then, of course, the whole course was forced online. Immediately I knew that this meant that exams would de facto be open book, open notes, and really, open Google. So I knew that I would have to pivot away from my original plans. I couldn’t just focus on internalized knowledge.

(I never explicitly told students that the exams were open notes, but I never told them not to look things up either.)

I actually think this ended up improving the exams. I stand by what I said in What You Want from Tests, but it can be more complicated than I imply in that essay.

2.1 Exam Structure

The structure of the exams mirrored the structure of the course — after all, every exam was a final. Each exam was 50 points in total. Of that, 15 points had to do with basic data skills, 15 points went to descriptive statistics, and 15 points were on the use and interpretation of inferential statistics. Just like the course, the exams were divided into these three sub-topics.

The remaining 5 points went to what I called “advanced topics”. These were questions about things we mentioned in lecture but were slightly outside the scope of the class, more complex questions about the use of core concepts, or questions that tested their intuitions in ways that we had hinted at, but hadn’t explicitly discussed.

An interesting feature of this is that a student who mastered all the core material, but hadn’t yet achieved that deeper understanding, would only get a 90% on the exam, because the advanced section was the last 10% of the exam grade. A grade of higher than 90% means that a student understood not only all of the material at the expected level, but was making progress into understanding it more completely.

This is why I am so confident that the students who got above a 90% on their exam grade not only met my standards, they exceeded them. That last ten percent came from questions that were, by design, more difficult than an intro stats student should be able to answer.
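The arithmetic behind that 90% ceiling can be checked directly. This is a minimal sketch using the point values given above; the variable names are just labels chosen for the example.

```python
# Point values per section, as described above (50 points in total).
SECTION_POINTS = {
    "data skills": 15,
    "descriptive statistics": 15,
    "inferential statistics": 15,
    "advanced topics": 5,
}

total = sum(SECTION_POINTS.values())
core = total - SECTION_POINTS["advanced topics"]

# Best possible percentage without earning any advanced-topics points.
core_ceiling = 100 * core / total

print(total)                    # 50
print(f"{core_ceiling:.0f}%")   # 90%
```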

2.2 Exam Difficulty

Maybe other teachers already know this, but something I had never realized before was that a teacher has a lot of control over the difficulty curve of an exam. I knew that a professor could make an exam more or less difficult, but I didn’t understand that you have a lot of control over the distribution of scores.

This was particularly important for a class using the Final-First exam format. In this system, most students take a final exam in Week 1, and of course most of them will bomb it. There’s a big difference in morale, however, between bombing an exam with 50% and bombing it with 5%!

I wanted to encourage students to do well. I wanted to make sure they felt like they could succeed from the very beginning. To make this happen, I designed the exam so that it was easy to get a decent score, but hard to get a great score. (For those of you who are statistically inclined, compare item response theory.)

(This is also how I asked Liz to grade the research reports. Make it easy to get a decent grade but hard to get a perfect grade, I said.)

I had already decided that 15 points, or 30% of the exam, was devoted to data skills. This stuff is pretty easy, and so I knew that most students would be getting a good chunk of points from this section right from the start. In the other two sections, I made sure to include a couple of easy questions, to keep the baseline grade relatively high.

The fact that the average score on Exam 1 was 60% shows that I was successful. In fact, even in Week 1, the lowest exam grade was a 40%. That doesn’t sound like much, but considering that we were only 17% of the way through the class, I think it’s pretty good.

I used some other tricks for this as well. One was that the exam was almost entirely multiple-choice. A classic problem with multiple-choice questions is that students always have a decent chance to get the right answer by just guessing. For example, a student guessing on a multiple-choice question with four answers will get the right answer 25% of the time. An exam with nothing but 4-answer multiple-choice questions has a baseline grade of 25%. It’s even worse for an exam that’s all true/false, which has a baseline of 50%. This is why, up until 2016, the SAT took off 1/4 of a point for each wrong answer. Its multiple-choice questions had five answer choices, so a student who did nothing but guess would, on average, get a score of about zero.
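To make the guessing arithmetic concrete, here is a small sketch of the expected score per question for a student who guesses uniformly at random. The function name `expected_guess_score` is made up for this example, and the SAT case assumes five answer choices per question, which is how the pre-2016 multiple-choice sections were laid out as far as I know.

```python
from fractions import Fraction


def expected_guess_score(n_options: int, penalty: Fraction = Fraction(0)) -> Fraction:
    """Expected points per question under uniform random guessing.

    A right answer earns 1 point; a wrong answer loses `penalty` points.
    """
    p_right = Fraction(1, n_options)
    return p_right - (1 - p_right) * penalty


print(expected_guess_score(4))                  # 1/4  (four-option multiple choice)
print(expected_guess_score(2))                  # 1/2  (true/false)
print(expected_guess_score(5, Fraction(1, 4)))  # 0    (five options, 1/4-point penalty)
```

With five options, the one-in-five chance of gaining a point exactly cancels the four-in-five chance of losing a quarter point.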

But we can turn this same force to our advantage. To adjust the baseline score, I can change the number of answers I include for my multiple choice questions. This is exactly what I did. For the Data section, which I wanted to be a score-booster, all the multiple choice questions had only a few answers each. For the Advanced section, where I wanted students to earn points only if they really knew their stuff, most of the multiple choice questions had 8 or more response options! And for the other sections, which I wanted to land somewhere in between, I included a mix.
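As a rough illustration of how the number of response options moves the baseline, the sketch below computes the expected fraction of points from pure guessing for a section. The section layouts are hypothetical (the actual number of questions and options per question is not recorded here), and it assumes one point per question with no penalty for wrong answers.

```python
from typing import Sequence


def guessing_baseline(option_counts: Sequence[int]) -> float:
    """Expected fraction of points earned by guessing on every question.

    `option_counts` lists the number of answer options for each question.
    Assumes one point per question and no penalty for wrong answers.
    """
    return sum(1 / n for n in option_counts) / len(option_counts)


# Hypothetical layouts, invented for illustration.
data_section = [3, 3, 3, 3]        # few options: a score-booster
mixed_section = [3, 4, 5, 8]       # somewhere in between
advanced_section = [8, 8, 8, 8]    # many options: guessing barely helps

print(round(guessing_baseline(data_section), 3))      # 0.333
print(round(guessing_baseline(advanced_section), 3))  # 0.125
```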

Of course, there are limits to how lenient we want to be. In particular, true/false questions seem too easy — a baseline of 50% just from guessing is way too high. One idea that I really like is True / False / Can’t Tell questions. At a shallow level, these are just true/false questions with three options instead of two. But at a deeper level, this encourages students to engage with the question in a new way. Instead of just determining which answer is right, they have to think about whether they even have enough information to make that call. It literally adds another dimension to the question. This is especially well-suited to statistics, which is all about making informed guesses based on limited information.

I used a similar approach in some of my short answer questions. I’ve noticed that in class, students are often much more comfortable telling you why something is wrong than trying to give you the right answer themselves. I translated this into “What’s wrong with…” questions. Students would be given a short paragraph that described some statistics. In each case I had inserted an error into the paragraph. For example, sometimes I would say that a variable wasn’t skewed, but I would report a mean and median that were strikingly different. Students would have to pick out the mistake and tell me why it was wrong.

This is a really important skill in real life. A big part of the practice of using stats as a scientist is noticing when something is wrong in an analysis, whether you’re checking your own analysis or looking over someone else’s work.

I included one of these questions in the Data section for almost every exam, since they are a good way to ask about data features like skew and range without just asking students to regurgitate the definitions. I also included a few in the Descriptive Statistics sections, and I think that added some nice variety. You know a student doesn’t understand correlation when you report r = 1.2 and they don’t catch it.
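The r = 1.2 example works because a correlation coefficient can never fall outside the range from -1 to 1. As a tiny sketch of the check a student is expected to make in their head (the function name is made up for illustration):

```python
def correlation_is_possible(r: float) -> bool:
    """A Pearson correlation coefficient must lie between -1 and 1, inclusive."""
    return -1.0 <= r <= 1.0


print(correlation_is_possible(1.2))    # False: this report contains an error
print(correlation_is_possible(-0.4))   # True
```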

I realize now that I never included any of these questions about inferential statistics. This was a mistake, since catching errors in the reporting of tests is something that comes up all the time. If I taught this class again, I would put “What’s wrong with…” questions in all three sections of the exam.

Another way to control exam difficulty is with paired questions. You include two questions about the same topic, but one is easy, and one is harder. For example, in my descriptive statistics sections, I always included two questions where I described some data and asked students what plot or chart they should use to represent that data. By design, the first of these was always pretty easy, and the second was, while not exactly hard, a more sincere test of their understanding.

This has some great features. First, it helps raise their baseline score. A student who understands the idea even a little will usually get the first question right, and this will boost their grade. They essentially get partial credit on that concept, even though the question is multiple choice. (They say you can’t give partial credit on multiple choice questions, but what do they know?) But a student only gets full credit if they can answer the more challenging question. Again we see that the design makes it easy to get a decent grade, but hard to get a perfect grade.

Second, it helps with feedback. For any topic on the exam, if a student gets neither question right, they clearly do not understand the topic at all. If they get the easy one right but not the harder one, they understand the basics but haven’t quite got the whole idea. And if they get both right, it’s clear they understand it at the level I want them to. If they somehow get the hard question right and the easy question wrong, this tells you that they were probably guessing. You can look at the exam and see exactly how students are doing with each of the core skills.
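The four possible outcomes of a question pair map onto four readings of the student's understanding. A minimal sketch of that lookup follows; the function name and the wording of the labels are illustrative, not taken from any grading tool.

```python
def diagnose(easy_correct: bool, hard_correct: bool) -> str:
    """Interpret a student's answers on an easy/hard pair about one topic."""
    if easy_correct and hard_correct:
        return "understands the topic at the target level"
    if easy_correct:
        return "understands the basics, but not yet the whole idea"
    if hard_correct:
        return "probably guessing"
    return "does not understand the topic yet"


print(diagnose(True, False))   # understands the basics, but not yet the whole idea
print(diagnose(False, True))   # probably guessing
```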

2.3 Difficulty Over the Course of the Class

As important as the difficulty curve within an exam is, it’s also worth mentioning difficulty curves over time. Part of the reason to make an exam easy to pass but hard to ace is that this is good for student morale, while still being an accurate measure of their ability. With a Final-First exam, you also want to worry about difficulty over time.

Students shouldn’t get a good grade on the first final unless they really know their stuff. Early on, exam grades should be pretty low. But if exam grades go down with every exam, or even if they fail to go up, that’s bad for morale. It tells the students that they aren’t learning anything from the class. That shouldn’t be true, and even if it is, you shouldn’t be telling them that!

My recommendation is that your hardest exam should go first, and your easiest exam (still staying true to what you want them to get out of the class) should go last, with the other exams in order of difficulty in between. And of course, for the reasons described above, your hardest exam should still be designed so that on average students do decently on it. If the average score on the first final is less than 50%, you’ve probably done something wrong.

One thing that I would like to do someday is create a way to generate exams automatically. These exams are formulaic by design, so it would be relatively easy to write a script that would mix & match components and spit out as many exams as you want. Not only could this make the exams more fair and regular, you could do things like share multiple practice exams with your students.
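
To give a sense of what such a script might look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the question bank, its topic names, and the question IDs are placeholders, and the rule of "one easy and one harder question per topic" is borrowed from the paired-question design described earlier.

```python
import random

# Hypothetical question bank: each topic has a pool of easy and harder variants.
QUESTION_BANK = {
    "plots": {"easy": ["plots-E1", "plots-E2"], "hard": ["plots-H1", "plots-H2"]},
    "means": {"easy": ["means-E1", "means-E2"], "hard": ["means-H1", "means-H2"]},
    "spread": {"easy": ["spread-E1", "spread-E2"], "hard": ["spread-H1", "spread-H2"]},
}


def generate_exam(bank, seed):
    """Build one exam: for every topic, pick one easy and one harder question."""
    rng = random.Random(seed)  # seeded, so the same seed rebuilds the same exam
    exam = []
    for topic in sorted(bank):
        exam.append((topic, "easy", rng.choice(bank[topic]["easy"])))
        exam.append((topic, "hard", rng.choice(bank[topic]["hard"])))
    return exam


# As many practice exams as you want, one per seed.
practice_exams = [generate_exam(QUESTION_BANK, seed) for seed in range(3)]
```

Because each exam is built from the same template, every version covers the same topics at the same two difficulty levels, which is what would make the generated exams fair and regular.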

3. Exams Online

As with everything else, I was worried about exams being online. There were the concerns around cheating, as I mentioned above, and also just around giving an exam remotely.

I was wrong. Holding exams online is one of the best things I’ve ever done for a class. It was so easy that I am seriously considering using online exams for in-person classes in the future.

I ended up running all my exams through Qualtrics, a survey platform I use in my research. Qualtrics is flexible and has a lot of nice features that are helpful for exams, but I suspect you could run online exams with other survey platforms.

Exams were run every week. Since my students were located all around the world, and since many of them had jobs or other responsibilities, I opened the exam for a full 24 hours. Lectures were Monday / Tuesday / Wednesday, and every week the exam was open from 5:00pm EST Thursday to 5:00pm EST Friday. Using the survey software, it was easy to have it open all day and let them drop in whenever they wanted. I also liked how this didn’t cut into class time.

Qualtrics automatically records the time when a session is opened and when it is submitted, so I used that to time their exams. The exam would begin as soon as a student clicked on the link, since that prompted Qualtrics to record the session start. I recommended that they time themselves to ensure that they didn’t go over. We compared their start and their submit times to see if they followed directions. Some of them did go over by a little, but we were lenient, and graded those exams too. To my surprise, no one tried to sneak in a much longer exam session.
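
The start-versus-submit comparison is simple arithmetic on two timestamps. Here is a rough sketch in Python, assuming the timestamps have been exported as text in a "YYYY-MM-DD HH:MM:SS" format; that format, the function name, and the example times are assumptions, and the time limit is a parameter you would set to your own exam length.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export format


def minutes_over(start: str, end: str, limit_minutes: int = 45) -> float:
    """Return how many minutes a session ran past the limit (0.0 if it did not)."""
    elapsed = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(
        start, TIMESTAMP_FORMAT
    )
    over = elapsed - timedelta(minutes=limit_minutes)
    return max(over.total_seconds() / 60, 0.0)


# Hypothetical session: opened at 5:00pm, submitted 47.5 minutes later.
overrun = minutes_over("2020-07-09 17:00:00", "2020-07-09 17:47:30")
```

How much overrun to forgive is a judgment call; the sketch only reports the number and leaves the leniency to the grader.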

After some pilot testing with my sister, I ended up making the exam only 45 minutes long. This isn’t much time, but I figured it would be easy to add time later if I had to. I was worried that students would complain, and fully expected that I would have to bump it up to 60 minutes after the first few exams. But this ended up being unfounded too. I didn’t get any complaints about the exam length — students never mentioned it! — and so I kept it 45 minutes long for the whole course.

Short exams also fit my design goals. There’s no need to belabor an examination. As long as it’s accurate, it should be as short as possible. Once again, I imagined how it would be if, through some horrible clerical error, I was forced to take the class myself. I knew I would be able to ace the exam in about 15 minutes, so I wouldn’t be forced to waste more than a tiny amount of time. That’s how it should be.

Running exams online also gave us huge benefits on the backend. Exams were incredibly simple to grade. Once all the submissions were in, I would take the exam myself, putting in all the right answers and writing ANSWER KEY in the name field at the end. Then, when Liz downloaded all the responses for grading, she could just use Excel functions to compare each student’s answers to the ones I entered for the answer key, and automatically assign points that way. There were always a few short-answer questions to grade by hand, but the majority of the grading, for every single student, could be accomplished in just a few minutes.
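
The grading itself was done with Excel functions, but the same answer-key trick can be sketched in a few lines of Python. This is an illustration rather than the method we used: the field names, the sample responses, and the one-point-per-question scoring are all assumptions.

```python
def grade_multiple_choice(responses, key_name="ANSWER KEY", name_field="name"):
    """Score every response against the row whose name field is the answer key."""
    key = next(row for row in responses if row[name_field] == key_name)
    questions = [field for field in key if field != name_field]
    scores = {}
    for row in responses:
        if row[name_field] == key_name:
            continue  # the key itself is not a student
        # One point per question whose answer matches the key exactly.
        scores[row[name_field]] = sum(row.get(q) == key[q] for q in questions)
    return scores


# Hypothetical export: one dict per submitted session.
responses = [
    {"name": "Student A", "Q1": "b", "Q2": "d", "Q3": "a"},
    {"name": "Student B", "Q1": "a", "Q2": "c", "Q3": "c"},
    {"name": "ANSWER KEY", "Q1": "b", "Q2": "d", "Q3": "c"},
]

scores = grade_multiple_choice(responses)
```

Short-answer questions would be left out of the compared fields and graded by hand, as described above.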

And unlike working with scantron or paper forms, there is no headache when it comes to digitizing the results. Answers and scores were in a spreadsheet from the beginning.

It was easy to make answer keys for the same reason. Admittedly I didn’t know this at first — all the credit goes to Liz. It turns out that you can make Qualtrics generate a PDF of all the answers given by a specific person, so all we had to do was get it to spit out the ANSWER KEY responses and, surprise, there was the answer key. Again your mileage may vary, but online systems can be very powerful.

The online format does offer students the opportunity to cheat. But as I already mentioned, I don’t think they did, and I don’t think it would matter either way. There are things you could do to help prevent this, if you were worried, like giving a narrower exam window or putting out multiple versions of the exam to prevent crosstalk, the sorts of things we already do in the classroom. You could make projects a bigger part of their grade. But I think it’s to everyone’s advantage to trust the students.

With a well-designed exam, it will be easier to learn the material than it will be to cheat. The same goes for open notes. If you make a good exam, it will actually be quicker for students to leave their notes closed.

5. What I Didn’t Get To

I got to put almost everything I wanted to in this course, but there were a few things I missed.

I’ve always wanted there to be a bigger role for teams, but the teams in this class didn’t work very well. It seems like there should be ways to encourage students to help one another out, reward them for working together. But all the ideas that come to mind, like giving students bonus points for helping their teammates, have obvious problems. So while I want to incentivize teamwork and peer support, I haven’t come up with a way to make it happen yet.

Students would also really benefit from giving and watching presentations. I was able to do this for my RA, and it’s clear to me that she gained a lot from making the presentations and from getting feedback. Criticizing presentations and giving feedback is also good practice for statistical literacy, and it might be less intimidating for the average student.

But it would be difficult to have every student give a presentation. It’s probably impossible for large class sizes, and it doesn’t seem like it would work well online. During the semester, you might be able to do it in recitation, either for extra credit, or in small teams.

But the real problem is that giving a single presentation is like answering a single math problem. It’s just not that much practice. Unless the class size were very small, you probably couldn’t set it up so that every student got to present multiple times. This might be better suited to an advanced course. The breakout room activities, given that they include small and regular “presentations”, might be the best we can do here.

6. Concluding Remarks

I’ve heard a lot about the things you can and can’t do when teaching stats. I’ve heard that you can’t get students to pay attention. That you can’t make them care about the subject. That they’re all cheating on their assignments. That they aren’t smart enough to learn how to use statistical software on their own.

Things are bad in education today, but they’re not bad because of lack of funding, or because students are unmotivated. Things are bad because educators lack vision.

What else do you call it when everyone knows what the problems are, but no one manages to dream up solutions? We have the ability to make education work for us, and nothing special is required, just careful thought and patient experimentation.

In particular, there are huge gains to be had in developing approaches that let students and teachers stress less over the material and waste less time. This may free them to spend more time learning, but it may also free them to have a life outside the classroom. A class with more hours of homework, longer tests, and more fiendish questions is not a better class. In most cases it is a worse one.

What could be better than learning more, with less effort, and in less time? Let us celebrate academic laziness. Perfection comes not when there are no more assignments to add, but when there are no more assignments to take away.

Students have almost no control, of course, but it’s confusing how teachers continue to design classes with backbreaking grading loads for themselves. Just give fewer assignments, shorter assignments, assignments that are easier to grade. You can do this without making your class worse. In fact, you can do it while making your class better.

So many teachers teach classes that they themselves would hate. If you wouldn’t want to take your class, if you wouldn’t find it easy, then what are you doing? It seems unnecessarily cruel to me. Make your classes enjoyable. If you can’t make them enjoyable, at least make them easy. If you can’t make them easy, at least make sure they’re not a huge pain.

So many teachers are paranoid about students cheating, collaborating, or doing too well on tests. Are you a teacher, or a mall cop? When classes are fair, students don’t cheat. Even when classes are rigged, most students still refuse to cheat. Treating every student as a suspect creates a system where the most honest students are the ones who have the most to lose. I have seen too many honest students fail what should have been an easy class.

It’s August as I’m writing this, and online I have seen many examples of college professors sharing heavy-handed “how to be ok” pages or “COVID pages” that they plan to attach to their syllabi for the fall semester. These pages contain assurances that you can come to the professor with anything, that you can get extra time when you need it, and so on. Professors love these pages because they get to feel like they’re doing something to make a difference. But these promises are hot air and all your students know it. If the structure of your class is cruel, this kind of statement becomes a sick joke. And if the structure of your class is kind, then you don’t need a page at the front of your syllabus trumpeting it. It’s the fundamental rule of communication: show, don’t tell. Put your good intentions in the structure of your class or not at all.

Just make a class that doesn’t suck.

Hindsight is Stats 2020, Part II: Design Goals & Grades

[This is Part II of a retrospective on teaching statistics over summer 2020. Part I is here.]

Grades are stupid. But at the end of the day, my university forces me to give everyone a final grade. And you do want to evaluate your students based on something, so they can know what they mastered and how they can still improve.

1. Design Goals

To begin with, I tried to work out my design goals. I started by thinking about the ways that classes normally fail and decided to work backwards from there.

One of the most blatant failures in the education system is when students are forced to take a class that they’ve already taken, or on a subject they already know. So my first goal was that someone who really knows the topic should be able to get a 100 with very little effort. There’s an easy way to check if this works: the course should be designed so that if, as the professor, I were to take it, I would ace it easily.

And not just ace it. Someone who really knows the material should, after demonstrating their knowledge, be able to walk out of the course entirely and never have to come back. Once you know the material, you shouldn’t be forced to waste your time regurgitating it.

A related problem is forcing students to waste time on concepts they already understand; or, conversely, moving on to new material before a student is ready. This is tricky because students really do learn things at different speeds. We can’t tailor the lectures to every student, but we can do things to help. Students should be given freedom to focus on the problems they find challenging. Once a student has mastered something, we should try not to bother them about it.

Similarly, most classes don’t incentivize students to learn things on their own. There’s no point getting ahead of the rest of the class. You’ll just be bored, and it might even hurt you, since it will be taking away from the time you could be using to cram the old material. This is a perverse incentive. If a student is ready to go further on their own, we should let them.

Basically, if a student wants to speedrun my class, who am I to complain? Let them do it.

Another classic way that classes screw up is by making students afraid of failure. With traditional grading, students have no room to experiment with different ways of learning, understanding, and studying. The class format requires them to obsess about every evaluation, and encourages them to do the minimum amount required to get the grade, to take no risks. If they try something interesting and fail, their GPA plummets. This leads students to obsess over pointless minutiae like what precisely is on the test and exactly how to word their answers.

I wanted to save them the time they spend thinking about this nonsense. If they choose to spend that saved time studying, so much the better. If they don’t, then all we are losing is their anxiety. Either way, we should reward students for taking risks and attempting to go deeper with the material, not punish them.

In the end I came up with three ways to evaluate student progress.

First, I had a system to replace class participation and attendance, based on small-team activities, which counted for 30% of the final grade.

Second, I had students independently analyze two simple datasets of their choice, and write up a report about each. Together the two reports counted for 20% of the final grade.

Third, I invented a new exam format (covered in the next post), which counted for 50% of the final grade.
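
Put together, the three components give a simple weighted average. The weights below come straight from the breakdown above; the function name and the example scores are hypothetical, and the sketch assumes each component has already been boiled down to a single 0 to 100 score.

```python
# Weights from the course design, as percentages of the final grade.
WEIGHTS = {"teams": 30, "reports": 20, "exams": 50}


def final_grade(component_scores):
    """Combine 0-100 component scores into a 0-100 final grade."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS) / 100


# Hypothetical student: full marks on team activities, 90 on reports, 80 on exams.
example = final_grade({"teams": 100, "reports": 90, "exams": 80})
```

In this example the student lands on an 88, with 30 of those points coming from the team grade alone.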

2. Teams & Breakout Rooms

I really hate attendance.

Taking attendance is undignified. It’s disrespectful of students, who are assumed to be incapable of making informed decisions about their education, and of the professor, who is implicitly supporting that assumption. If students are sick, have a family emergency, or need to go to the dentist, they should be able to do so without worrying about their grade. They shouldn’t have to send me an email with a doctor’s note. I don’t like getting those emails—just stay home if you’re sick—and I’m sure students don’t like sending them.

All of this is doubly true of online teaching. All the lectures are recorded. Students can watch and re-watch my presentations as many times as they want. Why should any of us care about them being “in class” when that means almost nothing in a virtual classroom?

When I taught Introduction to Psychology last summer, I tried using a participation-based system. Rather than taking attendance, I had my TA mark down when students spoke in class. The idea was that this would encourage them not just to show up, but to participate in class discussions. I also hoped it would encourage them to do the assigned reading, which we discussed each day.

This didn’t work. Students would speak up even when they had nothing to add, just to get the grade. The quality of discussion suffered for it. Some very shy students didn’t speak at all, and lost points despite the fact that they were doing great in the class otherwise. It was a huge pain for my TA to keep track of it all. This system didn’t do anything I hoped it would, and I think it was a failure.

We could just chuck attendance altogether. But on the other hand, it’s good to have some kind of incentive for students to show up to class. Recorded lectures are about as good as live ones, but if students show up to class most of the time, they can ask questions and I can get a sense of what they do and don’t understand. It would be good to encourage most of them to be there most of the time. Can we come up with a way to make this happen?

2.1 Enter the Zoom Room

One of the things that everyone learned early on in the pandemic is that video calls suck. Jumping onto a Zoom call is excruciating, and afterwards you feel drained of all will to live. Turning off your camera helps, but not by much.

At first this seemed universal. People speculated that it was something inherent to the Zoom platform. There were theories that even subtle video latency was unnatural and jarring. But over time, I noticed two exceptions. The first was direct calls, with smaller groups. Hanging out with one or two friends over Zoom, while not as much fun as hanging out in person, didn’t make me want to tear my eyes out the way a Zoom call with several people did.

The other exception was playing virtual trivia. Early on in the pandemic, my friend Liz from my PhD cohort set up a virtual trivia night for students in our program. In virtual trivia, we would all gather in one Zoom room to start off. For each round, teams would be sent off into individual breakout rooms for 10-15 minutes to answer questions. Then we would all come back to the main room for scoring. We’d do this process for each round, with a couple of trivia rounds each night.

This was infinitely better than every other group call I had been on, and it wasn’t just that we were a group of PhD students drinking late at night. The breakout rooms were just as relaxed as being on a small call, and they broke up the evening in a way that made the main room much more fun, even though the full group was pretty large.

When I started thinking about how to run an online class, I knew I would have to include something like this.

(Liz also happened to be my TA for the stats course!)

I had been wanting to incorporate something about teams for a while, and this seemed like the perfect way to do it. Instead of sending teams off for rounds of trivia, I would send them off to do breakout room activities, and call them back to discuss the answers.

These activities took different formats depending on the topic we were covering each day, but most of them worked something like this. I put up a question or a task on the slides, and then sent the students into breakout rooms for about 10 or 15 minutes. When they came back, I randomly chose a couple teams to share their answers.

Getting the correct answer wasn’t the point. If the group provided an answer that seriously engaged with the activity, the group got credit for that activity, even if their answer was incorrect. The only way to get no credit was to not engage with the question or to give no answer at all. If I didn’t call on a team, that activity didn’t affect their grade.
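
One way to read these rules is as a tally over only the activities where a team was called on. The sketch below, in Python, is an assumption about the bookkeeping rather than a record of how the grades were actually computed; the function names and the example record are invented.

```python
import random


def pick_teams_to_call(teams, how_many=2, rng=None):
    """Randomly choose which teams share their answers for one activity."""
    rng = rng or random.Random()
    return rng.sample(teams, k=min(how_many, len(teams)))


def team_grade(outcomes):
    """Percentage of called-on activities where the team seriously engaged.

    Each outcome is True (engaged), False (no serious answer), or None
    (not called on, so the activity does not count either way).
    """
    called = [outcome for outcome in outcomes if outcome is not None]
    if not called:
        return None  # never called on: nothing to grade yet
    return 100 * sum(called) / len(called)


# Hypothetical record for one team over five activities.
grade = team_grade([True, None, True, False, None])
```

Note that correctness never enters the calculation; the only thing recorded is whether the team made a real attempt.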

This seemed to be the perfect replacement for attendance. At least one member of every group would need to be there every day, while individual members could come and go if they needed to. But part of their individual success would come from helping to make sure that the whole team was successful, so it was still in their interest to show up and help out whenever possible. I didn’t need to keep track of who was there, I just needed to give activities and ask them for their answers. And I didn’t even need to grade their responses, just record if they made an attempt.

I also hoped that this would give them some level of social support for the class — the kind of friendship they would normally get from the students sitting next to them, and people to go to if they needed help or support.

Another benefit was that this broke up the huge lectures into smaller chunks. Intermissions had already broken the 2.75-hour classes into two sessions of about 1 hour 15 minutes. With breakout room activities, days could end up being four sessions of about 30 minutes each, with activities and an intermission in between. That’s a lot better.

This was also meant to be a grade boost. A whopping 30% of their final grade came from their team grade, and because all you had to do was show up and try to answer the questions, I expected most teams to get 100%. I included this grade boost because I didn’t want them to worry about their final grade too much. This way, they would still have to work to get an excellent grade, but a student who did a decent job wouldn’t have to worry about failure. (As I mentioned earlier, I think that grades are kind of a joke.)

I shared a brief stats experience survey with my students the week before class, and I assigned them to teams based on their responses. I wanted to make sure that each team had a diverse collection of skills — that there was at least one student in every group who was comfortable with public speaking, at least one with decent math skills, and so on. The idea was that every team would have the skills they needed to succeed, and they would all have someone to turn to for help on any subject. I ended up with eight teams of five students each.

2.2 How did Breakout Rooms Work?

The grading worked just as planned. Seven of the eight teams got perfect marks on their breakout room activities. The other group missed one day (none of them showed up) and got about 90% on the team grade. But in general this provided exactly the padding I intended.

Or, almost. In retrospect, 30% was way too much. Students got really good grades anyways, and it wasn’t all thanks to the team grade — remember, more than 50% got an A! Making the team grades only 20% or even only 10% wouldn’t have changed their grades by very much, because they were all doing so well on other parts of the class. Mostly, I think it should have counted for less than 30% because it’s a shame that so much of their grade came from something unrelated to their understanding of the material. I am very happy so many of them got a 95 — I just think it would be better for them to get a 95 from nailing the assignments and exams than showing up and participating! It’s something I would do differently next time.

The activities worked really well. Lectures can be, let’s face it, pretty boring, and I think having these class exercises helped keep students from falling asleep. There’s also no better way to learn something than doing it yourself, and so following each lesson with an exercise was a good idea. And it was nice on my end to take a quick break, wait a few minutes, and see how they had done when they came back.

You do have to be careful with the activities, though. Activities work well if they are a simple problem, something the students couldn’t do when they signed on, but can do now that they’ve seen the day’s lecture. This helps the lesson stick in memory, and demonstrates why what they just learned is actually useful. Activities can also take a “don’t take my word for it, see for yourself” approach, and I liked this when I was able to use it.

No matter what though, the activities have to be easy. They aren’t a challenge or an exam; they exist to round out the lecture and serve as a teaching aid. It’s ok if students struggle with the details; it can be good for them to get a sense of their own limitations. But if they get stuck, can’t do the activity, or reach a dead end, then they don’t learn anything. The implicit message is that they can’t handle it, and that’s not the right message to send them. They can handle things that you’ve prepared them for; don’t give them assignments you haven’t prepared them for.

Students had mixed opinions of the teams. I got feedback like, “there was zero accountability for the breakout rooms … Most of the time, my teammates wouldn’t show up” and “as the days progressed, my group became unresponsive to the point where I was simply doing the work and presenting it on my own.” A few of them did have positive things to say about the teams, but clearly that was the minority opinion.

Most students liked the breakout room activities, though. “I was able to apply the material and then receive feedback (if called on) instantly. The breakout rooms presented a great opportunity to work through what was being discussed,” one student said. Another wrote, “Breakout rooms really allowed me to understand the application of concepts. I don’t think I would have been able to work through the research reports (or the finals) with as much ease had we not gone through related work individually and then as a class.”

The only complaint I saw about these activities was that I gave students too much time to work on them. I find this confusing, because I assumed students would be happy to have an extra 5-minute break to go and make a sandwich or something. Either way, I mark this idea as another success. It does seem like it helped the concepts and skills really stick with them.

Some students suggested that the activities be designed to more directly prepare them for the exams — basically, to have the activities be examples of the kind of questions that appeared on the exams. I can see why they proposed this, but I don’t like it. The exams are designed to try to see if students can generalize stats concepts to new situations. (And from their grades, it’s clear that by the end they could!) If I give them practice with questions of a similar format, I think that would defeat the purpose.

Obviously then, the problem is the teams, and it’s not clear to me what the solution is. Students suggested that I could have them do the work as a team but then call on individual students for the answers. That’s a little too invasive for my taste. One reason to have teams is to help less confident students — you know, the kind who would hate being called on.

I could imagine making the teams larger, maybe groups of 7-10. With more students, it’s more likely that some of them would show up. I could also make the teams smaller, maybe just 2 or 3 people per team. This would lead to less diffusion of responsibility. In either case, I’m sure there would still be slackers. Students don’t like having slackers on their team, but if everyone is getting a 100% on their team grades anyways, I don’t mind if there are a couple freeloaders. Maybe teaching this in person, if that ever happens, would change the dynamic and solve the whole problem.

If I were to teach this in a classroom rather than online, I would have them do more class activities, but have each activity be smaller/shorter. Sending people to breakout rooms on Zoom is a bit of a commitment. It takes a minute to send them out and to re-orient on coming back, so you want them to get their money’s worth. But teaching in person, it would be better to just give them more diverse tasks. Rather than giving them a 10-minute worksheet, I would do something like throw three histograms up on the board and give them 3 minutes to tell me what values you could and could not reject from each.

3. Research Reports

About a year ago, I wrote an essay called What You Want from Tests, where I outline two kinds of knowledge that you need to have mastery over a skill. The first is the sort of things that every expert carries around inside their head, and this is what I argue you should try to examine with exams and quizzes. The other kind of knowledge is the ability to actually use the skill. Without the ability to use the skill, any knowledge is just trivia. You’re not an expert, you’re just a fan.

Statistics is a skill-based course, so the second kind of knowledge is really important. I didn’t just want my students to memorize a bunch of facts about statistics, I wanted them to learn how to actually use statistics.

A few years ago I was working with an undergraduate who had volunteered to be my research assistant. She was an exceptionally bright and curious student, who always asked remarkably insightful questions. She was also very diligent, and had already taken several stats classes before she started working with me. She had even taken some MA-level stats courses, which is unusual and impressive for an undergrad.

Despite all this, I discovered that she did not really understand stats. She had a hard time conducting even basic analyses. She didn’t understand many of the concepts. Despite her excellent grades, almost nothing from the classes had stuck with her.

I already knew that she was gifted, and I was aware of the shortcomings of the usual stats education approaches, so I reassured her that it was not her fault, and I offered to help her do something about it.

At this point I had already done a lot of thinking about how to do a better job teaching stats, and I realized that people always forget to teach this practical side of the skill, even though the practical side is what actually matters. Now, there’s no mystery about how to teach skills. I learned stats by struggling through real analyses for projects that were actually important to me, and everyone agrees that working on a project you genuinely care about is the best way to pick up a new skill.

But this doesn’t work in every situation. Even for me, it was a struggle, and this sink-or-swim approach is too harsh for the classroom. It’s also inefficient for beginners, because real data is messy and confusing. If students bring in a real problem, the correct approach might be too advanced for an intro class. And scale makes it impossible. Do we expect every student in an intro course to be able to bring in a project they’re thrilled about? They don’t know anything about the topic yet, so they don’t know what a good project would be.

I realized that all these problems could be fixed by using fake datasets. It’s easy enough to generate data, and you can make it look however you want. And unlike a real project, you can introduce concepts one at a time so that the student is always ready for them.

So that summer, I made a bunch of practice datasets for my RA to work with. I wrote a set of R functions that would automatically generate datasets to my specifications. At the start of each day, I would give my RA a short lesson on a stats concept, and then send her a couple datasets. Naturally, most of the datasets would be in some way related to that day’s lesson. She would work on them all morning, prepare some slides, and at noon, before we broke for lunch, she would give us a presentation on what she found out. I let my other RAs give feedback first (giving critique is great training as well), and then I would ask questions and give her feedback.

The first datasets were extremely simple, and they gave her no trouble at all. Once she was comfortable with conducting simple analyses on her own, I introduced complications, the sort of wrinkles one would expect to find in a real dataset. First I introduced the concept of statistical power, and gave her some critically underpowered studies, so she could learn to interpret those null results as inconclusive. Then we had a discussion of outliers, when and when not to exclude them, and the datasets for that day included different kinds of outliers. We covered causal inference, interactions, p-hacking, and many other concepts in the same way. The concepts in these lessons were cumulative. Once we had covered outliers, for example, I would sometimes put outliers in the datasets later on.
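
The original generators were R functions, and they are not reproduced here. As a rough idea of how little code this takes, here is a sketch in Python using only the standard library. The function, its parameters, and the choice of a two-group design with normally distributed scores are all assumptions made for illustration.

```python
import random


def make_two_group_dataset(n_per_group, effect, sd=1.0, n_outliers=0, seed=0):
    """Simulate a control and a treatment group with a known true difference."""
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treatment = [rng.gauss(effect, sd) for _ in range(n_per_group)]
    # Optional wrinkle: push a few treatment scores far above the rest.
    for i in rng.sample(range(n_per_group), n_outliers):
        treatment[i] += 10 * sd
    return {"control": control, "treatment": treatment}


# A clean, well-powered dataset: large sample, large true effect.
easy = make_two_group_dataset(n_per_group=200, effect=1.0, seed=1)

# An underpowered one: the effect is real but small, and n is tiny.
underpowered = make_two_group_dataset(n_per_group=8, effect=0.2, seed=2)

# The same kind of study with two outliers mixed in.
with_outliers = make_two_group_dataset(n_per_group=30, effect=0.5, n_outliers=2, seed=3)
```

Since the true effect is set by whoever generates the data, the instructor always knows what the "right answer" is, and each new complication is just one more argument.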

The datasets at the start of the semester were really easy. The datasets by the end were almost as tricky as real-world data. But at no point did my RA work on anything that was too hard for her. Each new complication was just one step up from something she had already mastered, so she was always prepared to tackle it.

3.1 Class Projects

I knew I wanted to do something similar for my class, to give them the same kind of practice with the practical side of things. In particular, I like this approach because for each dataset, you have to figure out what statistical test to run on the data. This is one of the stats skills you use most often in the real world, and it’s often the first question you ask when thinking about an analysis. Yet somehow, intro stats classes almost never teach this skill. At best, students get handed an extremely confusing flowchart. I knew I could do better.

Unfortunately the approach I used with my RA doesn’t exactly scale. I couldn’t give them the same kind of step-by-step training. I couldn’t have them all give a presentation on every dataset, and of course, many students are terrified of presenting to begin with.

Still, I figured I could come up with something that captured most of the benefits. I took several of the simpler datasets that I had made for my RA and I put them in a folder on the class website. Rather than having to analyze all of them, students were required to pick two of these datasets and write a research report about each of them. They could do these two reports at any point during the class, but since they weren’t taught how to do most analyses until about halfway through, I expected most of them to do these assignments during the second half of the course.

Students are taught to write long. This is a bad habit, especially when working with such simple datasets. I limited research reports to a maximum of one page, including any graphs and/or tables. Students should learn to be concise, and besides, I didn’t want Liz to have to sift through dozens of extra pages when grading.

Each research report was 10% of the final grade, so these assignments were 20% of their grade in total. They were free to analyze the data however they wanted, but in particular we thought that R, SPSS, and Excel/Google Sheets were good choices, so I included one session for each of those approaches in the lectures. This wasn’t much training, to be sure. A lot of people might have seen this as a big risk — you’re expecting them to use R or SPSS with barely more than an hour of training each? But I wasn’t worried about it. Somehow I knew that they were up to the task.

Fig. 1: “Burgers Have Cheese???.png”, an example created in the course of instruction on the use of Google Sheets.

Originally, I was planning to let students do up to two additional research reports for extra credit. But in the week before class, one of the students suggested that instead of doing research reports for extra credit, we could let them re-do research reports that they weren’t satisfied with. This basically translated to “do 4 research reports, get your grade from the best two”.

I liked this for a couple of reasons. First, it let them make mistakes on early research reports without huge consequences, which was one of my design goals for the class. Second, students who were struggling would be encouraged to do additional reports, which would give them the extra practice they need, while students who didn’t need additional help wouldn’t be bothered.

I implemented this change, with the requirement that the do-overs would have to be on new datasets. Students would get feedback from Liz about how to do better, but they would have to apply those lessons in a new context. I limited them to two of these do-overs at most. I wanted them to be able to learn from their mistakes, but also I didn’t want each of them doing 10 reports.
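The arithmetic of the policy is simple enough to write down. This is only a sketch in Python: the function name is hypothetical, and it assumes each report is scored out of 100.

```python
def report_points(scores, weight_each=10.0, counted=2, max_reports=4):
    """Points toward the final grade from research reports (best two count).

    Assumptions: scores are out of 100; each counted report is worth
    10% of the final grade; two originals plus at most two do-overs.
    """
    if len(scores) > max_reports:
        raise ValueError("at most two reports plus two do-overs")
    best = sorted(scores, reverse=True)[:counted]
    return sum(best) * weight_each / 100

# A rough start followed by two stronger do-overs:
points = report_points([72, 85, 94, 91])  # only the 94 and the 91 count
```

In the example, the two early reports drop out entirely, so the student ends up with 18.5 of the 20 available points. A student who is happy with their first two scores just stops there.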

The research reports were not really about the grades. They weren’t so much intended as evaluations. Really, they were more like practice, or lessons. What I really wanted them to get out of the research reports was, “I can do this and it’s not scary”, because I think it will help set them up to be confident when using these skills in real life (and on the Exams). It wasn’t about challenging or testing them, it was about giving them the opportunity to try things for themselves.

About halfway through the course, one student emailed me to ask for more guidance on how to format the reports. At the very least, she said, I should give them an example of what one would look like. I told her:

This assignment is designed to mimic what doing analysis is like in the real world. Data is emailed to you in a confusing format, and the file is poorly organized. The people who have hired you to conduct the analysis don’t know exactly what they want and can’t tell you what kind of test to conduct; after all, that’s what they hired you for. I’m trying to give you a controlled version of this experience — not nearly so confusing as real life, but where you are asked to exercise your judgment and the knowledge we’ve covered in class. Giving you any more guidance on how to conduct the analysis or write the report would defeat the purpose of the assignment.

To this student’s credit, she totally understood my point and ended up getting a 98 on both research reports.

A final reason to like the research reports is that they capture my “walk out of class once you’ve mastered the material” goal. If you already took stats but you were for some reason forced to take my class, or if you decide to teach yourself all the material in the first week, then you can just throw together two one-page reports, get an A+ on both of them, and forget about this part of the class entirely.

3.2 How did they do?

Students really surprised me on the research reports. When I first looked at the grades, I thought that maybe Liz had been too lenient. Almost all of them had gotten A’s! But when I looked closer, I saw that the students had earned them. The reports weren’t perfect, but they showed serious critical thinking and really creative engagement with the datasets. All very impressive for a subject they had been studying for less than six weeks!

When I looked back, I saw that on their first submissions, many students had gotten B’s and C’s. Liz wasn’t being too lenient at all. In fact, her feedback was intensely detailed! But this helped the students enormously. It’s clear that the students took that feedback and turned it around for their do-overs, and that’s what ended up earning them those A’s.

Some students, I was happy to see, didn’t need the do-overs. One student did her first two, got a 98 and a 99, and unsurprisingly, chose not to submit any more. Another student, who had said in class that she was terrible at math, gave it a shot and to her great surprise earned a 93 and a 90. She decided that was good enough for her, and didn’t send in another. The system works.

I especially liked how diverse the reports were. Students used all sorts of weird charts and phrased their results in all sorts of unusual ways. Not wrong per se, just the sort of thing an expert would never do. I think this demonstrates real understanding. Rather than just copying someone else’s approach, they had come up with their own, often slightly bizarre perspective, and then applied it. That’s what mastery looks like, folks.

How about the software? Some of them came to me or to Liz for help, but honestly, not as many as you might expect. For the most part they seem to have taught themselves.

When I was looking through the reports, I saw that most of them chose to use R for their research reports, and almost all of them did a solid job of it. This was a big surprise, but it’s very encouraging.

In conversations about how to teach stats, I’ve often heard, “It would be great if we could teach the students R or Python. But you just can’t teach the average student a programming language in only one semester. It would take up too much of the lecture, and there would be too many questions for the TAs to handle. We should stick to SPSS worksheets and formulas for now, that’s the sort of thing that students can deal with.” I’m happy to have evidence that, in my opinion, proves this entirely false. Apparently students can learn the basics of R with almost no instruction, and in less than six weeks, as long as you give them the right environment for it.

I’m pretty happy with the research reports. Is there anything I would do differently next time? Well, one thing Liz pointed out to me is that while I gave them 24 different datasets, most of the reports were on the same 4 or 5 options. These were some of the most straightforward datasets, and most of them were analyses of correlation between two variables.

Now, as I said before, the research reports are not really about challenging students. I’m fine with them doing two easy reports, since doing any independent report at all is great for intro stats. But conducting correlation tests both times does slightly defeat the purpose of doing two reports.

A better system would be to break the research reports up into different bundles. Bundle A could be the easy ones and Bundle B could be more challenging. Bundle A could include one set of tests and Bundle B could include the others, so that every student would have to use at least two different tests. You could maybe include a Bundle C of advanced datasets. These could either give you extra points just for attempting them, or they could be strictly for extra credit. In any case, adding some more structure to the research reports would probably improve them.

Hindsight is Stats 2020, Part I: Fractal Course Design

This summer (2020) I taught Statistics for the Behavioral Sciences.

The course was unusual for a number of reasons. I’ve wanted to teach stats for a long time, so I came into this class with a collection of unorthodox ideas that I’ve been sitting on for a few years.

Things went really well. I had high expectations, but more than half (!) of my students got an A or higher. I didn’t shift my expectations, or make the class easier halfway through. These grades mean that most of the students either mastered the material to my satisfaction or came very close to doing so. This approach worked and I would definitely recommend it.

1. Course Format

1.1 Being Online

The big curveball for this class was the pandemic, which made it necessary to teach the class online. I had never taken a course online, and I had never expected to teach one that way. Going into this, I had almost no experience with online classes. When we transitioned to online instruction in March, I was TA’ing for a class, so I got to see how that went. But that was about it.

I’m confident in my skills, but there were a few things in particular that I was worried about.

One of the really rewarding parts of teaching is getting to know your students. But Zoom isn’t that great, so I was worried that there might be no personal connection. Partly I was worried that the class would be less enjoyable. People like making friends and knowing that the instructor cares about them. But part of it was also practical. Without that sense of the classroom and knowledge of the students, I was concerned that I wouldn’t be able to tell when students didn’t understand the material. Maybe I wouldn’t be able to explain things as well when they had questions.

The other major concern I had was cheating. I knew that in the transition towards online classes brought on by the pandemic, many schools forced students to install unsettling exam-monitoring software on their personal devices. This sort of thing is pretty evil. While I would never consider spying on my students, it did make me worry about cheating on exams. With online exams, it seems like it could be a real problem. But I also know from being a TA that students cheat a lot less than professors think they do. In the end I took no special steps to prevent cheating. I don’t really care about or believe in grades, and I decided to trust the students.

1.2 Personal Connection

It turns out that both of these concerns were unfounded.

Admittedly, there was very little personal connection. I didn’t get to know most of my students. I would recognize their names, but I never even saw most of their faces.

But no one seemed to suffer for it. In the end we still developed the rapport that you need for good teaching. In their evaluations, students said things like:

“Ethan was a great teacher! He clearly loved the subject, and wanted to try and teach it in a more accessible way”

“Ethan specifically explained things very well and was so real. It was nice hearing examples in ‘layman’s terms’ that were more approachable”

“I really felt as if this teacher wanted us to do well, and helped us learn as much as possible in the clearest way possible. … Great great teacher!”


“Ethan is cool”

This experience has changed my mind about classroom engagement, and makes me doubt some of the common wisdom about teaching.

Is getting to know your students a reasonable expectation? Certainly we can get to know our students. But is it appropriate? Students aren’t in your class to be your friends, and you’re not there to be their pal. People are in the classroom to, hopefully, learn something.

“Personal connection” often seems to be used as a proxy for respecting your students and treating them like human beings. But — surprise! — you can respect your students and treat them like human beings without necessarily having a friendship with them, or even knowing their names. Students are sensitive to this difference. They care about being treated with respect, but don’t seem to care about the other stuff.

A cynical take would be that professors use the excuse of “getting to know their students” to push students into having an unnecessarily friendly relationship. But pretending to be equals when you are in a position of power over someone is at best dishonest, and at worst is a way of denying that you have a responsibility to them.

I do think there are things you can do to drive engagement. But I don’t know if it really matters. My students got really good grades and displayed surprisingly deep understanding of the material, so it didn’t hurt their education. And many of them told me that this was one of the most enjoyable courses they have ever taken, so it didn’t seem to make learning any less fun.

1.3 Cheating

I was even more wrong about cheating. I didn’t see any evidence of cheating on exams or assignments, and there was plenty of evidence that they weren’t cheating. Students made lots of simple mistakes, which they could have avoided if they were cheating. Exam scores improved incrementally over time, just as you would expect from honest learning. Their assignments and answers on the tests were idiosyncratic, not the carbon copies you might expect if they were sharing answers. If students were cheating, they didn’t leave any trace of it, and so I’m inclined to believe that they didn’t.

The lack of cheating is a little weird. When I was a TA, I would catch students cheating all the time. They usually do a bad job of it — they forget that I was a student not too long ago, and so they don’t realize that I know most of the tricks. So the fact that we didn’t see any of the classic signs is strong evidence that there wasn’t any cheating.

So why didn’t they cheat in my class, when they do cheat during the semester? I think it has to do with trust. In the exit survey for the class, one student wrote down, “no feeling of being ‘cheated’ by the prof”. Another student wrote, “My biggest fears for this course revolved around completing it and not only doing poorly, but also learning nothing.”

Students tend to stoop to cheating when they think, often correctly, that there is no other way to do well in the course. When professors are unclear about expectations, or make examinations needlessly difficult, the students feel cheated by the professor, and will cheat themselves. When you see an exam filled with trick questions, it’s hard not to feel like the game is rigged. But to their credit, even in this situation, most students still won’t cheat.

Teachers have a lot to learn about cheating. If you don’t cheat your students, most of them won’t cheat on your assignments. It’s about trust. Not your trusting that they won’t cheat on assignments — their trusting that you won’t cheat them in their education.

This all makes it especially disappointing that, during this pandemic, so many schools are engaging in unethical surveillance of their students in the name of academic honesty. Students just don’t cheat all that much, even when they definitely could get away with it.

2. Course Content

So much for the course format. What was I actually teaching?

2.1 What’s Wrong with Stats?

Statistics education is pretty terrible, and everyone knows it. All the professors who teach stats agree: students come into class, usually manage to pass, and retain almost nothing.

Everyone is looking for the magic bullet. But even so, no one thinks it’s a great mystery. Professors and TAs will all tell you the same thing: the problem is motivational. The majority of students, they say, simply aren’t interested in learning this esoteric form of math. As a result, most of the proposed solutions are motivational as well: find a way to make it fun and interesting, or at least find the right set of rewards and punishments.

But when I was a TA for intro stats, I noticed that this didn’t match what I saw at all. The students in my recitations were engaged, and really wanted to understand stats. They asked insightful and sophisticated questions, and were always pestering me for more detail. Yet somehow they seemed to come back every week having forgotten everything we discussed the week before. This isn’t the behavior of students who are checked out — this is the behavior of students who are trying, and repeatedly failing, to build a model of what is going on around them.

Even if I had been wrong about most students, there were a few of them who were clearly both able and motivated. These students got perfect scores on multiple tests and assignments, regularly came to my office hours, and discussed many of the concepts in great detail. They showed me the extensive, meticulous notes they had taken in lecture. But when it came to answering simple questions about the material in a new context, they always came up blank.

These students weren’t lacking in motivation or intelligence. So the problem must be external; something about the class was failing them. Even if everyone in the class were as motivated as these high-achievers, we would still be having trouble with comprehension and retention.

2.2 Driver’s Ed

I think the motivation story is all wrong. The problem is that the subject is taught at the wrong level.

Imagine you are taking a driver’s ed course, and have just shown up to the first day of class. The professor gets up and says, “Hi everyone, in this class you’re going to learn all about cars. Cars are really amazing. Some people use cars to get to work. Some people use them to get to school. Some people use them to go on vacation! There are a lot of kinds of cars. The big ones are called trucks. Those ones carry things like fruit and gravel. In this course you’ll learn all the different kinds and their uses, and we’ll talk a bit about the history of cars.”

You raise your hand, “Excuse me, professor. I’m here because I want to learn how to drive. I didn’t come here to learn about the types or history of automobiles. I’m sure that knowledge will come in handy in some ways, but it’s really not my focus. How do you actually drive?”

“Worry not,” he says, “To drive, move the wheel back and forth.”

So you leave that course and you sign up for a different one. You show up to the new class, and the professor gets up and says, “Hi everyone, in this class we’re going to learn all about cars. We’re going to be starting with the drivetrain. It’s important that you be able to describe and identify all the parts. Look at this diagram. Here’s the gearbox (which you can see is constant-mesh), clutch mechanism, the flywheel, the differential…” You get up and walk out of the room.

Neither of these classes will teach you how to drive. And sadly, this is a pretty good metaphor for how statistics is usually taught. Some statistics courses give students an overview of probability theory and a brief sense of the history, without teaching them how to actually conduct an analysis. Others throw the equations right on the board and start discussing the terms without any context. All too often, a single class will try to include both of these approaches. This is probably worse than either of them alone.

Students don’t want to learn a list of tests, the life history of Ronald Fisher, or the exact meanings of the terms in the formula for the pooled standard deviation. All these are things one naturally picks up over time, but none of it is useful without the core knowledge. Students want to learn what statistics is and how we actually use it. But somehow they seem to come away from our courses without having been taught either of these things.

Driver’s ed focuses on the point of contact: how to use the car. Similarly, the main goal of this class was statistical skills and how to use them.

I wanted students to become statistically literate. Most students won’t end up being researchers or statisticians in the same way that most people who take driver’s ed won’t end up being auto mechanics or engineers for GM. We still benefit from knowing what a car is and how to operate it. Similarly, students benefit from knowing what statistics is and how to use it. For those students who do want to go on to use statistics professionally, this will still give them a strong foundation. Auto mechanics don’t suffer from having taken driver’s ed in high school.

The focus was limited and practical. Students were taught how to recognize different kinds of variables and data, interpret standard plots and graphs, read and understand statistical reports, and conduct basic analyses using statistical software. I alluded to other subjects of interest in lectures, but in the lessons and the evaluations, I focused on these basic skills.

We can also talk a little bit about what I didn’t want to cover. The history of stats is interesting, but most of the time it doesn’t help you be a better statistician. The most important thing to know about the history is that these tests and concepts were just invented by a few guys not all that different from you and me. Anyone can make up a concept or design a new test. You assign it a Greek letter and suddenly it sounds official, but for all we know, Fisher came up with it while sitting in the bathtub. Besides that, most of the details don’t matter. Aside from a couple of helpful examples, I didn’t teach them anything about the history of statistics.

You do need to know a few symbols to be able to interpret tests, but I didn’t want to cover much in the way of formatting. I don’t care if students report a number as 0.02 or .02 or 0.0212; I don’t care whether or not they italicize the p in “p-value”. Time is limited, and I don’t want to waste their time or my time going over this nonsense. If by the end of the class, they know the concepts but not the formatting, then I have succeeded. If they know the formatting but not the concepts, I have definitely failed. So I decided to focus on the concepts and, as much as possible, ignore the formatting.

2.3 Fractal

So that’s what I wanted to teach. How do you actually teach something like this?

Most courses take a cumulative approach. You start with the basics, and the material slowly becomes more and more complex. Each lesson builds on all the previous lessons. At the end you finally tackle the most advanced material. Then you take the final.

In my experience, this falls apart by the second week of class. Students who miss even a single lecture are cut adrift, left to founder or drown. Even if you make it to every class, your safety isn’t guaranteed. If you don’t understand the explanation they give in lecture, you’re out of luck, because the class is never going to come back to that topic again.

Rather than being cumulative, my course approach was fractal. A fractal is a figure or function where every part has the same character as the whole. Every part contains copies of the whole thing. That’s how I structured the course: every part of the course was nested within other parts of the course.

A photo of my stats course from space. jk it’s fractal broccoli

You could be the best teacher who ever lived, with the most beautiful slides imaginable. It doesn’t matter — students just can’t learn something in one go. This is especially true in statistics. The classic learning pattern for the subject is brief flashes of insight, a feeling of sudden understanding, and then losing your hold on it and slipping back into confusion. This is normal.

For some reason, people don’t understand this. Everyone thinks there is going to be a shortcut explanation for these ideas, but we don’t think that way about other skills. We don’t think that painters will master three-point perspective in a single session, and we don’t expect programming students to master for loops in a single day. Maybe you can get the gist after the first introduction, but really understanding these topics takes time. Somehow we see stats differently. In particular, there is a whole genre of articles and blog posts all about how to explain p-values. These assume that the concept can be distilled into a single statement, or a single lesson. But that’s crazy. You can’t understand p-values in one hour, no matter how good the explanation is.

I think of statistics as really being three closely-related topics: a language for talking about data in general, descriptive statistics for talking about individual variables, and inferential statistics for making educated guesses about the world on the basis of limited samples.

The structure was built around these topics. The first day of class was an overview of the entire course, introducing all three topics in very general terms. Day 2 and Day 3 were another microcosm: again we covered the whole course, this time in slightly more detail.

Week 2 covered data in more detail. Week 3 covered descriptive statistics. Weeks 4 and 5 covered inferential statistics. Finally, in week 6, we went even deeper into inferential statistics, exposing exactly how the math behind the tests works.

This means that students see every single topic many times before the end of the course. For example, the two-sample t-test appears a total of six times in the lectures. It appears first in day one, during the complete overview, again in the lectures for day three, and then again in weeks three, four, and six.
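One way to picture the nesting is to write the schedule down as data and count the passes. This is a coarse sketch in Python at the level of the three main topics only; the names are made up for illustration, and it does not try to reproduce per-test counts, since an individual test like the two-sample t-test can show up in other weeks as an example.

```python
from collections import defaultdict

DATA = "language of data"
DESCRIPTIVE = "descriptive statistics"
INFERENTIAL = "inferential statistics"

# Each unit of the course and the main topics it covers, as described above.
SCHEDULE = [
    ("Day 1", [DATA, DESCRIPTIVE, INFERENTIAL]),     # overview of everything
    ("Days 2-3", [DATA, DESCRIPTIVE, INFERENTIAL]),  # everything again, more detail
    ("Week 2", [DATA]),
    ("Week 3", [DESCRIPTIVE]),
    ("Weeks 4-5", [INFERENTIAL]),
    ("Week 6", [INFERENTIAL]),                       # the math behind the tests
]

def passes_by_topic(schedule):
    """Map each topic to the list of units that cover it, in order."""
    passes = defaultdict(list)
    for unit, topics in schedule:
        for topic in topics:
            passes[topic].append(unit)
    return dict(passes)

passes = passes_by_topic(SCHEDULE)
```

Even at this coarse level, every main topic comes around at least three times, where a strictly cumulative schedule would visit each one once.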

It doesn’t matter if you don’t understand the two-sample t-test the first time, or the second time, or even the third time you see it. It doesn’t matter if you miss a few classes. It doesn’t matter if one of the examples I use doesn’t make sense to you. We will come back to this concept again, in a new context, with new examples. By the end of the class, you will get to see it from every angle.

These things take time. Mastery of a subject comes only when you return to an idea over and over, seeing it in new situations and becoming more familiar with it, building your own understanding. The structure of the class needs to support this, or students won’t be able to learn a damn thing.

2.4 Context

My influences in this were the Snowflake Method, and Progressive Rendering from It’s Time For An Intuition-First Calculus Course. Both of these perspectives emphasize understanding the gist of an idea before getting stuck in the details. To quote the reasoning from It’s Time For An Intuition-First Calculus Course:

The “start-to-finish” approach seems official. Orderly. Rigorous. And it doesn’t work.

What, exactly, do you know when you’ve seen the first 20% of a portrait in full resolution? A forehead? Do you even know the gender? The age? The teacher has forgotten that you’ve never seen the full picture and likely can’t appreciate that you’re even seeing a forehead!

Progressive rendering (blurry-to-sharp) gives a full overview, a rough approximation of what the expert sees, and gets you curious about more. After the overview, we start filling in the details. And because you have an idea of where you’re going, you’re excited to learn. What’s better: “Let’s download the next 10% of the forehead”, or “Let’s sharpen the picture”?

Let’s admit it: we forget the details of most classes. If we’ll have a hazy memory anyway, shouldn’t it be of the entire picture? That has the best shot of enticing us to sharpen the details later on.

Sometimes I think of this course as Intuition-First Statistics. “Intuition-first” doesn’t mean our goal is to teach good statistical intuitions, though hopefully students do get some of that. It means that we should start by working with intuitions, and that everything else will follow from that. Because, although it may sound surprising, students actually have pretty strong statistical intuitions.

The problem is context. The cumulative or start-to-finish approach makes perfect sense to the instructor, but only because they already know what is coming. They can see the context; how everything is connected.

The students don’t have any of that. They just get hit in the face with new material that they never saw coming. Every day it’s some new bullshit. They have no idea what is up next, what it means, or how it all is related. They’re always being knocked off-balance by new topics you didn’t prepare them for, and they never have time to figure out how it’s all connected.

Your Students

This is a huge problem, because context really matters for comprehension and memory. A great example comes from research by Bransford & Johnson (1972). In their studies, participants heard a paragraph like the one below. Take a look at this passage and see if you can figure out what it is all about:

The procedure is actually quite simple. First you arrange things into different groups. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities that is the next step, otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many. In the short run this may not seem important but complications can easily arise. A mistake can be expensive as well. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. It is difficult to foresee any end to the necessity for this task in the immediate future, but then one never can tell. After the procedure is completed one arranges the materials into different groups again. Then they can be put into their appropriate places. Eventually they will be used once more and the whole cycle will then have to be repeated. However, that is part of life.

One third of the participants heard the paragraph without any context. It didn’t make much sense to them, and they had trouble recalling what they had heard.

The next third of the participants, before hearing the paragraph, were told that it was about doing laundry. To these participants, the paragraph made perfect sense, and they had very little trouble recalling the details.

The final third learned the topic only after they’d heard the entire paragraph. These participants also found the paragraph confusing, and even having been given the context, weren’t able to recall much about it. Context alone isn’t enough; you need to see the context up front.

Something similar happens in class. Without context, even the most motivated students have trouble remembering the material. They have a hard time memorizing tests or equations because they don’t understand what a test is used for, let alone how it works. I don’t have trouble with the equations, but only because I understand what the tests were created to do. It’s easy to put things into their proper categories if you have a good grasp of the system; it’s impossible if you don’t even know what categories there are.

The fractal approach solves this problem. The first two or three times I went over the material, I didn’t expect them to remember any of it. We cover all the material early on, because being introduced to everything at a shallow level prepares students to understand the material in depth once it comes back around again.

What You Want from Tests

[I originally wrote this around December 2019, and I’m reposting it here for reference.]

I used to be in favor of open-notes tests. But after seeing them in action for a while, I no longer think they’re a very good idea.

It’s true that traditional tests don’t do a good job accomplishing what they are designed for. It’s good to see people exploring different ideas about what tests can be. But an open-notes approach doesn’t fit very well with the strengths of test taking.

Settling for this approach keeps tests from becoming all they can be. Tests have some natural strengths and some obvious weaknesses, but if we understand this, we can design tests that will help us do what we want. I say this as a person who went to a college that had no tests at all!


The traditional argument in favor of open-notes tests is that having access to your notes is more true to life. In the real world, the argument goes, you aren’t locked in a room with no resources and forced to answer questions under a time limit. You have access to whatever resources you need, and can look things up as you go.

Einstein famously was unable to recall the speed of sound when given the Edison Test. Why memorize such facts, he remarked, when one could easily look them up in a textbook?

This perspective is entirely correct. Skill involves the use of more than just what one carries around in one’s head. An expert makes use of many tools and will refer to a variety of sources when solving a problem. In many ways, skill in a domain is skill at using the reference works of that domain. Hence the old joke that programming should be renamed “Googling StackOverflow.”

Take this view too far, however, and you end up with absurdity. It’s clear that experts don’t carry around everything in their head. But it’s also not true that they carry around nothing in their head.

A physicist may not be able to tell you the speed of sound without looking it up. But every physicist will be able to tell you who Maxwell and Newton were, and a little bit about their contributions. If someone doesn’t know what F = ma means, they’re probably not a physicist.

A programmer won’t be able to recall from memory the exact workings of every function they’ve ever used. But every programmer will be able to tell you the syntax for writing a for loop in their favorite languages. If someone can’t tell you the syntax of an if statement, they’re probably not a programmer.

An expert is someone who is able to do both. Some things they will know by heart, and some things they will be able to accomplish only given time and resources. You need both to have mastery of a skill. We might call these two forms of knowledge what you carry around in your head and what you can accomplish.


We don’t expect students to leave a class as experts in their field, but we do expect them to have mastery of the material.

What does mastery mean? I think that mastery involves both of these skills.

Someone who can accomplish a task but doesn’t carry any of that knowledge around with them is following a guide, or a set of instructions, without any understanding. Someone who can tell you important facts about a field but can’t accomplish anything is a fan, not an expert.

Students shouldn’t be expected to memorize everything. We should understand that they will do their best work when they can use their notes, look things up, and take time to consider multiple angles on a problem. But we should expect them to carry certain very important facts around in their head wherever they go.

I don’t care if a student leaves my statistics class without memorizing the equation for a t-test. But if they can’t explain what a p-value is, or can’t read a scatterplot, that’s a problem.

If we want to evaluate a student’s mastery of a class, then, we want to measure both of these kinds of knowledge. We should give them the chance to demonstrate real skill in the field, but we should also require them to show that they have internalized some of the most important facts and concepts.

Luckily we already have good ways of doing both.

Tests isolate the student from their resources and have the potential to measure the information that the student actually carries around in their head.

Class projects and papers allow students to use whatever they want in the solving of an actual (if usually artificial) problem, and have the potential to measure the student’s ability to accomplish practical work in the field.

If tests and projects are designed with this in mind, the class can run smoothly. If they are not, the result is disaster.


What are the important features of a test? Well, they happen in a controlled environment. You can’t choose what you’re working on; the questions have been decided for you. You have a limited amount of time. You’re not allowed to collaborate with other people. And you’re not allowed to look anything up.

Open-notes tests relax this last criterion. Some of them relax it in a small way; often students are given a formula sheet, or are allowed to bring a page or a note card as a cheat sheet. Sometimes these tests are truly open notes, and students are allowed to refer to whatever they like. Sometimes students can even bring their laptops, and make use of the entire internet. [1]

Trying to evaluate a student’s skill at solving problems without restrictions is good. Trying to do it with a test is bad.

Tests aren’t a good way to evaluate this kind of knowledge because they still unnaturally restrict the student in other ways. The student isn’t given the kind of time they would have if they were solving a real problem. They don’t get any choice of what problem to work on. They can’t collaborate with others, or go to peers to discuss some aspect of the problem that’s troubling them, something that is a huge part of solving problems in the real world. The format of a test hamstrings them.

This is especially tragic because tests are so naturally suited to evaluating the knowledge and skills that a student has internalized. Why not use the tests to see if the things you want them to carry around in their heads have actually ended up there?

When designing a test like this, you should figure out what you want your students to walk away with, and only include questions about those facts and skills. Anything that they would be better off just looking up (dates, exact values, trivia, etc.) shouldn’t appear on the test in the first place.

A simple way to evaluate this kind of test is to give it to your peers and to other experts, and make sure that they can answer all the questions easily without looking up the answers. If experts in the field can’t casually ace your test, then it isn’t a good test of what experts should be expected to carry around in their heads.
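If you want to keep score while piloting a test this way, the check is easy to tally. The sketch below is only an illustration: the function name `flag_questions`, the `threshold` parameter, and the sample results are all made up for the example, and it assumes every question has at least one expert result recorded.

```python
def flag_questions(expert_answers, threshold=1.0):
    """Return (question, pass_rate) pairs that fall below the threshold.

    expert_answers maps a question label to a list of booleans, one per
    expert, where True means the expert answered correctly from memory.
    Assumes each list is non-empty.
    """
    flagged = []
    for question, results in expert_answers.items():
        pass_rate = sum(results) / len(results)
        if pass_rate < threshold:
            flagged.append((question, pass_rate))
    return flagged


# Hypothetical pilot results from three experts (invented for illustration).
pilot = {
    "Explain what a p-value is": [True, True, True],
    "Read this scatterplot": [True, True, True],
    "Write out the equation for a t-test": [True, False, False],
}

flagged = flag_questions(pilot)
for question, rate in flagged:
    print(f"Consider cutting: {question} ({rate:.0%} of experts got it)")
```

With the default threshold, any question that even one expert misses gets flagged. Lowering the threshold is one way to soften the standard if you don’t need students to match the experts exactly.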

This standard may even be slightly too harsh; you probably don’t need your students to walk out of the class on the same level as an expert. Another way to figure out if a test like this is fair is to pick a student who you know reasonably well and seems to have mastered the subject, and see how they do on your test.

A test made on these principles should be simple and easy, something that an expert would be able to breeze through.

Projects & Papers

Depending on the subject, class projects or papers are the right way to test the other skill. Rather than shoehorning open notes into a test format, which doesn’t suit it, just have them do a project.

Projects are inherently open-notes; who ever heard of limiting the resources that can be brought to bear on a class project?

No course can really be like the real world, but giving students a facsimile is a good idea. Projects provide a better environment for this because they don’t hamper the student unrealistically, as even the most liberal open-notes test will. Students have some level of control over what project they choose, how they approach it, what techniques they use, and who they call on for help.

Is this true for all classes? I don’t think so. Foreign language courses are all about internalization. If you need to look anything up, you haven’t really learned the language. Testing makes a lot of sense in a language course, but I’m not sure if there’s any place for projects, at least not at introductory levels. Once you get to a composition course in a foreign language, projects start making more sense again.

There may be other reasons to include projects in one of these courses. In this essay I’m talking about projects being used as a form of evaluation, but projects can be an important teaching tool as well. Having students complete a project as an alternative to readings or lecture is a good idea, but a different use case.

There are also probably some subjects where tests make no sense at all. For many hands-on skills, like writing or sculpture, you could conceivably make a test, but the real proof will be in creation.

Testing is a good way to examine internalized knowledge, but there are some kinds of internalized knowledge that aren’t easily measured by a test. Just how to hold your hammer and chisel, just what the dough looks like when it’s ready — these are things that an expert will have internalized, but which would be difficult to put on a test. So there are some kinds of internalized knowledge that are better measured by projects.

It seems like this is especially true for crafts, and for courses beyond the beginner level, as the student begins to pick up these hard-to-measure intuitions.

Generally, the more advanced the course, the less of a role there is for testing. While every subject has a core base of knowledge that all experts will know by heart, specialists will internalize knowledge that sets them apart even from other specialists. People already seem to understand this at some level, and most advanced courses go light on the tests.


[1] Sky Zhang points out that in certain cases, formula sheets can make a lot of sense. A programmer may not remember the syntax for all the basic operations of the language they’re learning, and the professor shouldn’t care. Giving them a sheet that provides that syntax won’t help them if they don’t understand the concepts, but it is forgiving towards students who have deep conceptual understanding but can’t be bothered to remember the exact notation for every operation. We can trust that if they choose to continue, they will eventually know the basics by heart. I think this is another case where professors should think about what they really want students to get out of the course (in this case, the concepts) and what they couldn’t care less about (hopefully, the syntax).

Thanks to Amy Ludwin and Sky Zhang for reading drafts of this.