“It’s fun to sing about the Central Limit Theorem!”

So, I’ll start with the good stuff. I’m fairly proud of the lyrics that I’ve written for the Central Limit Theorem Song, set to the tune of “YMCA.” We sing it in class, but I wasn’t able to show a recording of my students singing the song since I’d need to have parental release forms signed. Oh well. I’m a bit pitchy, but I think my head-bobbing is legendary.

I usually teach the song after I have gone through the “YMCA example” with the students. The population distribution (which is totally made up) shows the ages of people going to a YMCA during the middle of the day (9 AM to 3 PM) over the course of a week. There are very few children of school age. Most people are babies/toddlers, 20-40 year-olds (presumably caretakers or people who don’t have a day job), or retirees (60-80 year-olds).

Ages of all of the attendees to the YMCA from 9:00 AM to 4:00 PM during the weekdays of a particular week.

Once we’ve looked at the population, we begin taking samples of size 2, being very clear about what each dot is and where it came from. Then we speed up the simulation to get 1000 samples of size 2.

Then we do samples of size 5, and then samples of size 30, to get the respective approximate sampling distributions. It’s beautiful how this demonstrates that sampling variability is reduced when the sample size is increased, and that the sampling distribution begins to look more normal as the sample size increases. We connect it to the formula for standard error (the standard deviation of the sampling distribution) to see how close the simulation was to the true value. I particularly love unpacking why we get the peaks that we do in the sampling distribution for samples of size 2: the two people in the sample were either both babies, both middle-aged, both retirees, or a combination of two different groups.
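For readers who would like to try the same idea outside of CODAP, here is a minimal Python sketch of the simulation. The population below is invented for illustration (the group sizes and age ranges are assumptions, not the data behind the plots in this post), and the samples are drawn with replacement so that the standard error formula, population standard deviation divided by the square root of n, applies directly.

```python
import random
import statistics

def make_population(rng):
    """Build a made-up three-peaked population of ages (assumed values, not the post's data)."""
    toddlers = [rng.randint(0, 4) for _ in range(300)]
    adults = [rng.randint(20, 40) for _ in range(400)]
    retirees = [rng.randint(60, 80) for _ in range(300)]
    return toddlers + adults + retirees

def simulate_sample_means(population, n, num_samples, rng):
    """Draw num_samples samples of size n (with replacement) and return each sample's mean."""
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(num_samples)]

rng = random.Random(2024)  # fixed seed so the run is repeatable
population = make_population(rng)
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (2, 5, 30):
    means = simulate_sample_means(population, n, 1000, rng)
    simulated_se = statistics.stdev(means)  # spread of the 1000 sample means
    formula_se = sigma / n ** 0.5           # standard error from the formula
    results[n] = (simulated_se, formula_se)
    print(f"n={n:2d}  simulated SE={simulated_se:.2f}  formula SE={formula_se:.2f}")
```

Each value in a `means` list plays the role of one dot in the sampling distribution plot, so a histogram of `means` for n = 2 should show the same kind of peaks discussed above.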

Points of Emphasis

In teaching statistics (regular, dual credit or AP), I have found that the transition from descriptive statistics to inferential statistics is a huge hurdle without a clear understanding of sampling distributions. As a result, we spend a lot of time developing vocabulary and symbols, and being clear about what the distributions we are looking at actually represent. I frequently ask questions like:

Q: “What is this dot?” A: “It’s the height of a single student”

Q: “What’s this dot?” A: “It’s the mean of the heights of a group of 10 students”

Q: “What’s this value and what does it mean?” A: “It’s the standard error, and it represents how far off my sample means will typically be from the true mean.”

The Beginning – Rethinking the German Tank Problem

I have always started the unit with the German Tank problem, but I’ve had to “prime” the students by playing a very similar, simpler game: “Guess how many numbers are in the bag.” I have slips of paper numbered from 1 to 33 in a paper bag. I shake the bag up and pull out 5 numbers. I tell the students that the numbers in the bag are consecutive numbers from 1 to something, and they have to guess how many numbers are in the bag. First, they use their gut. Let’s say I pull out 3, 8, 11, 24, 27; they might guess that the largest number in the bag is 30. I ask them why, and I get answers like:

“It’s unlikely that we would have pulled the highest number, so I added a little bit to the top.”

I ask: “How did you know how much to add?” They say things like: “Since there was a space of 3 between the lowest number and zero, I added that to the top as well,” and I clarify: “You added the minimum to the maximum.” It’s through this process that we get a few decent formulas:

Max + Min

1.2 * Max (20% more than the max)

Mean + 2 Std Dev

We then use these formulas to crunch the numbers and make predictions. That way, when they get into the German Tank scenario (which is essentially the same thing, just with larger numbers), they have a sense of what it means to create their own predictive formulas.
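The three student formulas above can also be compared with a short simulation. This is only a sketch: the function names and the number of trials are my own choices for illustration, and the bag is assumed to hold the numbers 1 to 33 with 5 slips drawn without replacement, as in the classroom game.

```python
import random
import statistics

def estimators(sample):
    """Return the three student-suggested guesses for the largest number in the bag."""
    return {
        "Max + Min": max(sample) + min(sample),
        "1.2 * Max": 1.2 * max(sample),
        "Mean + 2 Std Dev": statistics.mean(sample) + 2 * statistics.stdev(sample),
    }

def compare(true_max=33, sample_size=5, trials=2000, seed=1):
    """Simulate many draws from the bag and return each formula's average guess."""
    rng = random.Random(seed)
    bag = range(1, true_max + 1)
    totals = {}
    for _ in range(trials):
        sample = rng.sample(bag, sample_size)  # draw slips without replacement
        for name, guess in estimators(sample).items():
            totals[name] = totals.get(name, 0) + guess
    return {name: total / trials for name, total in totals.items()}

# The sample from the example above: Max + Min gives 27 + 3 = 30.
example = estimators([3, 8, 11, 24, 27])
for name, guess in example.items():
    print(f"{name}: {guess:.1f}")

# Average guess of each formula over many simulated draws (true answer is 33).
averages = compare()
for name, avg in averages.items():
    print(f"{name}: average guess {avg:.1f}")
```

Changing `true_max` and `sample_size` turns the same sketch into the larger German Tank scenario, and students can add their own formulas to the dictionary in `estimators`.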

My version of macOS no longer supports Fathom, so I’m using CODAP, which has been awesome. I like it better for some things and not as much for others. Here’s a link to the CODAP version of the German Tank Problem.

Heights Labs – Developing the Idea of a Sampling Distribution

I have created several “labs” for the students to do to learn about Sampling Distributions. First, I have the students sample the heights of students from the class by pulling slips of paper from a paper bag. I think I got this idea from the Starnes/Tabor textbook, but I’ve created a little guide for the students to follow that highlights the key points.

I’ve also created a follow-up lab that gets at the key idea of getting closer to the actual sampling distribution. Here’s a link to the CODAP lab.

I hope you find these resources helpful. Please let me know what you think!