As stated in the last post, Learning from Failures, I decided to adjust my approach to having students analyze and discuss data. We’d put a lot of time into working out many of the kinks, but it was really time to move on to scatterplot representations of data. My students already knew a lot about scatterplots and best-fit lines, so this allowed me to dive right in with some data.

Rather than stating a claim, I started with a statement and four questions:

I have the following measures (in cm) about 54 students: height, arm span, kneeling height, hand span, forearm length, and wrist circumference.

- Which pair(s) of variables do you think might show the strongest correlation? (And what would a strong correlation look like in a scatterplot?)
- Which pair(s) of variables do you think might show the weakest correlation? (And what would a weak correlation look like in a scatterplot?)
- Which variable (from the list above) do you think would be the best predictor of a person’s height (in cm)?
- Write one claim statement about the class data variables.

These questions forced them to think about the data and make some predictions about what they might see once they were able to access it. We hadn’t talked much about correlation, so I was really interested in how they would describe what strong and weak correlations look like on a scatterplot.

Generally speaking, they said that strong correlations

- look like a line
- can almost see a line
- looks like a more defined line
- looks pretty linear

and weak correlations

- look like randomly placed dots
- have points that are far from the line
- looks more spread out and scattered
- has dots all over the place
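The students’ descriptions line up with what the correlation coefficient measures. As a minimal sketch, here is Pearson’s *r* computed for one set of points that “looks like a line” and one with “dots all over the place.” The numbers are made up for illustration (they are not the class data), and `pearson_r` is just a helper name chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, NOT the class data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
line_like = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 15.9]  # points hug a line
scattered = [9, 2, 12, 5, 3, 11, 6, 8]                    # dots all over the place

r_strong = pearson_r(x, line_like)
r_weak = pearson_r(x, scattered)

print(f"line-like points: r = {r_strong:.3f}")
print(f"scattered points: r = {r_weak:.3f}")
```

An *r* close to 1 (or -1) goes with points that hug a line, while an *r* near 0 goes with the scattered cloud the students described.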

As for question 3, there was quite a debate over whether arm span or kneeling height would be the best predictor of a student’s height. One side (6 students) argued that arm span would be the best predictor because “everyone knows that your arm span is about the same as your height.” The other two students claimed that kneeling height would be a better predictor because “it’s part of your height.” Both sides stuck to their convictions. Neither could be swayed, not even by what I thought was the astute observation that kneeling height is probably about 3/4 of height. That observation came from a student in the arm span camp!

Students each received their own copy of the data and investigated their claims. During the next class, we looked at a couple of those claims together. The plot on the left is height vs arm span, with the line *y = x* (height = arm span). The plot on the right is height vs kneeling height, with the line *y* = (4/3)*x* (kneeling height = 3/4 height).

[Plots: height vs arm span (left); height vs kneeling height (right)]

More debate ensued, though most admitted that kneeling height had a stronger correlation with height than arm span did (for this data, at least). And maybe 3/4 wasn’t the best estimate, but it was pretty close. They also noticed a few points that sat well away from the rest, which led to a conversation about outliers and influential points.
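For anyone who wants to run the same comparison numerically, here is a rough sketch. It computes the correlation of each predictor with height and the average vertical distance of the points from each reference line (*y = x* for arm span, *y* = (4/3)*x* for kneeling height). The measurements below are invented for illustration and chosen to mirror what the class saw. They are not the class data, and the helper names are my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def mean_abs_gap(xs, ys, slope):
    """Average vertical distance between the points and the line y = slope * x."""
    return sum(abs(y - slope * x) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical measurements in cm, invented for this example (NOT the class data).
height   = [152, 158, 163, 167, 170, 174, 178, 183]
arm_span = [149, 161, 160, 171, 166, 178, 175, 186]
kneeling = [115, 119, 122, 126, 127, 131, 133, 138]

# Height goes on the y-axis, the predictor on the x-axis, as in the plots.
r_arm = pearson_r(arm_span, height)
r_kneel = pearson_r(kneeling, height)
gap_arm = mean_abs_gap(arm_span, height, 1.0)      # line y = x
gap_kneel = mean_abs_gap(kneeling, height, 4 / 3)  # line y = (4/3)x

print(f"arm span:        r = {r_arm:.3f}, mean gap from y = x      : {gap_arm:.2f} cm")
print(f"kneeling height: r = {r_kneel:.3f}, mean gap from y = (4/3)x: {gap_kneel:.2f} cm")
```

Swapping in real measurements is just a matter of replacing the three lists. A least-squares fit would be the next step if the goal is to check whether 4/3 is the best slope rather than just a reasonable one.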

### Moving from Class Data to Cars

I took a similar approach with the next data set.

I have some data about cars, including highway mpg (quantitative), curb weight (quantitative), and fuel type (categorical: gas, hybrid, electric). Think about how these variables might be related and make some predictions.

- How might the highway mpg and curb weight be related?
- How might the curb weight and fuel type be related?
- How might highway mpg and fuel type be related?
- Do you think there might be any outliers or influential points? If so, what might they be?
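To make the idea of an influential point concrete, here is a small sketch that fits a least-squares line to some car-like data twice: once without and once with a single unusual point. All values are hypothetical and invented for this example, including the units (pounds for curb weight) and the extra heavy car with a very high mpg-equivalent rating. None of it comes from the actual data set.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical cars: curb weight (assumed to be in pounds) and highway mpg.
weights = [2500, 2800, 3000, 3200, 3500, 3800, 4000]
mpg     = [41, 36, 35, 34, 30, 26, 25]

# One hypothetical heavy car with a very high mpg-equivalent rating.
unusual_weight, unusual_mpg = 4500, 100

slope_without, _ = least_squares(weights, mpg)
slope_with, _ = least_squares(weights + [unusual_weight], mpg + [unusual_mpg])

print(f"slope without the unusual car: {slope_without:.4f} mpg per pound")
print(f"slope with the unusual car:    {slope_with:.4f} mpg per pound")
```

In this made-up example, one point is enough to flip the sign of the slope, which is what makes it influential rather than just an outlier.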

Through some class discussion, we came up with the following claims and predictions.

Students still had not seen the data and one of them said, “I really can’t wait to see what this looks like!” Another said, “Yeah, I’m not usually all that interested in cars, but I really want to know.”