
Regression - Residuals - Why?

Jacqueline B. Miller
Department of Mathematics and Computer Science
Drury University
900 North Benton Avenue
Springfield, MO 65802

Statistics Teaching and Resource Library, July 26, 2001

© 2001 by Jacqueline B. Miller, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

As teachers of statistics, we know that residual plots and other diagnostics are important in deciding whether or not linear regression is appropriate for a set of data. Even after we discuss this with our students, many may still believe that if the correlation is strong enough, these diagnostic checks are not important. The data set included in this activity was created to lure students into a situation that appears on the surface to be appropriate for linear regression but is instead based (loosely) on a quadratic function.

Key words: regression; residuals

## Objective

To present students with a situation that appears to be suitable for linear regression and challenge them to weigh a strong correlation against diagnostics (here, a residual plot) that indicate otherwise.

## The Activity

Prior to assigning this activity, students should have had an introduction to scatter plots, linear regression, and residual plots. The activity involves a set of data on the width and cost of a square deck you want to build for your house. Students are asked to construct a scatter plot of the data, to find the linear regression equation for estimating deck cost from the width of the deck, to find the correlation coefficient, and to comment on the appropriateness of linear regression for the data set. Students are then asked to predict the cost of building a deck whose width is similar to the widths in the data set. Finally, students are asked to examine a residual plot to judge the appropriateness of linear regression for the data set. For this data set, the scatter plot and correlation coefficient suggest that linear regression is appropriate, while the residual plot indicates that it is not. After the students complete the activity, the instructor should engage them in a discussion about the appropriateness of linear regression in situations like the one posed in this activity. Suggested questions for discussion are included in the assessment section that follows.
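The tension between a strong correlation and a patterned residual plot can be sketched numerically. The example below is a minimal illustration, not the activity's data set: the widths, the cost formula `4 * w**2 + 50`, and the helper `fit_line` are all assumptions made up for this sketch. It fits a least-squares line to exactly quadratic data and prints the correlation coefficient and the residuals.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares intercept, slope, and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical data (NOT the activity's data set): deck widths in feet and
# a cost that is an exact quadratic function of width.
widths = list(range(8, 21))
costs = [4 * w ** 2 + 50 for w in widths]

intercept, slope, r = fit_line(widths, costs)

# Residual = observed cost minus the cost predicted by the fitted line.
residuals = [c - (intercept + slope * w) for w, c in zip(widths, costs)]

print(f"correlation coefficient r = {r:.3f}")
for w, e in zip(widths, residuals):
    print(f"width {w:2d}  residual {e:8.2f}")
```

With these made-up numbers the correlation is very close to 1, yet the residuals are positive for the narrowest and widest decks and negative in between. That curved pattern is what a residual plot makes visible and the correlation coefficient hides.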

## Assessment

To me, the assessment centers on the in-class discussion. In many of our classes, we discuss checking diagnostics for the appropriateness of linear regression. What should we do in a case where linear regression looks appropriate, strong correlation and all, until we examine the residual plot? Getting the students involved in a rich discussion about the appropriateness of linear regression in such a situation is important, because it gives them the opportunity to deal with issues not addressed in standard questions about linear regression. Questions for discussion might include:

• Based only on the scatter plot and correlation coefficient, does linear regression appear to be appropriate for this data set?
• Based only on the residual plot, does linear regression appear to be appropriate for this data set?
• Although the residual plot magnifies a quadratic pattern that exists in the data, the correlation coefficient is 0.985. Can we ignore the findings of the residual plot because there is such a strong relationship between cost and width of the deck? Why or why not?
• What other diagnostic methods might we use to determine the appropriateness of linear regression in this situation?
• Based on this activity, can we establish some general rules to determine the appropriateness of linear regression in a variety of situations?

Students would be assessed informally during the discussion, with particular attention to each student's participation, including nonverbal involvement.

Formal assessment might involve an exam question or two that challenge the students to return to the issue of the appropriateness of linear regression. Consider, for example, the following questions:

• True or False: Regression is always appropriate when the points in the scatter plot appear to be linear and the correlation coefficient is strong.
• True or False: Whenever the residual plot suggests that there is a pattern in the data, we cannot perform linear regression on the data set.

Instead of the questions above (or similar objective questions), the instructor could write an investigative problem for a formal exam, similar to the "Regression – Residuals – Why?" activity, using a new data set that has another diagnostic problem (e.g., variation in spread, an influential observation). By examining student responses to the questions in the investigative problem, the instructor would be able to assess how students integrate their knowledge of regression in a situation that challenges them to think about the appropriateness of linear regression.

## Teaching Notes

This activity can be done in class or assigned as out-of-class work. Either way, I would suggest that students be allowed to work together on the assignment so that they can discuss the issues with one another. This activity does not depend on any particular piece of technology. Students could do the activity on a graphing calculator or with a software package. It is up to you as the teacher to decide whether you would like the students to use a particular piece of technology. This activity can readily be extended to transformations of the data, so that students can look for a relationship between the variables that is more appropriate than the linear one.
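One possible extension along these lines, sketched under stated assumptions: because the deck is square, its area is the width squared, so students might try regressing cost on width squared instead of on width. The data, the cost formula, and the helper names `fit_line` and `max_abs_residual` below are hypothetical and are not taken from the activity's data set.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares intercept, slope, and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

def max_abs_residual(xs, ys):
    """Largest absolute residual from the least-squares line of ys on xs."""
    intercept, slope, _ = fit_line(xs, ys)
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

# Hypothetical data (NOT the activity's data set): cost is exactly quadratic in width.
widths = list(range(8, 21))
costs = [4 * w ** 2 + 50 for w in widths]

# Transformed explanatory variable: deck area = width squared.
areas = [w ** 2 for w in widths]

_, _, r_width = fit_line(widths, costs)
_, _, r_area = fit_line(areas, costs)

print(f"cost vs. width:   r = {r_width:.3f}, largest |residual| = {max_abs_residual(widths, costs):.2f}")
print(f"cost vs. width^2: r = {r_area:.3f}, largest |residual| = {max_abs_residual(areas, costs):.2f}")
```

Because the made-up costs here are an exact quadratic, the transformed fit leaves essentially no residual. With real data based only loosely on a quadratic, students should expect the residuals to shrink and lose their curved pattern rather than vanish.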

Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".

 © 2000-2002 STAR Library