Regression – Residuals – Why?

 

Suppose that you want to build a square deck for your backyard.  A builder has given you the following estimates:

Width (in feet)

4

6

8

10

12

14

16

18

Cost (in dollars)

150

400

600

900

1400

2000

2500

3000

 

1.      Use technology to construct a scatter plot of these data.  Comment on the appropriateness of linear regression based on your scatter plot.

 

 

 

2.      Use technology to determine the regression line for predicting the cost of the deck from the width of the deck.  Add the regression line to your scatter plot.  Comment on the fit of your regression line to your data.

 

 

 

3.      What is the correlation coefficient for the relationship between cost and width of the deck?  Based on your correlation coefficient and the appearance of the fit of your regression line with the data, does it appear that linear regression is appropriate for this data set?  Why or why not?

 

 

 

4.      Suppose that you want to build a square deck with width 10.5 feet.  What is the estimated cost for the deck?  Does the cost you estimated seem reasonable with respect to the data?    About how much will your prediction be off?

 

 

 

5.      Use technology to construct a scatter plot of the residuals versus the width of the deck.  What does a residual plot tell us about our linear regression?


Answer Key

Regression – Residuals – Why?

 

 

Note to instructors: Here I have provided the answers that I think students will provide.  Some students may be more (or less) concerned about the slight curve to the data than the answers indicate.

 

1.      Use technology to construct a scatter plot of these data.  Comment on the appropriateness of linear regression based on your scatter plot.

 

The following is a scatter plot of the data:

 

 

The data look fairly linear, although there might be a slight curve in the middle.  Overall, it looks like linear regression is appropriate for these data.

 

2.      Use technology to determine the regression line for predicting the cost of the deck from the width of the deck.  Add the regression line to your scatter plot.  Comment on the fit of your regression line to your data.

 


The following is the regression analysis using Minitab:

Regression Analysis

 

The regression equation is

Cost = - 933 + 209 Width

 

Predictor       Coef       Stdev    t-ratio        p

Constant      -932.7       176.2      -5.29    0.002

Width         209.23       14.79      14.15    0.000

 

s = 191.7       R-sq = 97.1%     R-sq(adj) = 96.6%

 

Analysis of Variance

 

SOURCE       DF          SS          MS         F        p

Regression    1     7354300     7354300    200.22    0.000

Error         6      220387       36731

Total         7     7574688

 

The regression equation is: predicted cost = -933 + 209*width

 

The following is a scatter plot of the data with the linear regression line added in:

 

The regression line seems to fit the data fairly well, except for the area where there is a slight curve in the data.

 

3.      What is the correlation coefficient for the relationship between cost and width of the deck?  Based on your correlation coefficient and the appearance of the fit of your regression line with the data, does it appear that linear regression is appropriate for this data set?  Why or why not?

 

The correlation coefficient for cost and width of the deck is 0.985.  This correlation coefficient indicates a very strong positive linear relationship between cost and width of the deck.  Based on the correlation coefficient and the fit of the linear regression line, linear regression does seem appropriate for these data.

 

4.      Suppose that you want to build a square deck with width 10.5 feet.  What is the estimated cost for the deck?  Does the cost you estimated seem reasonable with respect to the data?  About how much will your prediction be off?

 

The estimated cost for the deck would be -933 + 209*10.5 = -933 + 2194.5 = $1261.50.  This cost seems reasonable based on the data since it falls between the $900 for a deck that is 10 feet in width and the $1400 for a deck that is 12 feet in width.  Our prediction error for our estimated cost is $191.70.

 

5.      Use technology to construct a scatter plot of the residuals versus the width of the deck.  What does a residual plot tell us about our linear regression?

 

The following are residual plots from Minitab:

 

Notice that the slight curve in the data is magnified in the residual plots above.  Based on these diagnostic measures, linear regression might not be appropriate for these data.