# Correlation/Regression

• ### Munroe on Correlation

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."

Randall P. Munroe (1984 - )

• ### Sener on Extrapolation

Avoid linear extrapolation ... The turkey's first 1000 days are a seemingly unending succession of gradually improving circumstances confirmed by daily experience. What happens on Day 1001? Thanksgiving.

John E. Sener (1954 - )

• ### Gould on Correlation and Causation

The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.

Stephen Jay Gould (1941 - 2002)

• ### Hoyle on Outliers

I don't see the logic of rejecting data just because they seem incredible.

Sir Fred Hoyle (1915 - 2001)

• ### I Use the Line

may sing to tune of "I Walk the Line" (Johnny Cash)

I keep a close watch on this scatterplot of mine
I keep applying regression all the time
I keep R-Squared to be the tie that binds
Because that's fine,
I use the line

I find it very, very easy to do least squares
I find another optimum and nobody cares
Yes, I'll admit I'm a fool for rms errors
Because that's fine,
I use the line

As sure as night is dark and day is light
I keep regression on my mind both day and night
And Gauss-Markov proves that it is right
Because that's fine,
I use the line

I check for patterns in my residual plots
I check again for any wayward dots
I could try splines, but I can't find the knots
Because that's fine,
I use the line

I keep close to y when I have this x of mine
I keep applying regression all the time
I keep R-Squared to be the tie that binds
Because that's fine,
I use the line

• ### Y hat dance

Lyric ©2005, 2006, 2009 by Lawrence Mark Lesser
may sing to tune of "Mexican Hat Dance" (traditional)

For (X, Y) data pairs, we call the Y's
The values observed. Now, let's fit a line!
For each X, the value of Y where on the line you would hit
Is known as a fitted value-- the value we say we predict.

And those fitted Y's always wear a hat:
A caret or circumflex are other names for that.
Subtracting the Y hat from Y is (vertical) error defined;
The sum of the squares of all these we want to minimize.

And that is all done by the line of best fit,
But first make sure you plot the points you'd like to fit!
And when you go plot all the scatter, do you see linear trend?
And does everything all look random for errors versus the fits?
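
The ideas in this lyric (fitted values, vertical errors, and minimizing the sum of squared errors) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the song: the function name `fit_line` and the data values are made up for the example, and the formulas are the standard closed-form least-squares estimates for a single predictor.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_sq_errors(xs, ys, intercept, slope):
    """Sum of squared vertical errors (Y minus Y hat) for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
y_hat = [intercept + slope * x for x in xs]        # fitted values ("Y hat")
residuals = [y - yh for y, yh in zip(ys, y_hat)]   # observed minus fitted
sse = sum_sq_errors(xs, ys, intercept, slope)      # the quantity being minimized
```

Plotting `residuals` against `y_hat` is the "errors versus the fits" check from the last verse: if the linear model is reasonable, that plot should show no obvious pattern.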

• ### Natural Log

may sing to tune of "Hound Dog" (Jerry Leiber and Mike Stoller; popularized by Elvis Presley)

You ain't nothin' but a natural log,
Transformin' all the time.
You ain't nothin' but a natural log,
Transformin' all the time.
Well, if the data ain't skewed,
Then you ain't no friend of mine.

When they said you was for algebra class,
Well, that was just a lie.
When they said you was for algebra class,
Well, that was just a lie.
You don't work for less than zero,
And you ain't no friend of mine.

(repeat 4 times)
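
The two points of this lyric, that a log transform is typically reached for when data are skewed and that it does not work for values below zero, can be illustrated with a small sketch. The helper name `log_transform` and the data are hypothetical, chosen only for the example. Note that the natural log is also undefined at exactly zero, so the check below rejects zero as well as negative values.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform; raises ValueError if any value is <= 0."""
    if any(v <= 0 for v in values):
        raise ValueError("natural log is undefined for values <= 0")
    return [math.log(v) for v in values]


# Hypothetical right-skewed data: one large value pulls the mean far above the median.
skewed = [1, 2, 3, 5, 8, 13, 200]
logged = log_transform(skewed)

# Gap between mean and median, a rough indicator of skew, before and after.
gap_before = statistics.mean(skewed) - statistics.median(skewed)
gap_after = statistics.mean(logged) - statistics.median(logged)
```

On this made-up data the mean sits much closer to the median after the transform, which is the usual motivation for taking logs of right-skewed values.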

• ### Fitting the Line

may sing to the tune of "Draggin' the Line" (Tommy James and Bob King)

(Back-up vocals in parentheses)
Plotting the data, on X and Y,
Finding the slope, with most points nearby,
We want to find the angle, of the trend's incline,
Fitting the line (fitting the line),

Upward slopes make r positive,
Slopes trending down, make it negative,
From minus-one to plus-one, r can feel fine,
Fitting the line (fitting the line),
Fitting the line (fitting the line),

Points align, how will the data shine?
If you have upward slopes, it'll give you a plus sign,
Fitting the line (fitting the line),
Fitting the line (fitting the line),

How strongly will your variables relate?
Is there a trend, or just a zero flat state?
You want to know what your analysis will find,
Fitting the line (fitting the line),
Fitting the line (fitting the line),

Points align, how will the data shine?
Your r will be minus, if the slope declines,
Fitting the line (fitting the line),
Fitting the line (fitting the line),

(Guitar solo)

Points align, how will the data shine?
If you have upward slopes, it'll give you a plus sign,
Fitting the line (fitting the line),
Fitting the line (fitting the line)...

• ### Correlation Song

may sing to the tune of the English lullaby "Twinkle Twinkle Little Star" (Jane Taylor)

Are points near a line, or far?
What's the correlation, r?
If the fit supports a line,
Its slope and r would share the sign.
Twinkle, twinkle, you're a star:
Knowing stats will take you far!
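
The claim that the slope and r share a sign follows from the standard formulas: both have the same numerator (the sum of cross-products about the means) and positive denominators. A minimal sketch, with a hypothetical helper `slope_and_r` and invented data:

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # sign is set by sxy
    r = sxy / math.sqrt(sxx * syy)    # same numerator, so same sign
    return slope, r


# Hypothetical data: one upward trend, one downward trend.
xs = [1, 2, 3, 4, 5]
slope_up, r_up = slope_and_r(xs, [2, 4, 5, 4, 6])
slope_down, r_down = slope_and_r(xs, [9, 7, 6, 4, 1])
```

In both cases r stays between -1 and +1, and its sign matches the sign of the fitted slope.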

• ### Rho Rho Rho

Lyric copyright by Lawrence Mark Lesser and Dennis K Pearl
may sing to the tune of the children's folk song "Row Row Row Your Boat"

Rho, rho, rho he wrote
aligning x and y
popu-lation corre-lation
how the points should lie