P2-17: Pigskin Probability

By Carl Miller (Northern Kentucky University)


Play-by-play data from college football games provide students with a rich data set, containing both quantitative and categorical variables, that is ripe for investigation. Conference attendees will see one directed activity. Other examples will be made available upon request.

Attendees are invited to work through part of an activity that examines the outcomes of plays. It begins with the question, “What can happen on a given play in a college football game?” Students are given access to the data (or a small portion of it), and discussion ensues about which outcomes are possible and which summary methods are best. How to handle incompletions on pass plays, quarterback sacks, or even rushes that lose field position can spark quite a debate. Students further discuss how graphical displays can help tell the story of the data not simply with formulas or words but also visually, as suggested by R. W. Pike in his Creative Training Techniques Handbook (1994). Another approach is to assign quarters to groups and have each group summarize what happens in its part of the game. Probability or statistical methods may be applied, with visual displays showing what the methods are attempting to quantify. With quantitative data, the typical yardage gained on a play can be compared between rushing plays and completed passing plays. A t-inference may be carried out, while side-by-side boxplots show visually what the inference is attempting to formalize.
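
The rushing-versus-passing comparison can be sketched in a few lines of Python. This is a minimal illustration, not the activity's actual worksheet: the yardage values are hypothetical and are not drawn from the data set, the helper names are invented for this sketch, and Welch's unequal-variance t statistic is assumed as one reasonable form of the t-inference. The five-number summaries are the values that side-by-side boxplots would draw.

```python
import math
import statistics

# Hypothetical yardage values for illustration only; NOT taken from the data set.
rush_yards = [3, -1, 5, 2, 0, 7, 4, 12, 1, 3]
pass_yards = [8, 15, 6, 22, 11, 4, 9, 31, 7, 13]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom
    for the difference mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


rush_summary = five_number_summary(rush_yards)
pass_summary = five_number_summary(pass_yards)
t_stat, df = welch_t(pass_yards, rush_yards)

print("Rush (min, Q1, median, Q3, max):", rush_summary)
print("Pass (min, Q1, median, Q3, max):", pass_summary)
print(f"Welch t = {t_stat:.2f} on about {df:.1f} degrees of freedom")
```

Students can set the two summaries next to the t statistic and discuss whether the boxplots and the inference tell the same story.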

As the data set expands to include more games (it currently has approximately 60 games with over 9,000 plays), methods from data science and big data will be needed. With large samples, even small differences will tend to be statistically significant, yet their practical significance may be questionable. As a result, graphical methods may be the better choice to tell the story of pigskin play.
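
The gap between statistical and practical significance can be illustrated with a short calculation. The sketch below holds a hypothetical half-yard difference in mean yardage and a hypothetical standard deviation of 8 yards fixed (neither is an estimate from the data set) and uses a large-sample z approximation with equal group sizes, which is an assumption made for simplicity. Only the sample size changes, yet the p-value moves from unremarkable to small while the standardized effect size stays the same.

```python
import math
from statistics import NormalDist

# Hypothetical values chosen for illustration; NOT estimates from the data set.
MEAN_DIFF = 0.5  # difference in mean yards per play between two play types
SD = 8.0         # assumed common standard deviation of yards per play


def two_sample_z(n_per_group, mean_diff=MEAN_DIFF, sd=SD):
    """Return (z, two-sided p-value) for two equal-size groups with a common SD."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Standardized effect size (Cohen's d); it does not depend on the sample size.
cohens_d = MEAN_DIFF / SD

for n in (50, 500, 4500):
    z, p = two_sample_z(n)
    print(f"n per group = {n:>4}: z = {z:.2f}, p = {p:.4f}, d = {cohens_d:.4f}")
```

Under these assumptions the same half-yard gap is not significant at 50 plays per group but is at 4,500, which is why a plot of the two distributions may say more about practical importance than the test does.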