# eCOTS 2014 - Virtual Poster #4

### "Visualization and Data Science with Big Data in a Multivariate Data Analysis Elective" Amy Wagaman, Amherst College

#### Abstract

The statistics curriculum is changing as instructors look for ways to incorporate Big Data into their courses and to give students data science skills, such as data visualization and communication, that are useful in their everyday lives. Multivariate Data Analysis is one elective where Big Data and data science skills fit naturally. A number of colleges now offer a second or third course that covers multivariate data analysis topics, as evidenced by the JSM 2013 session on the topic. In this poster, we present an overview of a course on multivariate data analysis, with a focus on infusing Big Data and data science skills into the classroom through examples, lab activities, and student projects.


Nicholas Horton:

My colleague Amy Wagaman (Amherst College) teaches this innovative multivariate methods class (Stat 330) early in the curriculum to get students thinking about, visualizing and working with bigger data sets. This helps communicate the excitement of statistics to a broader set of students. @askdrstats

Måns Thulin:

Thanks for a nice presentation, Amy! I've taught a similar course a couple of times: multivariate statistics with quite a lot of homework problems (the students are encouraged to work on them together, although they should hand in separate solutions) and presentations by the students. Your presentation has really sparked my interest in designing more problems related to visualization; I think it may well be the most important part of multivariate data analysis, but I've often crammed it all into a single lecture and not had many assignments related to visualization (and perhaps focused too much on novelty visualizations such as Chernoff faces...). A project focusing on visualization sounds like a brilliant idea! One thing that I did try was asking the students to write a short essay describing trellis graphics (another useful tool for multivariate visualization). I did not discuss trellis graphics in the lectures, but instead asked the students to read up on them themselves, so that they practiced learning new statistical techniques on their own. Have you tried anything similar, or have you focused more on analyzing data?

You mention the importance of false discovery rates, and I agree that they have become increasingly important in recent years, with the growing use of multiple testing, e.g. in genetics. In my course I've had a homework assignment where the students read and compare some classic papers on multiple testing (Holm, Simes, Benjamini & Hochberg...), as some of these are very readable even for students in the "third" statistics course (especially if they've also taken some mathematics courses). The students have been pretty positive about this, and for some of them it's been their first contact with papers from scientific journals.
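For readers unfamiliar with the procedures named above, here is a minimal sketch of how two of them adjust a set of p-values. It is not taken from either course; the function names `holm_adjust` and `bh_adjust` and the example p-values are illustrative assumptions. Holm's step-down procedure controls the family-wise error rate, while the Benjamini-Hochberg step-up procedure controls the false discovery rate.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (family-wise error rate control)."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        # The k-th smallest p-value (k = rank + 1) is multiplied by m / k.
        candidate = min(1.0, pvalues[idx] * m / (rank + 1))
        # Enforce monotonicity: adjusted values never increase as rank drops.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four tests, for illustration only.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
holm_result = holm_adjust(example_pvalues)
bh_result = bh_adjust(example_pvalues)
print("Holm:", [round(p, 4) for p in holm_result])
print("BH:  ", [round(p, 4) for p in bh_result])
```

Each Holm-adjusted value is at least as large as the corresponding Benjamini-Hochberg value, which reflects the stricter error criterion that Holm's procedure controls.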

Amy Wagaman: