"Visualization and Data Science with Big Data in a Multivariate Data Analysis Elective"
Amy Wagaman, Amherst College
The statistics curriculum is changing as instructors look for ways to incorporate Big Data into their courses and to give students data science skills, such as data visualization and communication, that are useful in their real lives. Multivariate Data Analysis is one elective where Big Data and data science skills are a natural fit. A number of colleges now offer a second or third course that covers multivariate data analysis topics, as evidenced by the JSM 2013 session on the topic. In this poster, we present an overview of a course on multivariate data analysis, with a focus on infusing Big Data and data science skills into the classroom through examples, lab activities, and student projects.
My colleague Amy Wagaman (Amherst College) teaches this innovative multivariate methods class (Stat 330) early in the curriculum to get students thinking about, visualizing and working with bigger data sets. This helps communicate the excitement of statistics to a broader set of students. @askdrstats
Thanks for a nice presentation Amy! I've taught a similar course a couple of times: multivariate statistics with quite a lot of homework problems (that the students are encouraged to work on together, although they should hand in separate solutions) and presentations by the students. Your presentation has really sparked my interest in designing more problems related to visualization; I think that it may well be the most important part of multivariate data analysis, but I've often crammed it all into a single lecture and not had many assignments related to visualization (and perhaps focused too much on novelty visualizations such as Chernoff faces...). A project focusing on visualization sounds like a brilliant idea! One thing that I did try was asking the students to write a short essay describing trellis graphics (another useful tool for multivariate visualization). I did not discuss trellis graphics in the lectures, but instead asked the students to read up on them themselves, thus practicing learning new statistical techniques on their own. Have you tried anything similar, or have you focused more on analyzing data?
You mention the importance of false discovery rates, and I agree that they have become increasingly important in recent years, with the growing use of multiple testing, e.g. in genetics. In my course I've had a homework assignment where the students read and compare some classic papers on multiple testing (Holm, Simes, Benjamini & Hochberg...), as some of these are very readable even for students in the "third" statistics course (especially if they've also taken some mathematics courses). The students have been pretty positive about this, and for some of them it's been their first contact with papers from scientific journals.
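For readers who want to see how the procedures in those papers differ in practice, here is a minimal sketch in plain Python. It is not taken from either course; the function names and the ten p-values are made up purely for illustration. It implements the Holm step-down rule (family-wise error rate), the Benjamini-Hochberg step-up rule (false discovery rate), and the Simes p-value for the global null hypothesis.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure (controls the family-wise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return reject


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest k (1-based) with p_(k) <= k * alpha / m
    for k, idx in enumerate(order, start=1):
        if pvalues[idx] <= k * alpha / m:
            cutoff = k
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


def simes_global_pvalue(pvalues):
    """Simes p-value for the global null: min over k of m * p_(k) / k."""
    m = len(pvalues)
    return min(m * p / k for k, p in enumerate(sorted(pvalues), start=1))


# Hypothetical p-values, invented for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print("Holm rejections:", sum(holm_reject(pvals)))
print("BH rejections:  ", sum(bh_reject(pvals)))
print("Simes global p: ", simes_global_pvalue(pvals))
```

With these illustrative values and alpha = 0.05, Holm rejects one hypothesis while Benjamini-Hochberg rejects two, which shows in miniature why false discovery rate control is less conservative than family-wise control.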
Måns, thanks for your comment.
Visualization has been one of my modules but I think the students need to spend more time with it. By the time they are working on their final projects, they are always more excited about trying the techniques they've learned about on the data than just looking at it. I think that's what we need to change.
I have not tried assigning short essays on types of graphics or anything, but that seems like a neat idea. It might be even more enlightening for the students if they were given the assignment in groups and had to present that type of graphic to the class. For a class with 4-6 groups that's probably manageable, and the students would be practicing communication skills too. You could have them submit an R Markdown file with their code so the other students could replicate it if they wanted those graphs themselves later.
Neat suggestion about the multiple testing papers for reading assignments. Thanks.
I like your idea about presentations of types of graphics. A fun twist might be to combine it with a data visualization problem: the students are given a type of graphic and are tasked to present the method and to apply it to a dataset which is common to all groups. With a suitably chosen dataset the students can then find different things in the data (some techniques are useful for finding outliers, others for finding areas with lots of observations, and so on), which then illustrates the need to visualize your data in more than one way. Thanks again for an interesting talk which I found very inspiring - can't wait to teach multivariate statistics again!