### "Student Perspectives on Software Used in an Introductory Statistical Computing Course"

Chelsea Snyder & Julia L. Sharp, Clemson University

#### Abstract

Competence with statistical software packages is an asset to applicants currently seeking employment in applied statistics. Baglin and Da Costa (2013) discuss the connection between statistical literacy and competence with statistical software. With statistical computing skills, students can develop practical ingenuity to apply to real-world problems. Statistical computing not only helps prepare students for jobs; working with real-life data sets also keeps students more engaged. In a study conducted by Neumann, Hood, and Neumann (2013), 63% of students interviewed indicated that using real data in their statistics class gave real-life relevance to what they were learning in class. Accordingly, colleges and universities are modifying the curricula of their statistics courses to include greater use of technology. Statistical computing courses are taught in a variety of formats and at both the undergraduate and graduate levels. At some colleges and universities, statistical computing courses focus on one software program, whereas at others, two or three statistical programs are introduced. The focus of these courses is either the programming aspect of the software or the data analysis component, in which existing software functions and packages are used (Broman, Caffo, Irizarry, Peng, and Ruczinski, 2004; Christian, 2011; Gentle, 2013; Hofmann, 2013; Kim, 2004; Maboudou, 2011; Paciorek, 2011; Peng, 2014; Shalizi, 2013).

We examine an introductory statistical computing course offered in the Clemson University Mathematical Sciences department that exposes both graduate and undergraduate students to several statistical software programs and LaTeX. The course content has traditionally focused on using SAS and R for importing data, data manipulation, basic descriptive statistics and graphical procedures, and inference for a single mean. Additionally, students learn to create a simple LaTeX document composed of sections, tables, and figures. We investigate whether the software programs emphasized in this course provide students with learning experiences that best prepare them for statistical software use in their jobs and other coursework. We implemented two surveys to gather data pertaining to our study goals. Prior to taking the course, students were surveyed about their software proficiency and interest, as well as their exposure to and experience with computer science, databases, and LaTeX. After taking the course, students were asked about their statistical software use and proficiency, the usefulness of the software in their jobs, and their recommendations about which software packages to emphasize in future semesters. The pre-course survey was given only to students who took the course in 2011 and 2012; the post-course survey, however, was sent to students who had taken the course since its inception in 2008.

Prior to taking the course, students indicated that they were comfortable with and used Microsoft Excel more frequently than other statistical computing programs. Graduate students reported using R and Minitab more than undergraduate students did. However, graduate students did not feel proficient with R and Minitab. Overall, those surveyed indicated a desire to learn SAS and R in the course. After taking the course, students felt most proficient with Microsoft Excel and SAS. Moreover, students currently use Microsoft Excel most often among the statistical computing programs, but they indicated that learning SAS prepared them for its use in their current positions. Students recommended that Microsoft Excel, R, and SAS be taught in future semesters.

We conclude that students are best prepared for later coursework and jobs if SAS, R, and Microsoft Excel are the programs most emphasized by teachers of the course. Indeed, in 2009, 92 of the 100 largest companies worldwide used SAS software in some capacity (Lohr, 2009). Further, R is gaining popularity among statisticians (Vance, 2009). As a result, students would be well-served to learn both R and SAS as they prepare for a future in applied statistics.

#### Comments

**Nicholas Horton:**

Snyder and Sharp describe an introductory statistical computing class designed to give students the skills needed to "compute with data". This is important to ensure that our students have the capacities needed to address small, medium, and big datasets. Their approach complements other models available at https://www.causeweb.org/ecots/ @askdrstats

**Julia Sharp:**

Thank you for the comment! It is important to give students confidence to work with different software to ensure that they are able to use statistics for applications in their fields.

**Chelsea Snyder:**

Hello Nicholas, thank you for sharing your link. I tried to access the link but was not able to do so. Could you perhaps re-post the link, or make another suggestion about how to access the site?

**Homer White:**

Teaching at a small college with little access to commercial software, I'm impressed by the variety of software with which the students have experience. And by the way, I bet many faculty in client disciplines would find this course very useful!

**Julia Sharp:**

Thank you for the comment! I would be interested in considering the type of software that students at universities of other sizes have access to and are proficient with. Faculty and students in other disciplines have also expressed interest in this course to enhance their computing knowledge.

**Tulia E Rivera-Florez:**

Today it is unbelievable to teach statistics without any software and real data. Actually, it seems to be an engaging way to teach basic concepts. Thus the challenge is for us teachers: how do we do it best?

**Julia Sharp:**

Thank you for your comment! The challenge for how to teach basic concepts, especially using software, does exist. I enjoy learning how others teach these concepts to improve my style each semester.

**Jim Robison-Cox:**

Hi Julia and Chelsea,

You mention that LaTeX is introduced in the course, so I wonder how in depth you go into it. Do you teach them how to get LaTeX output from SAS (I've not figured that out, but it seems to be an ODS option)? And/or do you teach them to use knitr with LaTeX?

I teach a similar course which covers R and SAS, and have introduced them to markdown through knitr and RStudio. It seemed to work well for them, and was easier than LaTeX.

Other points I wonder about: you divide the students into undergrad and grad. Are they all stat majors? I ask because we get students from other disciplines who want to learn stat computing (esp. R). Do you emphasize keeping records of the analysis and the whole "reproducible research" aspect of computing? It seems hard to do in Excel.

Nice presentation. Thanks,

Jim Robison-Cox

**Julia Sharp:**

Hi, Jim,

We do not teach obtaining output from SAS or R into LaTeX using ODS or knitr/Sweave. I have had graduate students present on Sweave for their final project for the class, but we do not have time during the regular semester to teach these things. We use verbatim to include output and code directly in the LaTeX document.

The students in the course come from disciplines all over campus. However, the undergraduate students are mainly from the Mathematical Sciences. The graduate students have been from Applied Economics, Mechanical Engineering, etc.

We do keep records of the analyses and use scripts to write code in SAS and R. I agree that teaching that in Excel (unless we teach macros) and JMP, specifically, would be difficult.

Thanks for your comments!

Julia