P2-07: Show Me How to Simulate It (from Scratch!): A Project to Compare Methods for Calculating the Confidence Interval for a Population Proportion

By Maureen Petkewich and Brian Habing (University of South Carolina)


What is the best way to introduce an alternative statistical method that is an improved version of a traditional method? For example, can one effectively explain that the traditional Wald method for inference on the population proportion does not provide sufficient coverage for small samples, and that we therefore use the “Adjusted-Wald” or “Agresti-Coull” method? Rather than being told when one method outperforms the other, students in a STAT 515 Statistical Methods I course at the University of South Carolina performed simulations in R to investigate the performance of the two methods.

The activity was designed around the work of Agresti and Coull (1998), who provided results of a simulation study showing the coverage probability of the Wald versus the Adjusted-Wald confidence interval when p = .10 with both n = 5 and n = 10. Students were assigned to read the article and were then divided into groups, each of which was assigned to explore other specified values of p and n. Each group was provided with an R code template that constructs a data matrix representing 10,000 samples of size n, calculates a confidence interval for each sample, and estimates the coverage probability as the percentage of confidence intervals that capture p. The R code walks students through the sampling distribution of the sample proportion for the assigned values of n and p, a few of the simulated samples (the raw data), the sample proportions, and the confidence intervals before finding the coverage probability across all 10,000 samples. Students were directed to produce a plot showing the coverage probability of both the Wald and Adjusted-Wald intervals for the various n (or p) values. Finally, each group recommended which method to use for the various values of n or p. By using R simulations to compare the Wald interval to the Adjusted-Wald interval, the project aligns with recommendations from the 2016 GAISE report (ASA, 2016) to “foster active learning” and “use technology to explore concepts”.
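
The simulation described above can be sketched in a few lines of R. This is a minimal illustration, not the template distributed to students: the function name simulate_coverage, its arguments, and the seed are hypothetical, and the Adjusted-Wald interval is assumed to be the 95% "add two successes and two failures" form.

```r
# Hypothetical sketch of the coverage simulation; names are illustrative
# and are not taken from the authors' R code template.
simulate_coverage <- function(n, p, reps = 10000) {
  z <- qnorm(0.975)  # critical value for a 95% interval

  # Data matrix: one row per simulated sample, one 0/1 column per observation.
  samples <- matrix(rbinom(reps * n, size = 1, prob = p),
                    nrow = reps, ncol = n)
  x     <- rowSums(samples)  # number of successes in each sample
  p_hat <- x / n             # sample proportion for each sample

  # Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
  se_wald <- sqrt(p_hat * (1 - p_hat) / n)
  wald_lo <- p_hat - z * se_wald
  wald_hi <- p_hat + z * se_wald

  # Adjusted-Wald interval (assumed form): add 2 successes and 2 failures,
  # then apply the Wald formula to the adjusted counts.
  n_adj  <- n + 4
  p_adj  <- (x + 2) / n_adj
  se_adj <- sqrt(p_adj * (1 - p_adj) / n_adj)
  adj_lo <- p_adj - z * se_adj
  adj_hi <- p_adj + z * se_adj

  # Coverage probability: proportion of intervals that capture the true p.
  c(wald          = mean(wald_lo <= p & p <= wald_hi),
    adjusted_wald = mean(adj_lo <= p & p <= adj_hi))
}

set.seed(515)  # arbitrary seed, used only to make the run reproducible
coverage <- simulate_coverage(n = 10, p = 0.10)
print(coverage)
```

With n = 10 and p = 0.10, the estimated Wald coverage should fall well below the nominal 95% level, because every sample with zero successes produces the degenerate interval [0, 0]. Calling the function over a grid of n or p values gives the points needed for the coverage plot.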

Note that this project was motivated by comments from the Fall 2016 semester student evaluations indicating that the students wanted to learn more R coding in class. The course is composed of advanced Honors-level students taking a 500-level statistics course. It is the first statistics course for Statistics majors and other quantitative science majors. The R programming language is used in the course to prepare students for future STAT courses and because it is free, widely accessible, and widely used both within and outside of academia. The project goes beyond a ready-made applet for coverage probability by allowing students to engage in more complex and creative raw coding, which may increase their confidence and ability to use R as well as inspire them to continue creative investigations on their own. For instance, students can see how to control and modify the code if they want to investigate additional methods for computing a confidence interval for a population proportion, an option that is not available with a pre-programmed applet.

The poster presentation will include background on the motivation for the project, selected group results, handouts with directions for the project, copies of the R code template, and a demonstration of how to use the R code. The code should be especially useful for instructors who want to use R in their courses but are still relatively new to R coding themselves.

To assess the effectiveness of the project, a survey will be administered to students in the course upon completion of the project. The survey will allow students to report on their perception of how the project helped them understand the concept of using simulations to investigate statistical methods, and it will assess the impact of the project on their confidence and ability with R. Results of the survey will be shared along with the comments requesting more R coding from the Fall 2016 semester.