David Diez, OpenIntro
The percentile bootstrap approach has made inroads into introductory statistics courses, sometimes with the incorrect declaration that it can be used without checking any conditions. Unfortunately, the percentile bootstrap performs worse than methods based on the t-distribution for small samples of numerical data. I would wager that a large majority of statisticians believe the opposite to be true, and I think this misplaced faith has created a small epidemic.
A few years ago I created this spreadsheet to compare the percentile bootstrap to classical methods. For numerical data, the t-confidence interval outperforms the percentile bootstrap through sample sizes of about 30, and the difference is particularly stark when the population is skewed and the sample size is very small. Tim Hesterberg published a much more comprehensive investigation of multiple classical and bootstrap methods in 2014. He found similar results for small samples: the t-confidence interval outperformed the percentile bootstrap until the sample size reached 35 or more.
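As a rough illustration of the kind of comparison behind that spreadsheet (this is my own minimal sketch, not the spreadsheet itself or Tim's code), one can simulate the coverage of both intervals for a small sample from a skewed population, such as the exponential:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def t_interval(x, conf=0.95):
    """Classical t-confidence interval for the mean."""
    n = len(x)
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return m - tcrit * se, m + tcrit * se

def percentile_interval(x, reps=1000, conf=0.95):
    """Percentile bootstrap interval for the mean: resample with
    replacement, then take the tail quantiles of the resampled means."""
    boot_means = rng.choice(x, size=(reps, len(x)), replace=True).mean(axis=1)
    alpha = 1 - conf
    return tuple(np.quantile(boot_means, [alpha / 2, 1 - alpha / 2]))

def coverage(interval_fn, n=10, sims=500, true_mean=1.0):
    """Fraction of simulated samples whose interval captures the true mean."""
    hits = 0
    for _ in range(sims):
        x = rng.exponential(scale=true_mean, size=n)  # skewed population
        lo, hi = interval_fn(x)
        hits += (lo <= true_mean <= hi)
    return hits / sims

t_cov = coverage(t_interval)
pb_cov = coverage(percentile_interval)
```

Both intervals tend to fall short of the nominal 95% coverage in this setting; the literature cited above finds the percentile bootstrap falls shorter for small skewed samples. Exact numbers vary with the seed and the simulation size.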
Teaching the percentile bootstrap without thoughtfully explaining the conditions, particularly as a replacement for classical methods, seems like one step forward and two steps back. The percentile bootstrap is nothing new, but its weaknesses remain largely unknown in the community. I find myself wrestling with several considerations whenever I think about this topic.
- The percentile bootstrap is a stepping stone. I don’t think the percentile bootstrap should be taught as “the” bootstrap method; it’s too unreliable. It should instead be taught as a first step towards better methods and/or as a first tool for students to start exploring a wider range of analyses, e.g. of the median, standard deviation, and IQR.
- There are better bootstrap methods. Tim’s excellent paper found that the bootstrap t-interval is much more robust than the percentile bootstrap, and for small samples of skewed data it is even much more robust than classical methods. (Research opportunity #1)
- The bootstrap opens the door to more statistics. The reason I remain bullish on the long-term value of advanced bootstrap methods is that they ease the analysis of a wider range of statistics, such as the standard deviation and IQR.
- We need to establish appropriate conditions for the bootstrap. Every statistical tool fails in many ways, and we need to better understand when methods fail before they are taught to the next generation of statisticians. As a starting point, I suggest a rule of thumb for the percentile bootstrap below. To be clear, more thoughtful work is required here and appropriate conditions are far from settled. (Research opportunity #2)
- Shifts in pedagogy are costly, so let’s do our homework first. A shift in how intro statistics is taught on a large scale is very expensive. It requires teaching tens of thousands of teachers the new pedagogy, getting those teachers to buy into the change, and then pushing schools (or students) to buy new textbooks. I think we owe it to schools, teachers, and most of all students to have solid evidence (data!) from a diverse set of studies showing the practical benefits of the bootstrap method before we ask them to incur the costs of this transition. (Research opportunity #3)
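To make the first two bullets concrete, here is a minimal sketch of the bootstrap t-interval for the mean. This is my own illustration of the general technique that Tim’s paper evaluates, not code from the paper: each resample is studentized, and the empirical quantiles of those t-statistics replace the t-distribution’s critical values.

```python
import numpy as np

def bootstrap_t_interval(x, reps=5000, conf=0.95, rng=None):
    """Bootstrap t-interval for the mean: studentize each resample,
    then invert the empirical distribution of the t-statistics."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    boot = rng.choice(x, size=(reps, n), replace=True)
    boot_se = boot.std(axis=1, ddof=1) / np.sqrt(n)
    tstar = (boot.mean(axis=1) - m) / boot_se
    alpha = 1 - conf
    q_lo, q_hi = np.quantile(tstar, [alpha / 2, 1 - alpha / 2])
    # The quantiles swap sides when the pivot is inverted, which is
    # what lets this interval adapt to skew in the sampling distribution.
    return m - q_hi * se, m - q_lo * se
```

Unlike the percentile interval, the endpoints here are asymmetric around the sample mean in exactly the direction the resampled t-statistics suggest, which is the intuition behind its better small-sample behavior.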
I want to wrap up with my rule of thumb for the percentile bootstrap. If I’d be comfortable applying the Z-test or Z-confidence interval to a data set, then I think it’s safe to use the percentile bootstrap for the mean or median. In most introductory courses, that usually means (1) the data are from a simple random sample or from random assignment in an experiment, (2) there are at least 30 observations in the sample, and (3) the distribution is not too strongly skewed.
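One way to operationalize that rule of thumb is to guard the bootstrap with the same checks one would apply before a Z-procedure. The helper below is my own illustrative sketch (the skew cutoff in particular is an assumption, not settled guidance), shown here for the median:

```python
import numpy as np
from scipy import stats

def percentile_bootstrap_median(x, reps=5000, conf=0.95, rng=None):
    """Percentile bootstrap CI for the median, with crude condition checks
    mirroring the rule of thumb: random sample assumed, n >= 30, and
    not-too-strong skew (the cutoff of 2 is an illustrative choice)."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    if len(x) < 30:
        raise ValueError("rule of thumb: want at least 30 observations")
    if abs(stats.skew(x)) > 2:
        raise ValueError("rule of thumb: distribution looks too strongly skewed")
    boot = rng.choice(x, size=(reps, len(x)), replace=True)
    boot_medians = np.median(boot, axis=1)
    alpha = 1 - conf
    lo, hi = np.quantile(boot_medians, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The conditions themselves cannot be checked in code alone; whether the data came from a simple random sample or random assignment is a question about the study design, not the numbers.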