How to estimate parameters: To bootstrap or not? - SBI

17 May 2015

I believe that Scott Rifkin has it exactly right with the bootstrap. I have used the
approach that he described with my students in a first college course in Statistical
Science. In the last week of the course just completed, my students worked in teams of
three using resampling and bootstrapping. The focus, of course, is on understanding the
variability of an estimate derived from a sample.

I believe that, at a basic level, students can understand and appreciate the bootstrap. 
Of course there are subtleties, but then that's true in most real statistical
problems!  For those who want to learn more about the subtleties and the performance of
various procedures that grow out of bootstrapping, see the excellent recent paper, written
for teachers, by Tim Hesterberg. I learned a lot from it, and my students verified some of
the points that Tim makes for bootstrap confidence intervals. 

John Emerson
Middlebury, VT

Date: Sat, 16 May 2015 09:56:55 -0700
From: Scott Rifkin <sarifkin(a)ucsd.edu>
To: sbi(a)causeweb.org
Subject: [SBI] How to estimate parameters: To bootstrap or not?

My approach to the conceptual hurdle of using a single sample to mimic the population runs
something along the following lines:

First get them comfortable with the idea that a statistic measured on the sample (usually
we talk about the mean) is our best estimate of the corresponding population parameter.  The
goal is to make the link between sample properties and population properties.

Then present the problem:

Problem: This sample statistic can move around depending on the sample.  How variable is it?
How big is this sample variability?  Since the population parameter is fixed, we want to get
an idea of how close our 'best estimate' is likely to be to it.

Ideal solution:  keep taking samples from the population and get a collection of
statistics.  Then we'd know because we'd have a sampling distribution (a
distribution of sample statistics).
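When the population is known (which only happens in a teaching setting), the "ideal solution" can be simulated directly. The Python sketch below assumes a made-up population (the whole numbers 1 to 100) and draws each sample with replacement, as if from a very large population with those value frequencies; the helper name `sampling_distribution` and all numbers are hypothetical choices for illustration.

```python
import random
import statistics

def sampling_distribution(population, n, reps, seed=0):
    """Draw `reps` samples of size n from the population (with replacement,
    i.e. treating it as a very large population with these value
    frequencies) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: in practice we never have this in hand.
population = list(range(1, 101))  # values 1..100, mean 50.5
means = sampling_distribution(population, n=25, reps=5000)

print(f"population mean: {statistics.mean(population)}")
print(f"mean of sample means: {statistics.mean(means):.2f}")
# The spread of the sample means is the sample-to-sample variability.
print(f"SD of sample means: {statistics.stdev(means):.2f}")
```

The mean of the sample means sits close to the population mean, while their standard deviation measures how far a single sample's estimate typically lands from it.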

Problem: This is impractical.  Usually we can only afford to take one sample.

Question: is there a way we can mimic/simulate this ideal solution?

Problem: We don't know what the population actually looks like.

But we do know something about it.  In fact, everything we know about it is encapsulated
in the sample.  The sample is our best estimate of the population in a particular way:  we
expect the population to be much bigger in size, but the frequencies of each value in the
sample are representative (in a statistical way) of the frequencies of each value in the
actual population.  And we already have it in hand.

So instead of doing the impractical (continually using scarce resources to sample from the
population and calculate our statistic on each sample), we use what we already have: we
sample from our best estimate of the population, our sample itself, and calculate our
statistic from each bootstrap sample.
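A minimal Python sketch of the resampling step, assuming the statistic of interest is the mean and using made-up data. Each bootstrap sample is drawn with replacement and has the same size as the original sample, which is the standard bootstrap recipe; `bootstrap_means` is a hypothetical helper name, not part of any library.

```python
import random
import statistics

def bootstrap_means(sample, reps, seed=0):
    """Resample the observed sample with replacement (same size as the
    original) `reps` times and return the mean of each bootstrap sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

# Hypothetical observed sample of size 10 (made-up numbers).
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
boot = bootstrap_means(sample, reps=5000)

# Bootstrap standard error: the spread of the bootstrap means.
se_boot = statistics.stdev(boot)
print(f"sample mean: {statistics.mean(sample)}")
print(f"bootstrap SE of the mean: {se_boot:.2f}")
```

With these made-up numbers the bootstrap SE comes out near 1.0, slightly below the textbook s/√n ≈ 1.08, because resampling from the sample effectively divides by n rather than n − 1.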

For my students, the key is when they understand that the sample itself plays the role of
an estimate of the population.  And that we use bootstrapping to study the variability
(not the location) of our statistic of interest.
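To illustrate the point about variability rather than location, the sketch below assumes a hypothetical normal population (mean 50, SD 10) and a single sample of size 30, all made up. The bootstrap distribution centers on the sample mean, not on the population mean, so only its spread is informative. The simple percentile interval at the end is just one of several bootstrap interval methods.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population with a known mean (unknowable in practice).
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# One sample of size 30: the only data we would actually have.
sample = rng.sample(population, 30)
sample_mean = statistics.fmean(sample)

# Bootstrap: resample the sample itself with replacement.
boot = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(5000)]
boot_center = statistics.fmean(boot)

# The bootstrap distribution is centered on the *sample* mean, so it adds
# nothing about location; its spread is what we use.
print(f"population mean {true_mean:.2f}, sample mean {sample_mean:.2f}, "
      f"bootstrap center {boot_center:.2f}")

# Simple percentile interval: the middle 95% of the bootstrap means
# (the 1st and 39th of 39 cut points splitting the data into 40 parts).
cuts = statistics.quantiles(boot, n=40)
lo, hi = cuts[0], cuts[-1]
print(f"95% percentile interval: ({lo:.2f}, {hi:.2f})")
```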

- Scott Rifkin

------------
EBE, Division of Biological Sciences
UCSD

...

 Hello All,

 As a person who spends most of her summer working with high school
 teachers on stats and probability content and creating lesson plans, 
 which are used in the next school year, I've followed this discussion 
 eagerly.

 High school teachers are relatively easily convinced that a large 
 enough, random sample is usually representative of the population.
 Convincing teachers that one of these samples could be used to mimic
 the entire population and then be utilized to generate more random 
 samples is quite a different thing.  I am convinced of the 
 bootstrapping process, but to leap there immediately with teachers 
 versus the more cumbersome routes discussed in this chain of responses 
 might cause serious distress.

 Are there resources to help educate high school teachers (and myself
 further) in regard to bootstrapping?  Research and experience show
 that teachers will either omit or superficially enact content that
 they feel is beyond their current knowledge base.

 Simulation, in general, has been daunting for high school teachers.
 Of the 23 we worked with last summer, only about 25% took the plunge
 with re-randomization.  However, the ones who did thoroughly enjoyed
 the experience, as did their students.

 Best,

 Maryann

 ----------------------------------------
 Maryann E. Huey
 Mathematics and Computer Science
 Drake University
 515/271-2839


------------------------------

Message: 2
Date: Sat, 16 May 2015 19:27:17 -0400
From: Daren Starnes <dstarnes(a)lawrenceville.org>
To: Simulation-Based Inference <sbi(a)causeweb.org>
Subject: Re: [SBI] How to estimate parameters: To bootstrap or not?

Hi, Maryann.  From my own work with high school teachers, I have found that the best entry
point for simulation-based inference is to introduce them to two cases that are pretty
accessible:

1. Using simulation to test a claim about a population proportion based on a random sample
from that population.  Just simulate many, many samples of that size under the assumption
that the claim is true and record the value of the sample proportion for each one in a
dotplot.  Then look where the observed result falls in the simulated sampling
distribution, and ask whether the sample result is sufficiently surprising (far out in the
tails of the distribution) to provide convincing evidence against the claim.
Ideally, we'd have learners do this with a spinner or some other physical device first
before proceeding to technology, which would necessitate using a fairly small sample size
for practical reasons.

2. Using simulation to determine whether the difference between two proportions is
statistically significant in a randomized experiment.
Assume that there is no difference in the effects of the two treatments on the subjects in
the study (null hypothesis).  Simulate re-doing the random assignment of subjects to
treatments many, many times, keeping each subject's response (success or failure) the
same as it was in the original experiment.  Each time, record the difference in
proportions of successes for the two groups on a dotplot. Then look where the observed
result falls in the simulated randomization distribution, and ask whether the observed
difference in proportions is sufficiently surprising (far out in the tails of the
distribution) to provide convincing evidence against the null hypothesis.  Ideally,
we'd have learners do this by shuffling and dealing cards or some other physical
device first before proceeding to technology, which would necessitate using fairly small
group sizes for practical reasons.
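The first activity might be sketched in Python as follows, assuming a hypothetical claim p = 0.5 and a made-up result of 32 successes in a random sample of 50. Instead of drawing a dotplot, the code counts how many simulated sample proportions are at least as large as the observed one; the one-sided comparison and the helper name `simulate_null_proportions` are illustrative choices.

```python
import random

def simulate_null_proportions(p_claim, n, reps, seed=0):
    """Simulate `reps` random samples of size n assuming the claimed
    population proportion is true; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_claim for _ in range(n)) / n
            for _ in range(reps)]

# Hypothetical study: claim p = 0.5, observed 32 successes out of 50.
n, observed = 50, 32 / 50
sims = simulate_null_proportions(0.5, n, reps=10_000)

# How often does chance alone give a sample proportion at least as
# large as the one observed (one-sided, in the observed direction)?
p_value = sum(s >= observed for s in sims) / len(sims)
print(f"observed p-hat = {observed:.2f}, simulated p-value ~ {p_value:.3f}")
```

For these numbers the simulated p-value should land near 0.03, in line with the exact binomial tail probability.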

There are great resources available from several members of this list that could be used
as the basis for these two distinct activities that would introduce teachers to the
different scope of inference for random sampling and randomized experiments.
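The second activity (re-randomization with each subject's response held fixed) can be sketched the same way. The group sizes and success counts below (15 of 25 with treatment A, 7 of 25 with treatment B) are made up, `randomization_diffs` is a hypothetical helper, and the one-sided comparison is again an illustrative choice.

```python
import random

def randomization_diffs(successes_a, n_a, successes_b, n_b, reps, seed=0):
    """Re-randomize subjects to groups many times, keeping every subject's
    response (1 = success, 0 = failure) fixed, and return the difference
    in success proportions (group A minus group B) for each shuffle."""
    rng = random.Random(seed)
    total = successes_a + successes_b
    responses = [1] * total + [0] * (n_a + n_b - total)
    diffs = []
    for _ in range(reps):
        rng.shuffle(responses)  # a new random assignment to treatments
        group_a, group_b = responses[:n_a], responses[n_a:]
        diffs.append(sum(group_a) / n_a - sum(group_b) / n_b)
    return diffs

# Hypothetical experiment: 15/25 successes with treatment A, 7/25 with B.
observed = 15 / 25 - 7 / 25
diffs = randomization_diffs(15, 25, 7, 25, reps=10_000)

# Fraction of re-randomizations giving a difference at least as large.
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference = {observed:.2f}, simulated p-value ~ {p_value:.3f}")
```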

Daren Starnes
