How to estimate parameters: To bootstrap or not? - SBI

16 May 2015

My approach to the conceptual hurdle of using a single sample to mimic 
the population runs something along the following lines:

First get them comfortable with the idea that a statistic measured on 
the sample (usually we talk about mean) is our best estimate of the 
corresponding population parameter.  The goal is to make the link 
between sample properties and population properties.

Then present the problem:

Problem: This sample statistic can move around depending on the sample. 
How variable is it from sample to sample?  Since the population 
parameter is fixed, we want to get an idea of how close our 'best 
estimate' is likely to be to it.

Ideal solution:  keep taking samples from the population and get a 
collection of statistics.  Then we'd know how variable the statistic is, 
because we'd have a sampling distribution (a distribution of sample 
statistics).
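
As a rough illustration of this ideal solution, here is a minimal Python 
sketch.  It assumes we somehow had the whole population in hand; the 
population values, the sample size of 30, and the function name 
sampling_distribution are all invented for the example.

```python
import random
import statistics


def sampling_distribution(population, sample_size, n_samples, seed=0):
    """Return the means of n_samples fresh random samples from the population."""
    rng = random.Random(seed)
    return [
        # rng.sample draws without replacement, like taking a new real sample
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population of 10,000 values, made up for illustration only.
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

means = sampling_distribution(population, sample_size=30, n_samples=1000)

# The spread of these means is the sampling variability we want to know.
print("spread of the sample means:", statistics.stdev(means))
```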

Problem: This is impractical.  Usually we can only afford to take one 
sample.

Question: is there a way we can mimic/simulate this ideal solution?

Problem: We don't know what the population actually looks like.

But we do know something about it.  In fact, everything we know about it 
is encapsulated in the sample.  The sample is our best estimate of the 
population in a particular way:  we expect the population to be much 
bigger in size, but the frequencies of each value in the sample are 
representative (in a statistical way) of the frequencies of each value 
in the actual population.  And we already have it in hand.

So instead of doing the impractical - continually using scarce resources 
to sample from the population and calculate our statistic on each sample 
- we use what we already have and sample from our best estimate of the 
population - our sample itself - and calculate our statistic from each 
bootstrap sample.
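
A minimal Python sketch of that resampling idea, using the mean as the 
statistic.  The data values, the 1,000 resamples, and the function name 
bootstrap_means are assumptions made up for illustration, not part of 
any particular course or package.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample the sample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [
        # rng.choices draws WITH replacement; each resample has the original size
        statistics.mean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    ]


# The one sample we could afford to take (values invented for the example).
sample = [4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.5, 4.4, 5.1, 3.6]

boot = bootstrap_means(sample)

# Location comes from the original sample; variability from the resamples.
print("best estimate (sample mean):", statistics.mean(sample))
print("bootstrap standard error:   ", statistics.stdev(boot))
```

The resamples are the same size as the original sample, so the spread 
of the bootstrap means stands in for the spread we would have seen had 
we been able to keep sampling from the population.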

For my students, the key is when they understand that the sample itself 
plays the role of an estimate of the population.  And that we use 
bootstrapping to study the variability (not the location) of our 
statistic of interest.

- Scott Rifkin

------------
EBE, Division of Biological Sciences
UCSD

...

 Hello All,

 As a person who spends most of her summer working with high school
 teachers on stats and probability content and creating lesson plans,
 which are used in the next school year, I've followed this discussion
 eagerly.

 High school teachers are relatively easily convinced that a large
 enough, random sample is usually representative of the population.
  Convincing teachers that one of these samples could be used to mimic
 the entire population and then be utilized to generate more random
 samples is quite a different thing.  I am convinced of the
 bootstrapping process, but to leap there immediately with teachers
 versus the more cumbersome routes discussed in this chain of responses
 might cause serious distress.

 Are there resources to help educate high school teachers (and myself
 further) in regard to bootstrapping?  Research and experience show
 that teachers will either omit or superficially enact content that
 they feel is beyond their current knowledge base.

 Simulation, in general, has been daunting for high school teachers.
 Of the 23 we worked with last summer, only 25% took the plunge with
 re-randomization.  However, the ones that did thoroughly enjoyed the
 experience, as did their students.

 Best,

 Maryann

 ----------------------------------------
 Maryann E. Huey
 Mathematics and Computer Science
 Drake University
 515/271-2839
