My approach to the conceptual hurdle of using a single sample to mimic
the population runs something along the following lines:
First get students comfortable with the idea that a statistic measured
on the sample (usually we talk about the mean) is our best estimate of
the corresponding population parameter. The goal is to make the link
between sample properties and population properties.
Then present the problem:
Problem: This sample statistic can move around depending on the sample.
How big is this sampling variability? Since the population parameter is
fixed, we want to get an idea of how close our 'best estimate' is likely
to be to it.
Ideal solution: keep taking samples from the population and get a
collection of statistics. Then we'd know how variable the statistic is,
because we'd have a sampling distribution (a distribution of sample
statistics).
Problem: This is impractical. Usually we can only afford to take one
sample.
Question: is there a way we can mimic/simulate this ideal solution?
Problem: We don't know what the population actually looks like.
But we do know something about it. In fact, everything we know about it
is encapsulated in the sample. The sample is our best estimate of the
population in a particular way: we expect the population to be much
bigger in size, but the frequencies of each value in the sample are
representative (in a statistical way) of the frequencies of each value
in the actual population. And we already have it in hand.
So instead of doing the impractical - continually using scarce resources
to sample from the population and calculate our statistic on each sample
- we use what we already have and sample from our best estimate of the
population - our sample itself - and calculate our statistic from each
bootstrap sample.
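
As a rough illustration of the procedure described above, here is a
minimal Python sketch using only the standard library. The population
(normal, mean 50, SD 10), the sample size of 100, the 2000 resamples, and
the helper name bootstrap_means are all assumptions made up for this
example; a simulated population is included only so that the "ideal
solution" can be run next to the bootstrap for comparison.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample `sample` with replacement; return the mean of each resample.

    Hypothetical helper for illustration: each bootstrap sample has the
    same size as the original sample.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n))
            for _ in range(n_resamples)]


# Assumed population, invented for this sketch. In practice we never see it.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# The one sample we can actually afford to take.
sample = rng.sample(population, 100)

# Ideal (impractical) solution: keep drawing fresh samples from the population.
ideal_means = [statistics.fmean(rng.sample(population, 100))
               for _ in range(2000)]

# Bootstrap: draw resamples from the sample itself instead.
boot_means = bootstrap_means(sample)

print("SD of ideal sampling distribution:", statistics.stdev(ideal_means))
print("SD of bootstrap distribution:     ", statistics.stdev(boot_means))
print("sample mean:                      ", statistics.fmean(sample))
print("center of bootstrap distribution: ", statistics.fmean(boot_means))
```

Under these assumptions the two spreads should come out similar, while
the bootstrap distribution is centered on the sample mean rather than on
the population mean.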
For my students, the key is when they understand that the sample itself
plays the role of an estimate of the population. And that we use
bootstrapping to study the variability (not the location) of our
statistic of interest.
- Scott Rifkin
------------
EBE, Division of Biological Sciences
UCSD
Hello All,
As a person who spends most of her summer working with high school
teachers on stats and probability content and creating lesson plans
that are used in the next school year, I've followed this discussion
eagerly.
High school teachers are relatively easily convinced that a large
enough, random sample is usually representative of the population.
Convincing teachers that one of these samples could be used to mimic
the entire population and then be utilized to generate more random
samples is quite a different thing. I am convinced of the value of the
bootstrapping process, but leaping there immediately with teachers,
rather than taking the more cumbersome routes discussed in this chain
of responses, might cause serious distress.
Are there resources to help educate high school teachers (and myself
further) in regard to bootstrapping? Research and experience show
that teachers will either omit or superficially enact content that
they feel is beyond their current knowledge base.
Simulation, in general, has been daunting for high school teachers.
Of 23 we worked with last summer, only 25% took the plunge with
re-randomization. However, the ones who did thoroughly enjoyed the
experience, as did their students.
Best,
Maryann
----------------------------------------
Maryann E. Huey
Mathematics and Computer Science
Drake University
515/271-2839