Power is the probability of rejecting the null hypothesis at a given value of the parameter.  For a simple null hypothesis, the power when the null is true is the size of the test.  If the null hypothesis is true, the probability of rejecting it should be small, so low power is desired there.  If the null is false, the power should be as large as possible.  Because power is a function of the parameter under consideration, a power curve can be constructed by finding the power of the test at several values of the parameter.  To examine the idea of power more fully, we will conduct a simulation based on Example 3.1, page 60, in Dowdy and Wearden (1991).

In nature, there is approximately a one-to-one ratio of female to male births.  Dairy farmers, among others, need more females than males.  Therefore, many experiments have been performed to try to alter the sex ratio.  A reproductive physiologist believes that by treating the semen of the bull with a mild acid and using artificial insemination, (s)he can change the sex ratio of calves.  (S)he decides to perform an experiment and observe 20 calves that have been produced by this method.  The physiologist does not know before the experiment what effect the acid will have on the sex of newborn calves, and (s)he wants to know what kind of power can be expected from the experiment.  (S)he also wants to know if the experiment has the specified size (i.e., if we set α at a certain value, is the probability of making a Type I error really α in practice?).

State the null hypothesis, in words and in symbols.  Let p = probability of a female calf (i.e., count a female as a “success”).

 

State the alternative hypothesis, in words and in symbols.  (Remember that the physiologist doesn’t know what effect the acid will have.)

 

If we want the size of the test (α) to be as close to 0.10 as possible, what is the rejection region?  Use the binomial probability table.

 

 

What is the actual α-value?
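If a binomial table is not at hand, the table lookups can be checked with a short script.  The following is a minimal sketch, not part of the original exercise.  It assumes the rejection region is symmetric, of the form "X <= c or X >= n - c", which is one natural choice when the null value is p = 0.5 and the alternative is two-sided.  The function names are made up for this illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def rejection_probability(n, p, lower, upper):
    """P(X <= lower or X >= upper) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(n + 1)
               if x <= lower or x >= upper)

n, p0, target = 20, 0.5, 0.10

# ASSUMPTION: only symmetric two-sided regions {X <= c or X >= n - c} are considered.
candidates = {c: rejection_probability(n, p0, c, n - c) for c in range(n // 2)}
for c, size in candidates.items():
    print(f"reject if X <= {c} or X >= {n - c}: size = {size:.4f}")

# The candidate region whose size is closest to the target.
best_c = min(candidates, key=lambda c: abs(candidates[c] - target))
print(f"closest to {target}: c = {best_c}")
```

Each printed line corresponds to adding up the two tails in the n = 20, p = 0.50 column of the table.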

 

Next, we’ll see if the test has the correct size in practice.  Obviously, we can’t actually conduct the physiologist’s experiment.  Also, conducting the experiment once wouldn’t tell us whether the size is actually close to α.  We can, however, simulate some results under the assumption that H0 is true.  If we conduct a similar, simple experiment that can be carried out many times in the lab, we can get an idea of the size of the test in practice.  The experiment is carried out for p = 0.5 (the null hypothesis) many times, and the proportion of times we reject the null hypothesis is the estimated size of the test.  We will be carrying out a small experiment 25 times.

IF H0 IS TRUE, how many times should you expect to reject H0?

 

 

Each group should have one 10-sided die.  Appoint one person to roll the die twenty times.  Another group member will record the results.  If the result is 0-4, count this as a success.  If the result is 5-9, count this as a failure (this gives P(success) = P(failure) = 0.50).  These twenty rolls constitute one run of the “experiment.”  Based on the twenty trials (which represent the twenty calves), what is the decision?  Repeat the experiment 24 more times (i.e., carry out a total of 25 tests of H0: p = 0.50).  A tabulation sheet is provided.
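The die-rolling procedure above can also be mimicked on a computer, which is handy for checking a tabulation sheet or for running far more than 25 tests.  This is a sketch only.  The cutoffs LOWER and UPPER are an assumption for illustration and should be replaced by the rejection region your group found; the seed is arbitrary.

```python
import random

N_CALVES = 20   # rolls per experiment (one roll per calf)
N_TESTS = 25    # experiments per group
# ASSUMPTION for illustration only: substitute your own rejection region
# "reject H0 if successes <= LOWER or successes >= UPPER".
LOWER, UPPER = 6, 14

rng = random.Random(1991)  # arbitrary fixed seed so the run is repeatable

rejections = 0
for test in range(1, N_TESTS + 1):
    rolls = [rng.randrange(10) for _ in range(N_CALVES)]   # die faces 0-9
    successes = sum(face <= 4 for face in rolls)           # faces 0-4 count as success
    reject = successes <= LOWER or successes >= UPPER
    rejections += reject
    print(f"test {test:2d}: {successes:2d} successes -> "
          f"{'reject H0' if reject else 'do not reject H0'}")

print(f"rejected H0 in {rejections} of {N_TESTS} tests "
      f"(estimated size {rejections / N_TESTS:.2f})")
```

Changing the seed, or removing it, gives a different set of 25 tests, just as a different group rolling the die would.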

How many times out of 25 did you reject H0?

 

What is the estimated size of the test?

 

Should the physiologist be satisfied that the test has size close to the specified α?

 

 

In reality, the probability of a female calf is probably not exactly 0.5.  If the true value of p is close to 0.5, we expect lower power than if the true p is very different from the hypothesized value.  Each group will be assigned a specific true value of p.  Your group’s true value is p = ______.  Using the binomial table, find the power of the test under your group’s value of p.

The power is:

 

 

This power means that out of 25 tests, we should expect to reject H0 how many times for this true value of p?

 

Now, we will carry out another simulation study to examine the power of the test in practice.  We will do this by carrying out the same simulation study as before, but using a different probability of success.  This means that the definitions of success and failure will change.  Your true value is p = _______.  For this value of p, P(success) = _______.  To achieve this P(success), what will you count as a success and what will you count as a failure?  Success = _______, failure = _______.  Roll one die 20 times.  Under the null hypothesis that p = 0.50, what is the decision?  Repeat the experiment 24 more times.  Again, a tabulation sheet is provided.
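A computer version of this second study only needs a different rule for which die faces count as a success.  The sketch below is illustrative: the value p = 0.7 and the cutoffs 6 and 14 in the usage lines are hypothetical placeholders, not your group's assigned values, and the function names are invented for this example.  Note that a single 10-sided die can only produce success probabilities that are multiples of 0.1.

```python
import random

def faces_for(p):
    """Die faces (0-9) counted as a success so that P(success) = p."""
    k = round(10 * p)
    if abs(10 * p - k) > 1e-9 or not 0 <= k <= 10:
        raise ValueError("one 10-sided die only gives p = 0.0, 0.1, ..., 1.0")
    return set(range(k))   # e.g. p = 0.3 -> faces {0, 1, 2}

def estimated_power(p, lower, upper, n_tests=25, n_rolls=20, seed=None):
    """Proportion of simulated tests that reject H0 when the true probability is p."""
    rng = random.Random(seed)
    success = faces_for(p)
    rejections = 0
    for _ in range(n_tests):
        x = sum(rng.randrange(10) in success for _ in range(n_rolls))
        if x <= lower or x >= upper:
            rejections += 1
    return rejections / n_tests

# HYPOTHETICAL values for illustration; use your group's p and rejection region.
print(sorted(faces_for(0.7)))
print(estimated_power(0.7, lower=6, upper=14, seed=1))
```

Raising n_tests well above 25 shows how much of the gap between estimated and theoretical power is due to the small number of tests.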

Out of 25 tests, how many times did you reject the null hypothesis that p = 0.50?

 

What is the estimated power of the test?

 

How close is the estimated power to the theoretical power you found from the binomial table?

 

Pool your data with the other groups in the class.  Construct a theoretical power curve and an estimated power curve.
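The theoretical curve can be cross-checked against exact binomial probabilities.  The sketch below assumes the rejection region "X <= 6 or X >= 14" purely for illustration (substitute the region the class agreed on) and evaluates the power on the grid p = 0.1, 0.2, ..., 0.9, drawing a rough text plot.  The estimated curve still has to come from the pooled class data.

```python
from math import comb

def power(p, n=20, lower=6, upper=14):
    """P(reject H0 | true probability p) for the region X <= lower or X >= upper.

    ASSUMPTION: the default cutoffs are illustrative, not prescribed by the exercise.
    """
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1) if x <= lower or x >= upper)

# Theoretical power at each p on the grid, with a bar proportional to the power.
for k in range(1, 10):
    p = k / 10
    pw = power(p)
    print(f"p = {p:.1f}  power = {pw:.3f}  " + "#" * round(40 * pw))
```

For a symmetric region the curve is symmetric about p = 0.5, where it reaches its minimum, the size of the test.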

 

 

 

 

 

Are the curves different?  Why do you think they are different?  What do you think you could do to make them more alike?

 

 

What kinds of problems did you encounter in carrying out the simulation study?  Did you always have the same person rolling the die?  Do you think this made a difference?