Statistical Power and Sample Size

As described in Null Hypothesis Testing, beta (β) is the acceptable level of type II error, i.e. the probability that the null hypothesis is not rejected even though it is false. Power is 1 – β, i.e. the probability of correctly rejecting a false null hypothesis. We now show how to estimate the power of a statistical test.

Example 1: Suppose bolts are manufactured using a process for which the length of the bolts is known to follow a normal distribution with a standard deviation of 12 mm. The manufacturer wants to check that the mean length of their bolts is 60 mm, and so takes a sample of 110 bolts and uses a one-tail test with α = .05 (i.e. H0: μ ≤ 60). What is the probability of a type II error if the actual mean length is 62.5 mm?

Since n = 110 and σ = 12, the standard error = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{110}} = 1.144. Let x = the length of the bolt. The null hypothesis is rejected provided the sample mean is greater than the critical value of x, which is NORM.INV(1 – α, μ, s.e.) = NORM.INV(.95, 60, 1.144) = 61.88.
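For readers working outside Excel, the same two quantities can be sketched in Python. This is only an illustration, assuming the standard library's `statistics.NormalDist` as a stand-in for NORM.INV; the variable names are ours and do not come from the worksheet.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1
sigma, n, mu0, alpha = 12, 110, 60, 0.05

# Standard error of the sample mean
se = sigma / sqrt(n)

# Counterpart of NORM.INV(1 - alpha, mu0, se): the null hypothesis is
# rejected when the sample mean exceeds this critical value
x_crit = NormalDist(mu=mu0, sigma=se).inv_cdf(1 - alpha)

print(round(se, 3), round(x_crit, 2))
```

The printed values should agree with the 1.144 and 61.88 obtained above.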

Now suppose that the actual mean is 62.5. The situation is illustrated in Figure 1, where the curve on the left represents the normal curve being tested with a mean of μ0 = 60, and the normal curve on the right represents the real distribution with a mean of μ1 = 62.5.

Figure 1 – Statistical power

Since

\beta = P(\bar{x} < x_{crit} \mid \mu = \mu_1) = P(\bar{x} < 61.88 \mid \mu = 62.5)

we have β = NORM.DIST(61.88, 62.5, 1.144, TRUE) = .295, and so power = 1 – β = .705.
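The calculation of β can also be sketched in Python. The helper `type2_error` below is a hypothetical name of our own, and `NormalDist.cdf` is assumed to play the role of NORM.DIST with the cumulative flag set to TRUE.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for a right-tailed z-test of H0: mu <= mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    # Critical value of the sample mean under the null distribution
    x_crit = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Probability that the sample mean stays below x_crit under the actual distribution
    return NormalDist(mu1, se).cdf(x_crit)

beta = type2_error(mu0=60, mu1=62.5, sigma=12, n=110)
power = 1 - beta
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Because the function does not round the standard error or the critical value, the result may differ from the worksheet value in the fourth decimal place.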

We can repeat this calculation for values of μ1 ≥ 62.5 to obtain the table and graph of the power values in Figure 2.

Figure 2 – Power curve for Example 1
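A table of power values like the one behind the power curve can be approximated with a short loop. The grid of μ1 values below is an assumption made for illustration and is not necessarily the grid used in Figure 2.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 60, 12, 110, 0.05
se = sigma / sqrt(n)
x_crit = NormalDist(mu0, se).inv_cdf(1 - alpha)

# Hypothetical grid of actual means; the values in Figure 2 may differ
mu1_values = [62.5, 63.0, 63.5, 64.0, 64.5]

power_curve = []
for mu1 in mu1_values:
    beta = NormalDist(mu1, se).cdf(x_crit)  # type II error for this mu1
    power_curve.append((mu1, 1 - beta))

for mu1, power in power_curve:
    print(f"{mu1:5.1f}  {power:.3f}")
```

As expected, power increases as the actual mean moves further above μ0 = 60.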

Example 2: For the data in Example 1, answer the following questions:

  a) What is the power of the test for detecting a standardized effect of size .2?
  b) What effect size (and mean) can be detected with power .80?
  c) What sample size is required to detect an effect of size .2 with power .80?

a) As described in Standardized Effect Size, we use the following measure of effect size:

d = \frac{\mu_1 - \mu_0}{\sigma}

Thus μ1 = 60 + (.2)(12) = 62.4. As in Example 1,

\beta = P(\bar{x} < x_{crit} \mid \mu = \mu_1) = P(\bar{x} < 61.88 \mid \mu = 62.4)

and so β = NORM.DIST(61.88, 62.4, 1.144, TRUE) = .325 and power = 1 – β = .675.
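The same power can be written directly in terms of the standardized effect size, since under the alternative the z statistic is centered at d√n. The sketch below assumes a right-tailed test with d > 0; the function name `power_one_tail` is ours.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tail(d, n, alpha=0.05):
    """Power of a right-tailed z-test for standardized effect size d (d > 0)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)
    # beta is the probability that the shifted z statistic falls below z_crit
    beta = z.cdf(z_crit - d * sqrt(n))
    return 1 - beta

print(round(power_one_tail(0.2, 110), 3))
```

Note that σ drops out of this form: only the effect size, the sample size and α are needed.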

We summarize these calculations in the following worksheet:

Figure 3 – Determining power based on effect and sample size

b) We use Excel’s Goal Seek capability to answer the second question. Using the worksheet in Figure 3, we select Data > Data Tools | What-If Analysis > Goal Seek. In the dialog box that appears (see Figure 4), enter B12 in the Set cell field, .8 in the To value field and B9 in the By changing cell field.

Figure 4 – Goal Seek dialog box

We are requesting that Excel find the value of cell B9 (the effect size) that produces a value of .8 for cell B12 (the power). Here the first entry must point to a cell that contains a formula. The second entry must be a value and the third entry must point to a cell that contains a value (possibly blank) and not a formula. After clicking on OK, a Goal Seek Status dialog box appears, and the worksheet from Figure 3 changes to that in Figure 5.

Figure 5 – Determining detectable effect size for specified power

Note that the values of a number of cells have changed to reflect the value necessary to obtain power of .80. In particular, we see that the Effect size (cell B9) contains the value 0.23691. You must click on the OK button in the Goal Seek Status box to lock in these new values (or Cancel to return to the original worksheet values).
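For this one-tail z-test the detectable effect size also has a closed form, d = (z_{1–α} + z_{power})/√n, which can serve as a check on Goal Seek. The sketch below is our own illustration; `detectable_effect` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def detectable_effect(n, power=0.80, alpha=0.05):
    """Smallest standardized effect detectable with the given power (one-tail z-test)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / sqrt(n)

d = detectable_effect(110)
mu1 = 60 + d * 12  # corresponding mean, using mu0 = 60 and sigma = 12
print(f"d = {d:.4f}, mu1 = {mu1:.3f}")
```

Goal Seek is an iterative search that stops once the target is met within a tolerance, so the value it reports can differ slightly from the closed-form result.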

c) We again use Excel’s Goal Seek capability to answer the third question. Using the worksheet in Figure 3 (making sure that the effect size in cell B9 is set to .2), we enter B12 in the Set cell field, .8 in the To value field and B6 in the By changing cell field of the dialog box that appears (see Figure 6).

Figure 6 – Using Goal Seek to determine minimum sample size

After clicking on the OK button, the worksheet changes to that in Figure 7.

Figure 7 – Sample size requirement for Example 2

In particular, note that the sample size value in cell B6 changes to 154.486. Thus the required sample size is 155.
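The required sample size can likewise be sketched without Goal Seek by solving the power equation for n, giving n = ((z_{1–α} + z_{power})/d)². The code below is a minimal illustration under the same one-tail z-test assumptions; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(d, power=0.80, alpha=0.05):
    """Exact and rounded-up sample size for a one-tail z-test with effect size d."""
    z = NormalDist()
    n_exact = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / d) ** 2
    # Round up, since a fractional observation cannot be sampled
    return n_exact, ceil(n_exact)

n_exact, n_needed = required_sample_size(0.2)
print(f"n = {n_exact:.2f}, rounded up to {n_needed}")
```

The unrounded value may differ a little from the Goal Seek result because Goal Seek stops at an approximate solution, but rounding up leads to the same sample size of 155.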

Observation: An alternative way of answering Example 2 (a) is as described in Figure 8.

Figure 8 – Determining power for a given effect size

Observation: An alternative way of answering Example 2 (c) is as described in Figure 9. Note that this approach avoids the need for the Goal Seek capability.

Figure 9 – Determining sample size for a given effect size

33 thoughts on “Statistical Power and Sample Size”

  1. Hello Charles,
    I would like to perform analysis as shown in figure 3, but I wish to look for an effect size in the negative direction (mu1 less than mu0). How should I proceed? I’m sorry if you’ve already answered this question, but I looked through the comments and I’m still not sure how to go about it.

  2. Dear Charles

    Hope you are doing well, could you please clarify the followings.

    In the real scenario, we are using multi-stage sampling (for example, Probability Proportional to Size techniques at the first stage and cluster sampling techniques at the second stage). So in many cases, each element of the sample does not have the same overall probability of selection (unless an equal number of elements is chosen in each cluster at the second stage of sample selection).
    My question, if it is a case (each element of the sample does not have the same overall probability of selection)
    • Shall we do the Z or t-test for this selected sample?

    My second query

    In the real survey, we are facing difficulties estimating the confidence interval of a population parameter when using multi-stage sampling (for example, Probability Proportional to Size techniques at the first stage and cluster sampling techniques at the second stage), because it is difficult to calculate the standard error. Could you please suggest a good guide that describes the equations for calculating the estimators (sample mean, sample proportion, sample total) and their variances in multi-stage sampling (for example, PPS at the first stage and cluster/stratified sampling at the second stage)?

    To overcome this problem shall we use the self-weighting technique?

    Whenever possible, clusters should be chosen with probability-proportional-to-size in sample surveys at the first stage.

    A second is that, if an equal number of elements is chosen in each cluster at the second stage of sample selection, the end result will be a sample in which each element has the same overall probability of selection, or is self-weighting.

    Then we can estimate population parameter and its variance using the same technique of simple random sampling without replacement as each element of the sample is having equal overall probability of selection.

    In this case, shall we use Z or t-test for this sample?

  3. Dear Sir

    Hope you are doing well, I want to ask a clarification when your time permit, please throw some light on it.

    Which is the best way to estimate the (population) parameter?

    1. Calculate the required sample size by defining the Z-score (95%, 1.96), the error (for example 0.03), and p (say .5 for the maximum sample size), then estimate the sample statistic (for example the sample proportion). Then we say the calculated sample proportion is an unbiased estimator of the population proportion and that with 95% confidence the population proportion lies within plus or minus 0.03 (the value used for calculating the sample size) of the sample proportion. That is,

    p- 0.03=< P <= p + 0.03

    Or

    We take a small sample (say 40, not calculated statistically) due to limitations, but use sampling techniques (srs, cluster or ..) while selecting the sample, then calculate the sample proportion and its variance (using statistical techniques). Finally, we say the population proportion P lies between p ± Z SE(p). That is,

    p- Z[SE(p)] =< P <= p + Z [SE(p)]

    Please clarify it, when your time permits.

  4. Charles,
    “Figure 1–Statistical power” helped me understand type II error for a 1-tail comparison. This calculation requires an alternative distribution with a mean of mu1.
    I know the type II error (beta) can be calculated for a 2-tail comparison, but I don’t understand what it means. Is the null distribution compared to two alternative distributions having means of mu1 and -mu1? I would really appreciate an illustrated example of finding beta for a 2-tail t-test analysis.

  5. Thanks for this.

    I think for figure 2, the values of μ1 are from “61.8819676776998 to 64.4” rather than “μ1 ≥ 62.5”. I found this confusing until I realised this..

  6. Hi Charles,

    I have a more philosophical question.
    As I understand the “standardized effect size” concept, it takes the change you want to identify and normalizes it to a measure of how detectable it is.
    I don’t understand why it is interesting.

    you wrote “Since it is standardized we can compare the effects across different studies with different variables” can you please give an interesting example?

    In your example you expect the length of the bolt to be 60mm , and maybe 60±1 is okay and more or less can’t be sold in the shops. or 60±1.66%

    So I’m interested what is the power of the test to identify bolts longer than 61 or shorter than 59. not what is the power of the test to identify cohen’s d=0.2

    maybe in your example, you changed machine and now you compare the effect on the average length to the effect on the average diameter???

    I hope it is okay I’m asking many questions.

    Thanks a lot,
    Oren

    • Oren,
      1. The effect size is interesting since you want to quantify the effect (small, large, etc.). E.g. if you are measuring the effectiveness of a new drug for curing a type of cancer, you prefer a bigger effect size than that of the currently used drug or a placebo.
      When you do statistical analysis often you are looking to see whether an effect is statistically significant (using the p-value). This just means that the effect size is different from zero (or some other predesignated value), not whether you should care about the effect. Also as the sample size gets larger and larger it is very likely that you will see a significant result — even if the effect size is very small.
      2. You need to look at the literature in your field to see interesting examples of comparisons of effect sizes across different studies, but the example I gave above indicates what to look for. If the effect size of the currently used treatment over a placebo is .30 and your new treatment has an effect size of .70 over a placebo, this will be interesting.
      3. Bolt size (i.e. mean bolt size) can easily be mapped into an effect size
      Charles

  7. Hello Charles,
    In the first formula, where you calculate the length for alpha = 0.05, why is the SEM used and not the SD? According to the normal distribution, 95% of the data are within the mean ± 2 SD, so it should be =norm.inv(0.95,60,12).

    • I am not exactly sure which is the first formula that you are referring to, but if it is the effect size formula, then Cohen’s d uses the standard deviation and not the standard error. d does not depend on the sample size.
      Charles

        • Vendula,
          Yes, for this problem, the appropriate value for the standard deviation for a sample of size 110 is 1.144, which is the standard error for the sample. 12 represents the standard deviation of the population.
          Charles

  8. Hi Charles,

    I’m interested in how you’d compute beta for observed values that aren’t greater than Xcrit.

    For example, if the observed value was 60.5 (less than Xcrit) would the beta be equal to NORMDIST(61.88, 60.5, 1.144, TRUE) = 0.886148, and the beta would be higher the smaller the number gets.

    Also, if you were doing a right tail test and the observed value was less than Xcrit, such as NORMDIST(58.12, 58, 1.144, TRUE) = 0.5412.

    Thanks

  9. Hello,

    If I have a sample with a mean of 1000 and SEM (standard error) of 60 and another sample with a mean of 800 and SEM of 70, how would I calculate the statistical power between these two samples?

    Thank you

  10. Hello Charles,

    I need assistance with how to plug in the numbers for the Statistical Power and Sample Size option. I will be running a logistic regression. I have all the data, but am unsure as to what I input.

    Any insight you have would be great! Thank you.

    • Angela, sorry but the Statistical Power and Sample Size data analysis tool supports linear regression but does not yet support logistic regression.
      Charles

  11. Hi Charles

    I am doing an evaluation research survey. Kindly tell me how to decide the sample size for rural and urban area, with formula for a study on immunization coverage with the previous coverage evaluation survey indicates a rural coverage percentage at 50 % and urban 68 %. Is it ok to do it with the formula n = 4 pq /L?

    • I haven’t enough information to answer your question. Which statistical test are you using? What does pq/L abbreviate?
      Charles

  12. I am just about conducting a survey in Ghana on the informal sector workers. The Ghanaian economy is about 84 % informal and over 14 million Ghanaians are currently working. How do I get the right sample size (using power sampling) for the whole country? Thanks.

    • George,
      The sample size required depends on the type of statistical test that you are going to use. You need to identify the test that you will use (or that you are considering using) before you can estimate the sample size.
      Charles

  13. How should the formula be amended when μ0 > μ1? It looks to me as if it makes no difference which is subtracted from which, since from the critical value point of view μ1+z*σ = μ0+z*σ. Thus, one should simply swap them.
    Do I understand correctly? I would be glad for help.
    Thank you in advance,
    Paul

