Statistical Power and Sample Size

As described in Null Hypothesis Testing, beta (β) is the acceptable level of type II error, i.e. the probability that the null hypothesis is not rejected even though it is false. Power is 1 – β. We now show how to estimate the power of a statistical test.

Example 1: Suppose bolts are manufactured using a process for which the length of the bolts is known to follow a normal distribution with a standard deviation of 12 mm. The manufacturer wants to check that the mean length of their bolts is 60 mm, and so takes a sample of 110 bolts and uses a one-tail test with α = .05 (i.e. H₀: µ ≤ 60). What is the probability of a type II error if the actual mean length is 62.5 mm?

Since n = 110 and σ = 12, the standard error = $\frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{110}}$ = 1.144. Let x = the length of the bolt. The null hypothesis is rejected provided the sample mean is greater than the critical value of x, which is NORM.INV(1 – α, μ₀, s.e.) = NORM.INV(.95, 60, 1.144) = 61.88.
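The same calculation can be sketched outside Excel. The snippet below is a minimal illustration using Python's standard library, where `NormalDist.inv_cdf` plays the role of NORM.INV; the variable names are our own and not part of the worksheet.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1
sigma, n, mu0, alpha = 12, 110, 60, 0.05

# Standard error of the sample mean
se = sigma / sqrt(n)

# Critical value: counterpart of NORM.INV(1 - alpha, mu0, se)
x_crit = NormalDist(mu0, se).inv_cdf(1 - alpha)

print(round(se, 3), round(x_crit, 2))
```

Rounded to the precision used in the text, these agree with the values 1.144 and 61.88.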

Now suppose that the actual mean is 62.5. The situation is illustrated in Figure 1, where the curve on the left represents the normal curve being tested with a mean of μ₀ = 60, and the normal curve on the right represents the real distribution with a mean of μ₁ = 62.5.

Figure 1 – Statistical power

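Since a type II error occurs when the sample mean falls below the critical value even though the actual mean is μ₁,

$\beta = P(\bar{x} < x_{crit} \mid \mu = \mu_1)$

we have β = NORM.DIST(61.88, 62.5, 1.144, TRUE) = .295, and so power = 1 – β = .705.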
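As a cross-check, here is a minimal Python sketch of the type II error and power for Example 1, with `NormalDist.cdf` standing in for the cumulative normal distribution function in Excel. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, mu0, mu1, alpha = 12, 110, 60, 62.5, 0.05
se = sigma / sqrt(n)
x_crit = NormalDist(mu0, se).inv_cdf(1 - alpha)

# beta = P(sample mean < x_crit) when the actual mean is mu1
beta = NormalDist(mu1, se).cdf(x_crit)
power = 1 - beta

print(round(beta, 3), round(power, 3))
```

The code carries full precision for the critical value and standard error rather than the rounded 61.88 and 1.144, so the unrounded results may differ slightly from a hand calculation.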

We can repeat this calculation for values of μ₁ ≥ 62.5 to obtain the table and graph of the power values in Figure 2.

Figure 2 – Power curve for Example 1
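A table of power values like the one behind a power curve can be sketched as follows. The grid of μ₁ values is a hypothetical choice for illustration, since the exact values used in Figure 2 are not reproduced here; `power_at` is our own helper name.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, mu0, alpha = 12, 110, 60, 0.05
se = sigma / sqrt(n)
x_crit = NormalDist(mu0, se).inv_cdf(1 - alpha)

def power_at(mu1):
    """Power of the one-tail test when the actual mean is mu1."""
    return 1 - NormalDist(mu1, se).cdf(x_crit)

# Hypothetical grid of actual means (not taken from Figure 2)
mu1_values = [62.0, 62.5, 63.0, 63.5, 64.0, 64.5]
powers = [power_at(m) for m in mu1_values]

for m, p in zip(mu1_values, powers):
    print(f"{m:5.1f}  {p:.3f}")
```

Note that the power is exactly .5 when μ₁ equals the critical value, and it increases as μ₁ moves further above μ₀.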

Example 2: For the data in Example 1, answer the following questions:

a) What is the power of the test for detecting a standardized effect of size .2?
b) What effect size (and mean) can be detected with power .80?
c) What sample size is required to detect an effect of size .2 with power .80?

a) As described in Standardized Effect Size, we use the following measure of effect size:

$d = \frac{\mu_1 - \mu_0}{\sigma}$

Thus μ₁ = 60 + (.2)(12) = 62.4. As in Example 1,

$\beta = P(\bar{x} < x_{crit} \mid \mu = \mu_1)$

and so β = NORM.DIST(61.88, 62.4, 1.144, TRUE) = .325, and power = 1 – β = .675.
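The calculation in part (a) can also be written directly in terms of the effect size. The sketch below assumes the one-tail setup of Example 1; the function name `power_for_effect` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_for_effect(d, n, alpha=0.05):
    """Power of a one-tail z test for standardized effect size d and sample size n."""
    z = NormalDist()  # standard normal
    # Reject when the standardized sample mean exceeds z_(1-alpha);
    # under the alternative it is shifted by d * sqrt(n).
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - d * sqrt(n))

sigma, mu0, d, n = 12, 60, 0.2, 110
mu1 = mu0 + d * sigma          # mean corresponding to the effect size
power = power_for_effect(d, n)

print(round(mu1, 1), round(1 - power, 3), round(power, 3))
```

Expressing power as a function of d and n is convenient because the same function can then be reused when solving for either quantity.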

We summarize these calculations in the following worksheet:

Figure 3 – Determining power based on effect and sample size

b) We use Excel’s Goal Seek capability to answer the second question. Using the worksheet in Figure 3, we select Data > Data Tools | What-If Analysis > Goal Seek. In the dialog box that appears (see Figure 4), enter the following values:

Figure 4 – Goal Seek dialog box

We are requesting that Excel find the value of cell B9 (the effect size) that produces a value of .8 for cell B12 (the power). Here the first entry must point to a cell that contains a formula. The second entry must be a value and the third entry must point to a cell that contains a value (possibly blank) and not a formula. After clicking on OK, a Goal Seek Status dialog box appears, and the worksheet from Figure 3 changes to that in Figure 5.

Figure 5 – Determining detectable effect size for specified power

Note that the values of a number of cells have changed to reflect the value necessary to obtain power of .80. In particular, we see that the Effect size (cell B9) contains the value 0.23691. You must click on the OK button in the Goal Seek Status box to lock in these new values (or Cancel to return to the original worksheet values).
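For readers without Excel, the Goal Seek step can be imitated with a simple bisection search. This is a sketch of a stand-in technique, not a description of how Goal Seek works internally; the function names and the search bounds are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def power_for_effect(d, n, alpha=0.05):
    """Power of a one-tail z test for effect size d and sample size n."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - d * sqrt(n))

def solve_effect_size(target, n, alpha=0.05, lo=0.0, hi=5.0, tol=1e-10):
    """Bisection: find d such that power_for_effect(d, n) equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_for_effect(mid, n, alpha) < target:
            lo = mid   # power too low, need a larger effect
        else:
            hi = mid   # power high enough, try a smaller effect
    return (lo + hi) / 2

d = solve_effect_size(0.80, 110)
mu1 = 60 + d * 12   # detectable mean for mu0 = 60, sigma = 12

print(round(d, 4), round(mu1, 2))
```

The result is close to, but not identical with, the 0.23691 shown in the worksheet, presumably because Goal Seek stops once it is within its own tolerance of the target.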

c) We again use Excel’s Goal Seek capability to answer the third question. Using the worksheet in Figure 3 (making sure that the effect size in cell B9 is set to .2), we now enter the following values in the dialog box that appears (see Figure 6):

Figure 6 – Using Goal Seek to determine minimum sample size

After clicking on the OK button, the worksheet changes to that in Figure 7.

Figure 7 – Sample size requirement for Example 2

In particular, note that the sample size value in cell B6 changes to 154.486. Thus the required sample size is 155.
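The rounding up to a whole number of bolts can be checked with a direct search for the smallest integer sample size that reaches the target power. This is a minimal sketch under the assumptions of Example 1; `min_sample_size` and its `max_n` cap are hypothetical names chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_for_effect(d, n, alpha=0.05):
    """Power of a one-tail z test for effect size d and sample size n."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - d * sqrt(n))

def min_sample_size(d, target, alpha=0.05, max_n=1_000_000):
    """Smallest integer n whose power is at least target (linear search)."""
    for n in range(1, max_n + 1):
        if power_for_effect(d, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within max_n")

n_required = min_sample_size(0.2, 0.80)
print(n_required)
```

A sample of 154 falls just short of power .80, which is why the fractional Goal Seek result must be rounded up rather than to the nearest integer.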

Observation: An alternative way of answering Example 2 (a) is as described in Figure 8.

Figure 8 – Determining power for a given effect size

Observation: An alternative way of answering Example 2 (c) is as described in Figure 9. Note that this approach avoids the need for the Goal Seek capability.

Figure 9 – Determining sample size for a given effect size
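For reference, the sample size can also be obtained in closed form. The following is a sketch of the derivation under the setup of Example 1 (one-tail test, known σ); whether Figure 9 uses exactly this formula is an assumption, since the worksheet is not reproduced here. The symbols $z_{1-\alpha}$ and $z_{1-\beta}$ (standard normal quantiles) and $\Phi$ (the standard normal cumulative distribution function) are notation introduced for this sketch.

$$\text{power} = 1 - \Phi\left(z_{1-\alpha} - d\sqrt{n}\right) = 1 - \beta \;\Rightarrow\; z_{1-\alpha} - d\sqrt{n} = -z_{1-\beta} \;\Rightarrow\; n = \left(\frac{z_{1-\alpha} + z_{1-\beta}}{d}\right)^2$$

With α = .05, power = .80 and d = .2, this gives n ≈ ((1.645 + 0.842)/.2)² ≈ 154.6, which rounds up to 155.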

33 thoughts on “Statistical Power and Sample Size”

Paul

February 28, 2021 at 3:21 pm

Hello Charles,
I would like to perform analysis as shown in figure 3, but I wish to look for an effect size in the negative direction (mu1 less than mu0). How should I proceed? I’m sorry if you’ve already answered this question, but I looked through the comments and I’m still not sure how to go about it.
- Charles
  
  March 1, 2021 at 9:06 am
  
  Paul,
  It doesn’t matter which direction is required, the calculation is the same. Of course, the direction you are testing for is towards the alternative hypothesis (which is where power makes sense).
  You can use the following Real Statistics tool to calculate power.
  https://www.real-statistics.com/hypothesis-testing/real-statistics-power-data-analysis-tool/
  Charles
Valavan Vamathevan

December 30, 2019 at 7:27 pm

Dear Charles

Hope you are doing well. Could you please clarify the following?

In the real scenario, we are using multi-stage sampling (for example, first stage Probability Proportional to Size techniques and the second stage using cluster sampling techniques). So in many cases, each element of the sample does not have the same overall probability of selection (unless an equal number of elements is chosen in each cluster at the second stage of sample selection).
My question, if it is a case (each element of the sample does not have the same overall probability of selection)
• Shall we do the Z or t-test for this selected sample?

My second query

In the real survey, we are having difficulty estimating the confidence interval of a population parameter when using multi-stage sampling (for example, first stage Probability Proportional to Size techniques and second stage using cluster sampling techniques), because it is difficult to calculate the standard error. Could you please suggest or circulate a good guide that describes the equations for calculating an estimator (sample mean, sample proportion, sample total) and its variance in multi-stage sampling (for example, PPS in the first stage and cluster/stratified sampling in the second stage)?

To overcome this problem shall we use the self-weighting technique?

Whenever possible, clusters should be chosen with probability-proportional-to-size in sample surveys at the first stage.

A second is that, if an equal number of elements is chosen in each cluster at the second stage of sample selection, the end result will be a sample in which each element has the same overall probability of selection, or is self-weighting.

Then we can estimate population parameter and its variance using the same technique of simple random sampling without replacement as each element of the sample is having equal overall probability of selection.

In this case, shall we use Z or t-test for this sample?
- Charles
  
  January 3, 2020 at 3:13 pm
  
  I haven’t looked into these issues yet. I assume that whether you use the z test or t test, the main issue is how to estimate the variance so that you can estimate the effect size desired. In any case, the following article may be helpful.
  http://ocw.jhsph.edu/courses/StatMethodsForSampleSurveys/PDFs/Lecture5.pdf
  Charles
  - Valavan Vamathevan
    
    January 5, 2020 at 8:01 pm
    
    Dear Sir
    
    Thank you so much.
    
    Kind regards
    V.Valavan
Valavan Vamathevan

November 25, 2019 at 3:17 pm

Dear Sir

Hope you are doing well. I would like to ask for a clarification; when your time permits, please shed some light on it.

Which is the best way to estimate the (population) parameter?

1. Calculate the required sample size by defining the Z-score (95%, 1.96), error (for example 0.03), and p (say .5 for maximum sample size), then estimate the sample statistic (for example the sample proportion). Then we say the calculated sample proportion is an unbiased estimator of the population proportion and we are 95% confident that the population proportion lies within plus or minus 0.03 (this value was used for calculating the sample size) of the sample proportion. That is,

p- 0.03=< P <= p + 0.03

Or

We take a small sample (not calculating the sample size statistically, say 40) due to limitations, but use sampling techniques (srs, cluster or ..) while selecting the sample, then calculate the sample proportion and its variance (using statistical techniques). Finally, we say the population proportion P lies within p ± Z SE(p). That is,

p- Z[SE(p)] =< P <= p + Z [SE(p)]

Please clarify it, when your time permits.
- Charles
  
  November 25, 2019 at 6:17 pm
  
  Valavan,
  Although it depends on exactly what hypothesis you are trying to test and how feasible it is to obtain a sufficiently large sample, the first approach is usually better.
  Charles
  - Valavan Vamathevan
    
    November 25, 2019 at 6:47 pm
    
    Thank you very much, sir
Dave Powelson

August 29, 2019 at 8:58 pm

Charles,
“Figure 1–Statistical power” helped me understand type II error for a 1-tail comparison. This calculation requires an alternative distribution with a mean of mu1.
I know the type II error (beta) can be calculated for a 2-tail comparison, but I don’t understand what it means. Is the null distribution compared to two alternative distributions having means of mu1 and -mu1? I would really appreciate an illustrated example of finding beta for a 2-tail t-test analysis.
- Charles
  
  August 30, 2019 at 10:37 am
  
  Hi Dave,
  It is difficult to understand these concepts without looking at specific examples. I suggest that you look at
  https://real-statistics.com/sampling-distributions/statistical-power-sample/
  https://real-statistics.com/students-t-distribution/statistical-power-of-the-t-tests/
  Charles
  - Dave Powelson
    
    September 4, 2019 at 7:18 pm
    
    Thanks for the links. I had to learn how to use the noncentral t distribution function NT_DIST, but I think I can visualize the process now. The way I would describe beta for a 2-tail analysis is the area of the alternative t distribution INSIDE the right and left t critical values obtained from the null t distribution.
Rob Connell

May 24, 2018 at 2:21 am

Thanks for this.

I think for Figure 2, the values of μ1 are from “61.8819676776998 to 64.4” rather than “μ1 ≥ 62.5”. I found this confusing until I realised this.
- Charles
  
  June 6, 2018 at 11:22 am
  
  Rob,
  Sorry about the confusion. I wanted to show the power curve starting from .50. This is clearly labelled in Figure 2.
  Charles
Oren Ben Harim

March 20, 2018 at 1:30 am

Hi Charles,

I have a more philosophical question.
As I understand the “standardized effect size” concept, it takes the change you want to identify and normalizes it to a measure of how detectable it is.
I don’t understand why this is interesting.

you wrote “Since it is standardized we can compare the effects across different studies with different variables” can you please give an interesting example?

In your example you expect the length of the bolt to be 60mm , and maybe 60±1 is okay and more or less can’t be sold in the shops. or 60±1.66%

So I’m interested what is the power of the test to identify bolts longer than 61 or shorter than 59. not what is the power of the test to identify cohen’s d=0.2

maybe in your example, you changed machine and now you compare the effect on the average length to the effect on the average diameter???

I hope it is okay I’m asking many questions.

Thanks a lot,
Oren
- Charles
  
  March 21, 2018 at 9:46 am
  
  Oren,
  1. The effect size is interesting since you want to quantify the effect (small, large, etc.). E.g. if you are measuring the effectiveness of a new drug for curing a type of cancer, you prefer a bigger effect size than that of the currently used drug or a placebo.
  When you do statistical analysis often you are looking to see whether an effect is statistically significant (using the p-value). This just means that the effect size is different from zero (or some other predesignated value), not whether you should care about the effect. Also as the sample size gets larger and larger it is very likely that you will see a significant result — even if the effect size is very small.
  2. You need to look at the literature in your field to see interesting examples of comparisons of effect sizes across different studies, but the example I gave above indicates what to look for. If the effect size of the currently used treatment over a placebo is .30 and your new treatment has an effect size of .70 over a placebo, this will be interesting.
  3. Bolt size (i.e. mean bolt size) can easily be mapped into an effect size
  Charles
Vendula

November 7, 2016 at 10:20 pm

Hello Charles,
Why is SEM, and not SD, used in the first formula, when you calculate the length for alpha = 0.05? According to the normal distribution, 95% of the data are within mean ± 2 SD, so it should be =norm.inv(0.95,60,12).
- Charles
  
  November 9, 2016 at 12:01 pm
  
  I am not exactly sure which is the first formula that you are referring to, but if it is the effect size formula, then Cohen’s d uses the standard deviation and not the standard error. d does not depend on the sample size.
  Charles
  - Vendula
    
    November 10, 2016 at 9:20 pm
    
I was speaking about Example 1. When you calculate alpha and beta, you used SE = 1.144, not SD = 12.
    - Charles
      
      November 23, 2016 at 10:14 am
      
      Vendula,
Yes, for this problem, the appropriate standard deviation for the mean of a sample of size 110 is 1.144, which is the standard error. 12 represents the standard deviation of the population.
      Charles
Jonathan Bechtel

May 10, 2016 at 4:57 pm

Hi Charles,

I’m interested in how you’d compute beta for observed values that aren’t greater than Xcrit.

For example, if the observed value was 60.5 (less than Xcrit) would the beta be equal to NORMDIST(61.88, 60.5, 1.144, TRUE) = 0.886148, and the beta would be higher the smaller the number gets.

Also, if you were doing a right tail test and the observed value was less than Xcrit, such as NORMDIST(58.12, 58, 1.144, TRUE) = 0.5412.

Thanks
Amadora

January 31, 2016 at 10:44 pm

Hello,

If I have a sample with a mean of 1000 and SEM (standard error) of 60 and another sample with a mean of 800 and SEM of 70, how would I calculate the statistical power between these two samples?

Thank you
- Amadora
  
  January 31, 2016 at 10:46 pm
  
If you could explain how to solve it using both Excel and SPSS it would be perfect!! Thank you
  - Charles
    
    February 1, 2016 at 7:35 am
    
    I don’t use SPSS and so won’t comment about SPSS. See response to your other comment regarding Excel.
    Charles
- Charles
  
  February 1, 2016 at 7:38 am
  
  You also need to know the sample size. See the following webpage for details:
  Power of t test
  Charles
Angela

September 5, 2015 at 12:08 am

Hello Charles,

I need assistance with how to plug in the numbers for the Statistical Power and Sample Size option. I will be running a logistic regression. I have all the data, but am unsure as to what I input.

Any insight you have would be great! Thank you.
- Charles
  
  September 5, 2015 at 9:35 am
  
  Angela, sorry but the Statistical Power and Sample Size data analysis tool supports linear regression but does not yet support logistic regression.
  Charles
Rads

July 16, 2015 at 5:59 pm

Hi Charles

I am doing an evaluation research survey. Kindly tell me how to decide the sample size for rural and urban area, with formula for a study on immunization coverage with the previous coverage evaluation survey indicates a rural coverage percentage at 50 % and urban 68 %. Is it ok to do it with the formula n = 4 pq /L?
- Charles
  
  July 19, 2015 at 9:45 pm
  
  I haven’t enough information to answer your question. Which statistical test are you using? What does pq/L abbreviate?
  Charles
George Domfe

February 11, 2015 at 2:58 pm

I am just about conducting a survey in Ghana on the informal sector workers. The Ghanaian economy is about 84 % informal and over 14 million Ghanaians are currently working. How do I get the right sample size (using power sampling) for the whole country? Thanks.
- Charles
  
  February 11, 2015 at 5:47 pm
  
  George,
  The sample size required depends on the type of statistical test that you are going to use. You need to identify the test that you will use (or that you are considering using) before you can estimate the sample size.
  Charles
Paul

January 10, 2015 at 3:52 am

How to amend formula when μ0 > μ1? It looks to me as there will be no difference, which subtract from what, since from critical value point of view μ1+z*σ = μ0+z*σ. Thus, one should simply swap them.
Do I understand correctly? I would be glad for help.
Thank you in advance,
Paul
- Charles
  
  January 10, 2015 at 9:56 am
  
  Paul,
  Yes, you are correct. You can simply swap them.
  Charles
  - Paul
    
    January 10, 2015 at 1:54 pm
    
    Thanks yet again Charles!
    Paul
