To compute statistical power for multiple regression we use Cohen’s effect size *f*^{2}, which is defined by

$$f^2 = \frac{R^2}{1 - R^2}$$

*f*^{2} = .02 represents a small effect, *f*^{2} = .15 represents a medium effect and *f*^{2} = .35 represents a large effect.
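
As a rough check on these benchmarks, and assuming the usual definition $f^2 = R^2/(1 - R^2)$, each threshold can be converted back to the $R^2$ scale via $R^2 = f^2/(1 + f^2)$:

$$
R^2_{small} = \frac{0.02}{1.02} \approx 0.0196, \qquad
R^2_{medium} = \frac{0.15}{1.15} \approx 0.1304, \qquad
R^2_{large} = \frac{0.35}{1.35} \approx 0.2593
$$

Going the other way, an $R^2$ of .2 (the value used in Example 2) corresponds to $f^2 = 0.2/0.8 = 0.25$, which lies between the medium and large benchmarks.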

To calculate the power of a multiple regression, we use the noncentral F distribution *F*(*df*_{Reg}, *df*_{Res}, *λ*) where *df*_{Reg} = *k*, *df*_{Res} = *n* − *k* − 1 (*n* = the sample size, *k* = the number of independent variables) and the noncentrality parameter *λ* (see Noncentral F Distribution) is

$$\lambda = f^2 \cdot n$$

**Example 1:** What is the power of a multiple regression on a sample of size 100 with 10 independent variables when α = .05?

We show the calculation in Figure 1.

**Figure 1 – Statistical Power**
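
The quantities behind this calculation can be written out as follows. The power of the overall F test is the probability that a noncentral F variable exceeds the critical value of the central F distribution. The example statement does not give an effect size, so the value $R^2 = 0.4$ used here is an assumption taken from the spreadsheet quoted in the comments below.

$$
\begin{aligned}
df_{Reg} &= k = 10, \qquad df_{Res} = n - k - 1 = 100 - 10 - 1 = 89 \\
f^2 &= \frac{R^2}{1 - R^2} = \frac{0.4}{0.6} \approx 0.6667, \qquad \lambda = f^2 \cdot n \approx 66.67 \\
F_{crit} &= F^{-1}_{1-\alpha}(df_{Reg}, df_{Res}), \qquad 1 - \beta = 1 - F_{nc}(F_{crit};\ df_{Reg}, df_{Res}, \lambda)
\end{aligned}
$$

Here $F_{nc}$ denotes the cumulative distribution function of the noncentral F distribution. For these inputs, the same comment reports $F_{crit} \approx 1.9388$ and $1 - \beta \approx 0.99999$.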

**Real Statistics Functions**: The following functions are provided in the Real Statistics Resource Pack:

**REG_POWER**(*effect, n, k, type, α, iter, prec*) = the power for multiple regression where *type* = 1 (default), *effect* = Cohen’s effect size *f*^{2} and *n* = the sample size. If *type* = 2 then *effect* = the *R*^{2} effect size instead and if *type* = 0 then *effect* = the noncentrality parameter *λ*.

**REG_SIZE**(*effect, k, 1−β, type, α, iter, prec*) = the minimum sample size required to obtain power of at least 1−*β* (default .80) for multiple regression where *type* = 1 (default) and *effect* = Cohen’s effect size *f*^{2}. If *type* = 2 then *effect* = *R*^{2} instead.

Here *α* = the significance level (default .05). The calculation of the infinite sum for the noncentral F distribution stops when the level of precision *prec* (default 0.000000001) is reached or the number of terms in the infinite sum exceeds *iter* (default 1,000).

We can therefore calculate the power for Example 1 using the formula

=REG_POWER(B8,B3,B4,2,B12)

Similarly, we can calculate the power for Example 1 of Multiple Regression using Excel to be 99.9977% and the power for Example 2 of Multiple Regression using Excel to be 98.9361%.

**Example 2:** What is the size of the sample required to achieve 90% power for a multiple regression on 8 independent variables where *R*^{2} = .2, α = .05?

We see from Figure 2 that the sample size required is 85 and the actual power achieved is 90.26%.

**Figure 2 – Sample size required**
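
For readers working outside Excel, the two calculations can be sketched in plain Python. This is a minimal illustration, not the Real Statistics implementation: the names `betainc`, `f_cdf`, `f_crit`, `reg_power`, `reg_size` and the `kind` argument are hypothetical stand-ins for the functions described above. The sketch assumes λ = *f*^{2} · *n*, writes the noncentral F distribution as a Poisson-weighted sum of incomplete beta terms, finds the critical value by bisection, and finds the sample size by a simple upward scan.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2, lam=0.0, max_terms=1000, prec=1e-9):
    """CDF of the F distribution (noncentral if lam > 0) as a Poisson-weighted sum."""
    if x <= 0.0:
        return 0.0
    y = d1 * x / (d1 * x + d2)
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson weight for j = 0 (underflows for huge lam)
    mass = total = 0.0
    for j in range(max_terms + 1):
        total += weight * betainc(d1 / 2.0 + j, d2 / 2.0, y)
        mass += weight
        if 1.0 - mass < prec:  # the remaining terms sum to less than prec
            break
        weight *= half / (j + 1)
    return total


def f_crit(alpha, d1, d2):
    """Upper-tail critical value of the central F distribution, by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def reg_power(effect, n, k, kind=1, alpha=0.05, max_terms=1000, prec=1e-9):
    """Power of the overall F test; kind 1: f^2, kind 2: R^2, kind 0: lambda."""
    d1, d2 = k, n - k - 1
    if d2 < 1:
        raise ValueError("need n >= k + 2")
    if kind == 2:
        effect = effect / (1.0 - effect)  # convert R^2 to f^2
    lam = effect if kind == 0 else effect * n
    return 1.0 - f_cdf(f_crit(alpha, d1, d2), d1, d2, lam, max_terms, prec)


def reg_size(effect, k, power=0.80, kind=1, alpha=0.05, n_max=100000):
    """Smallest n whose power reaches the target (linear scan), or None."""
    for n in range(k + 2, n_max + 1):
        if reg_power(effect, n, k, kind, alpha) >= power:
            return n
    return None


if __name__ == "__main__":
    print(reg_size(0.2, 8, 0.90, kind=2))
    print(reg_power(0.2, 85, 8, kind=2))
```

Under these assumptions, `reg_size(0.2, 8, 0.90, kind=2)` plays the role of the sample size search in Example 2 and `reg_power(0.2, 85, 8, kind=2)` the role of the power check, so the results should land close to the 85 and 90.26% reported above. The linear scan is chosen for clarity rather than speed.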

**Real Statistics Data Analysis Tool:** Statistical power and sample size can also be calculated using the **Power and Sample Size** data analysis tool.

For Example 1, we press **Ctrl-m** and double-click on the **Power and Sample Size** data analysis tool. Next we select the **Multiple Regression** option in the dialog box shown in Figure 3.

**Figure 3 – Statistical Power and Sample Size dialog box**

Finally, we fill in the dialog box that appears, as shown in the upper part of Figure 4. When we press the **OK** button, the results shown in the lower part of Figure 4 appear.

**Figure 4 – Multiple Regression Power dialog box**

Thank you for the clear and insightful articles here. I wondered what advice you have for conducting a multiple regression-type analysis but with unavoidably low sample sizes? My dataset appears to meet the other assumptions of regression, but has only 19 observations, with two independent variables I’d like to explore against a continuous dependent (actually multiple dependents but I will run each one separately). One independent variable is categorical, the other continuous. Any advice on the best course of action with small samples would be much appreciated! Thank you

Emily,

You can run multiple regression even with a small sample size. The small sample size will simply limit the power of the test.

Charles

Thank you Charles for your speedy reply!

Emily

Sir, this paper is very helpful. However, when I work through your Example 1 in Excel, the last formula, the one for calculating beta, does not work: I type the NF_DIST formula, but it shows ‘ERROR’.

Please help.

Anupam,

I don’t know what the cell value of ERROR means. Usually if there is an error, you would see one of the following #DIV/0, #N/A, #NUM!, #VALUE!, #NAME?, #NULL! or #REF!

What release of the Real Statistics software are you using? You can enter =VER() to find this out.

Charles

I created a spreadsheet using the values on this page, and downloaded the package. However, I get an incorrect value for NF-dist. Your sheet shows 0.208282. I get 1.05149E-5. Did I do something wrong? First numbers in each row below are my values, the second number is your example.

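| Quantity | My value | Your example |
| --- | --- | --- |
| n | 100 | 100 |
| k | 10 | 10 |
| dfRes | 89 | 89 |
| dfReg | 10 | 10 |
| R-sq | 0.4 | 0.4 |
| f-sq | 0.666666667 | 0.666666667 |
| λ | 66.66666667 | 66.66666667 |
| α | 0.05 | 0.05 |
| F-crit | 1.938791309 | 1.938791309 |
| β | 1.05149E-05 | 0.208282 |
| 1-β | 0.999989485 | 0.999989485 |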

Thanks for what you do.

David,

If you send me an Excel file with your data, I will try to figure out what is going on.

Charles

Thanks. I sent you the spreadsheet. I have some more general questions which I include here:

Is the power calculation influenced by the use of stepwise regression, where there may be many more potential independent variables than are used in the final model?

This could be critical if you are including interaction terms.

For example, if there are ten independent variables, the interaction terms could include x1*x2, x1*x3, … x1*x2*x3, … all the way to Productsum(x(i)) i = 1…10. In this case, there are 1024 possible candidate “independent variables,” including the synthetic ones. Yet the final model might have only a few terms.

On one hand, we don’t want to be guilty of “p-hacking” by creating so many candidate terms. On the other hand, we don’t want to miss relationships that may exist in the data.

One could include multivariate polynomial terms such as x1*x3^2, x3*x5^-1, etc. Then there may be many more candidate terms. The website I linked to does this kind of calculation.

Regards, Dave.

Dave,

The problem is that there are an infinite number of possible terms to include (besides the ones you have mentioned, there are potentially LN(x1), exp(x1), x1^2, x1^3, x1^4, x1^x2, sin(x1), etc.). You need to use some judgement to determine which such terms are reasonable. Often there are some theoretical considerations, but sometimes you need to create a plot of the data to see which terms are likely to matter. Also, you might do a little trial and error.

Charles