Statistical Power and Sample Size for Multiple Regression

To compute statistical power for multiple regression, we use Cohen’s effect size f², which is defined by

f² = R² / (1 − R²)

where R² is the squared multiple correlation coefficient of the regression model.

f² = .02 represents a small effect, f² = .15 represents a medium effect and f² = .35 represents a large effect.

To calculate the power of a multiple regression, we use the noncentral F distribution F(dfReg, dfRes, λ), where k is the number of independent variables, n is the sample size, dfReg = k, dfRes = n − k − 1 and the noncentrality parameter λ (see Noncentral F Distribution) is

λ = f² · n
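As a quick check of the arithmetic, the effect size, noncentrality parameter and degrees of freedom can be sketched in a few lines of Python. The function names below are our own (they are not part of the Real Statistics software), and the inputs R² = 0.4, n = 100 and k = 10 are illustrative values matching the spreadsheet quoted in the comments on this page.

```python
def f2_from_r2(r2):
    """Cohen's effect size f^2 computed from R^2."""
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return r2 / (1 - r2)


def r2_from_f2(f2):
    """Inverse conversion: R^2 computed from Cohen's f^2."""
    return f2 / (1 + f2)


def noncentrality(f2, n):
    """Noncentrality parameter lambda = f^2 * n."""
    return f2 * n


def degrees_of_freedom(n, k):
    """Return (dfReg, dfRes) for n observations and k independent variables."""
    return k, n - k - 1


# Illustrative inputs (assumed): R^2 = 0.4, n = 100, k = 10
f2 = f2_from_r2(0.4)
lam = noncentrality(f2, 100)
df_reg, df_res = degrees_of_freedom(100, 10)
print(f"f2 = {f2:.6f}, lambda = {lam:.4f}, dfReg = {df_reg}, dfRes = {df_res}")
```

With these inputs, f² = 0.4/0.6 ≈ 0.6667 and λ ≈ 66.67, with dfReg = 10 and dfRes = 89.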

Example 1: What is the power of a multiple regression on a sample of size 100 with 10 independent variables when R² = .4 and α = .05?

We show the calculation in Figure 1.

Figure 1 – Statistical Power

Real Statistics Functions: The following functions are provided in the Real Statistics Resource Pack:

REG_POWER(effect, n, k, type, α, m, prec) = the power for multiple regression where type = 1 (default), effect = Cohen’s effect size f² and n = the sample size. If type = 2 then effect = the R² effect size instead, and if type = 0 then effect = the noncentrality parameter λ.

REG_SIZE(effect, k, 1−β, type, α, m, prec) = the minimum sample size required to obtain power of at least 1−β (default .80) for multiple regression where type = 1 (default) and effect = Cohen’s effect size f². If type = 2 then effect = R² instead.

Here α = the significance level (default = .05). The calculation of the infinite sum for the noncentral F distribution stops when the level of precision prec (default 0.000000001) is reached or the number of terms in the infinite sum exceeds m (default 1,000).

We can therefore calculate the power for Example 1 using the formula

=REG_POWER(B8,B3,B4,2,B12)

Similarly, we can calculate the power for Example 1 of Multiple Regression using Excel to be 99.9977% and the power for Example 2 of Multiple Regression using Excel to be 98.9361%.
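To make the role of prec and m more concrete, here is a minimal sketch of a truncated infinite sum for the noncentral F distribution, written in plain Python. It is not the Real Statistics implementation: the function names, the continued-fraction evaluation of the incomplete beta function, the bisection search for the critical value and the stopping rule (stop once the remaining Poisson weight falls below prec) are all our own assumptions.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-30
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def noncentral_f_cdf(f, dfn, dfd, lam, prec=1e-9, max_terms=1000):
    """CDF of the noncentral F distribution as a Poisson-weighted sum of beta CDFs."""
    x = dfn * f / (dfn * f + dfd)
    if lam <= 0:
        return reg_inc_beta(dfn / 2, dfd / 2, x)  # central F distribution
    half = lam / 2
    total, mass = 0.0, 0.0
    for j in range(max_terms):
        weight = exp(-half + j * log(half) - lgamma(j + 1))  # Poisson(lam/2) weight
        total += weight * reg_inc_beta(dfn / 2 + j, dfd / 2, x)
        mass += weight
        if 1.0 - mass < prec:  # remaining terms cannot add more than prec
            break
    return total


def f_critical(alpha, dfn, dfd):
    """Upper-alpha critical value of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if reg_inc_beta(dfn / 2, dfd / 2, mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return dfd * x / (dfn * (1 - x))


# Inputs taken from the spreadsheet quoted in the comments: n = 100, k = 10, R^2 = 0.4
n, k, r2, alpha = 100, 10, 0.4, 0.05
lam = r2 / (1 - r2) * n
f_crit = f_critical(alpha, k, n - k - 1)
beta = noncentral_f_cdf(f_crit, k, n - k - 1, lam)
power = 1 - beta
print(f"F-crit = {f_crit:.6f}, beta = {beta:.3e}, power = {power:.6f}")
```

The stopping rule relies on the fact that each beta CDF term is at most 1, so the part of the sum that is dropped can never exceed the Poisson weight not yet accumulated. A rule that simply stops at the first small term would fail here, since the earliest terms are tiny when λ is large.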

Example 2: What is the size of the sample required to achieve 90% power for a multiple regression on 8 independent variables where R² = .2 and α = .05?

We see from Figure 2 that the sample size required is 85 and the actual power achieved is 90.26%.

Figure 2 – Sample size required
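The sample size search for Example 2 can also be sketched outside Excel. The snippet below assumes that SciPy is installed and uses its scipy.stats.f and scipy.stats.ncf distributions; the function names power_from_r2 and min_sample_size are hypothetical, and the simple linear search is our own choice rather than a description of how REG_SIZE works internally.

```python
from scipy import stats


def power_from_r2(r2, n, k, alpha=0.05):
    """Power of the overall F test for a regression with k predictors and n observations."""
    f2 = r2 / (1 - r2)                           # Cohen's effect size f^2
    df_res = n - k - 1                           # residual degrees of freedom
    f_crit = stats.f.ppf(1 - alpha, k, df_res)   # critical value of the central F
    return stats.ncf.sf(f_crit, k, df_res, f2 * n)  # P(noncentral F > F-crit)


def min_sample_size(r2, k, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest n whose power reaches the target, found by increasing n one at a time."""
    for n in range(k + 2, n_max + 1):            # need dfRes = n - k - 1 >= 1
        if power_from_r2(r2, n, k, alpha) >= target:
            return n
    raise ValueError("target power not reached for n <= n_max")


# Example 2: R^2 = .2, 8 independent variables, 90% power, alpha = .05
n_required = min_sample_size(0.2, 8, target=0.90)
achieved = power_from_r2(0.2, n_required, 8)
print(f"n = {n_required}, power = {achieved:.4f}")
```

If the sketch is faithful to the calculation described above, it should agree with the sample size of 85 and the achieved power of about 90.26% shown in Figure 2.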

Real Statistics Data Analysis Tool: Statistical power and sample size can also be calculated using the Power and Sample Size data analysis tool.

For Example 1, we press Ctrl-m and double-click on the Power and Sample Size data analysis tool. Next, we select the Multiple Regression option in the dialog box that appears, as shown in Figure 3.

Figure 3 – Statistical Power and Sample Size dialog box

Finally, we fill in the dialog box that appears, as shown in the upper part of Figure 4. When we press the OK button, the results shown in the lower part of Figure 4 appear.

Figure 4 – Multiple Regression Power dialog box

9 Responses to Statistical Power and Sample Size for Multiple Regression

  1. I created a spreadsheet using the values on this page, and downloaded the package. However, I get an incorrect value for NF-dist. Your sheet shows 0.208282. I get 1.05149E-5. Did I do something wrong? First numbers in each row below are my values, the second number is your example.
    Quantity   Mine          Yours
    n          100           100
    k          10            10
    dfRes      89            89
    dfReg      10            10
    R-sq       0.4           0.4
    f-sq       0.666666667   0.666666667
    λ          66.66666667   66.66666667
    α          0.05          0.05
    F-crit     1.938791309   1.938791309
    β          1.05149E-05   0.208282
    1-β        0.999989485   0.999989485

    Thanks for what you do.

    • Charles says:

      David,
      If you send me an Excel file with your data, I will try to figure out what is going on.
      Charles

      • Thanks. I sent you the spreadsheet. I have some more general questions which I include here:

        Is the power calculation influenced by the use of stepwise regression, where there may be many more potential independent variables than are used in the final model?

        This could be critical if you are including interaction terms.
        For example, if there are ten independent variables, the interaction terms could include x1*x2, x1*x3, … x1*x2*x3, … all the way to Productsum(x(i)) i = 1…10. In this case, there are 1024 possible candidate “independent variables,” including the synthetic ones. Yet the final model might have only a few terms.

        On one hand, we don’t want to be guilty of “p-hacking” by creating so many candidate terms. On the other hand, we don’t want to miss relationships that may exist in the data.

        One could include multivariate polynomial terms such as x1*x3^2, x3*x5^-1, etc. Then there may be many more candidate terms. The website I linked to does this kind of calculation.

        Regards, Dave.

        • Charles says:

          Dave,
          The problem is that there are an infinite number of possible terms to include (besides the ones you have mentioned, there are potentially LN(x1), exp(x1), x1^2, x1^3, x1^4, x1^x2, sin(x1), etc.). You need to use some judgement to determine which such terms are reasonable. Often there are some theoretical considerations, but sometimes you need to create a plot of the data to see which terms are likely to matter. Also, you might do a little trial and error.
          Charles

  2. anupam ghosh says:

    Sir, this paper is very helpful. However, when I work through your Example 1 in Excel, the last formula, for calculating beta, does not work: I type the NF_DIST formula, but it shows ‘ERROR’.
    Please help.

    • Charles says:

      Anupam,
      I don’t know what the cell value of ERROR means. Usually, if there is an error, you would see one of the following: #DIV/0!, #N/A, #NUM!, #VALUE!, #NAME?, #NULL! or #REF!.
      What release of the Real Statistics software are you using? You can enter =VER() to find this out.
      Charles

  3. Emily says:

    Thank you for the clear and insightful articles here. I wondered what advice you have for conducting a multiple regression-type analysis but with unavoidably low sample sizes? My dataset appears to meet the other assumptions of regression, but has only 19 observations, with two independent variables I’d like to explore against a continuous dependent (actually multiple dependents but I will run each one separately). One independent variable is categorical, the other continuous. Any advice on the best course of action with small samples would be much appreciated! Thank you
