Alternative approach to multiple regression analysis

Another way to determine whether a regression model is a good fit is to test whether the population multiple correlation coefficient R between y and ŷ is zero (the null hypothesis). As noted in Multiple Correlation, a sample’s adjusted R is an unbiased estimate of the population R. Instead of R, we generally test R², using the following property, which extends Theorem 1 of One Sample Testing of Correlation.

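Property 1: If the population R = 0, then

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)} \sim F(k,\ n-k-1)$$

where n is the sample size and k is the number of independent variables.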

Observation: If k = 1, then

$$\frac{R^2 (n-2)}{1-R^2} \sim F(1,\ n-2)$$

is equivalent to Theorem 1 of One Sample Testing of Correlation (by Property 1 of F Distribution).
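
The equivalence can be sketched in one step. This assumes that Theorem 1 of One Sample Testing of Correlation is the usual t test for a zero population correlation, and that Property 1 of F Distribution is the fact that the square of a T(m) variable has distribution F(1, m); neither statement is reproduced on this page. With a single independent variable, R² equals r², the square of the ordinary correlation coefficient:

```latex
\begin{align*}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim T(n-2) \\
t^2 &= \frac{r^2\,(n-2)}{1-r^2}
     = \frac{R^2/1}{(1-R^2)/(n-1-1)}
     = F \sim F(1,\ n-2)
\end{align*}
```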

Example 1: Show that the regression model in Example 2 of Multiple Regression Analysis is a good fit by using Property 1.

We test the null hypothesis H0: R = 0 (see Figure 1).

Figure 1 – F-test of data in Example 1 using Property 1

Based on the analysis in Figure 1, we reject the null hypothesis and conclude that the fit of the regression model to the data is not simply due to chance.
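
For readers working outside Excel, the test statistic can be sketched in a few lines of Python. This is a minimal illustration, assuming the statistic takes the standard form F = (R²/k) / ((1 − R²)/(n − k − 1)) with k and n − k − 1 degrees of freedom. The function name `regression_f_statistic` and the input values are hypothetical; they are not the data from Example 1.

```python
def regression_f_statistic(r_squared, n, k):
    """Return (F, df1, df2) for testing H0: population R = 0.

    r_squared: sample R-squared of the fitted model
    n:         sample size
    k:         number of independent variables
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    if k < 1 or n - k - 1 < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    df1, df2 = k, n - k - 1
    # Ratio of explained to unexplained variation, each per degree of freedom
    f_stat = (r_squared / df1) / ((1 - r_squared) / df2)
    return f_stat, df1, df2


# Hypothetical inputs, for illustration only
f_stat, df1, df2 = regression_f_statistic(r_squared=0.5, n=23, k=2)
print(f"F = {f_stat:.4f} with ({df1}, {df2}) degrees of freedom")
```

The p-value is the right-tail probability of F(df1, df2) at the computed statistic, which can be obtained with Excel's F.DIST.RT(F, df1, df2) or, if SciPy is installed, with scipy.stats.f.sf(F, df1, df2). The null hypothesis is rejected when this p-value falls below the chosen significance level.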

6 Responses to Alternative approach to multiple regression analysis

  1. randolf says:

    What do df, SS, MS, F, and Significance F mean?

  2. Your articles are wonderful.
    Is there a test of whether the coefficients b1, b2, …, bk in the multiple regression model fit their population coefficients?

    • Charles says:

      The usual test is t = (bi − βi)/se ~ T(n−k−1), where βi is the known or hypothesized population coefficient, n is the sample size, k is the number of independent variables, and se is the standard error of bi.

  3. himateja says:

    We have a data set about CPUs with attributes such as clock speed, memory, cache and performance. We would like to predict how performance depends on memory, clock speed and cache, using linear or nonlinear regression, and to estimate performance when the other attributes are given.
    Can you help us decide which method would be better and how to approach it?

    • Charles says:

      This really depends on a number of factors, most importantly (1) any theoretical or domain-related reasons for preferring one over the other and (2) the nature of the data. Without more information, I am unable to comment further.
