Weighted Linear Regression

Weighted least squares (WLS) regression is an extension of ordinary (OLS) least-squares regression by the use of weights. Generally, weighted least squares regression is used when the homogeneous variance assumption of OLS regression is not met (aka heteroscedasticity or heteroskedasticity).

Topics

Reference

Wikipedia (2015) Weighted least squares
https://en.wikipedia.org/wiki/Weighted_least_squares

39 thoughts on “Weighted Linear Regression”

  1. Hey Charles,

    Thanks for all the hard work you’ve put in! Is it possible to weight by the independent variables as opposed to each row of data points? For example, if I wanted to assign weight of .2, .3, and .5, to variables x1, x2, and x3, respectively, would I be able to do so using your add-in? Regardless, thanks a bunch for your continuous effort and kindness!

    Kind regards,
    Alvin

    Reply
    • Hi Alvin,
      Thanks for your support and kind words.
      I would think that in this case you would multiple all your x1 data by .2, all your x2 data by .3 and all your x3 data by .5.
      Charles

      Reply
  2. Hello Charles,

    I just wanted to let you know that your tool has been a lifesaver! I’m in college right now taking an advanced lab class that requires me to analyze data using many of the functions provided by your RealStats tool. I really didn’t want to have to pick up python again to have to code in my own weighted least squares/linear regression, but your add-in is so easy to use and does it all perfectly.

    Thanks!
    Paige

    Reply
  3. Thank you for this website and the add-in module. I find your explanations and the formulas very useful.

    I am trying to use the Weighted Linear Regression, but the WRegCoeff formula seems to have a limit of 65536 observations. Is that correct? Are you looking to increase the number of rows we can use in a weighted regression.

    Thanks.

    Reply
    • Hello Harry,
      I am pleased that you are finding the Real Statistics website and add-in to be useful.
      Yes, the current limit is 65,536 observations. Until now, no one has asked to support more than this number. What number of observations do you need?
      Charles

      Reply
  4. Hi Charles,

    A fantastic site, thank you. I was hoping you may ne able to point me in the direction of any journal articles/references that detail the best approach to calculate ‘Weights’

    In particular, the rationale behind the following statement you make

    We could use the reciprocals of the squared residuals from column W as our weights, but we obtain better results by first regressing the absolute values of the residuals on the Ad spend and using the predicted values instead of the values in column W to calculate the weights.

    Many thanks,
    Nick

    Reply
    • Hi Nick,
      Thanks for your very kind remarks.
      Just google heteroskedasticity and weighted linear regression. This is a lot of information about this topic online.
      Charles

      Reply
  5. Hi, Charles,
    I tried to follow the example shown in your “Weighted Regression Basics” in Excel. I realised there is an user defined function “Design” in the spreadsheet. For example “=MMULT(MINVERSE(MMULT(TRANSPOSE(DESIGN(A7:A13)),C7:C13*
    DESIGN(A7:A13))),MMULT(TRANSPOSE(DESIGN(A7:A13)),C7:C13*B7:B13))”. Could you please tell me how can I re-produce the example? Should I donwload some files from your Website? Would it be possible to show me the details how to do this? Many thanks, James Wang

    Reply
    • DESIGN(A7:A13) is a Real Statistics formula that outputs a 7 x 2 array whose first column consists of all ones and whose second column consists of the elements in A7:A13. Thus if you, for example, place insert 1 in cell Q7, the formula =A7 in cell R7, highlight the range Q7:R13 and press Crtl-D, the range Q7:R13 can be used in the formula you listed in place of DESIGN(A7:A13).
      In any case, you can access the DESGN function, as well as all the other functions and data analysis tools in the Real Statistics software by going to the following webpage for a free download:
      https://real-statistics.com/free-download/real-statistics-resource-pack/
      Charles

      Reply
  6. Hi Charles,

    Could you clarify what the correct degrees of freedom is for Total, Regression and Residuals for Weighted Linear Regression?
    The green table does not match the output from the Excel add-in.

    I.e. is the Regression dof equal to the number of independent variables (i.e. 1) or equal to the number of independent variables plus one (i.e. 2)?

    Thanks very much

    Reply
    • Luke,
      If your X data has k (independent) variables and n samples, then dfReg = k+1, dfTot = n and dfRes = dfTot – dfReg. Here the k+1 is used for dfReg since there is also data for the weights. See Figure 4.
      Charles

      Reply
      • Thanks Charles, much appreciated.

        This, then, disagrees with the ‘green’ table towards the top of the page, where the dof is written as ‘k’.

        Moreover, I am a little confused with formula for calculating the SS in weighted regression as provided in the add-in. This also does not seem to conform to the green table.

        E.g. if I regression a data sample of y on x without weights, I arrive at different values for the squared sums than when I use the exact same dataset, but use the weighted regression and set all weights to 1.

        Thanks for any help.

        Best regards
        Luke

        Reply
        • Luke,
          There isn’t just one approach to performing weighted regression. One of these approaches uses regression through the origin. I think this is the approach that I chose to use. I have just started looking into this topic again (using a textbook by Weisberg) and will eventually try to clarify these issues. I do believe, however, the following: (1) The value I use for R-Square is not correct and I need to fix it (2) to obtain another version of weighted regression you can simply use dfReg = k and dfTot = n-1; also I believe that SAS uses different values for SSReg and SSTot as well, although SSRes is the same.
          Thanks for bringing this issue to my attention, so that I can delve into it further.
          Charles

          Reply
  7. Hello Charles –

    Thank you for the explanations. I have a question around cell G (std dev) and how to compute it for a simple data set. Suppose we have for a planned regression of Y on X the following points. X=(1,2,3,4,5) and Y=(1.1,2.5,3.4,3.8,7). How would I calculate the standard deviation for each data point (in trying to determine the weights for each data point) in order to transform the model?

    Once I get this I understand how to arrive at the variance and hence the weights.

    Thank you and regards

    Reply
    • Abel,
      If each “group” consists of one data point, then the standard deviation would be undefined (division by zero) and so you couldn’t calculate weights in this way.
      Charles

      Reply
  8. Figure 4 confuses me. I would expect only Ln(mean) but not Weight. Weight is not the intercept, right?

    Is there an option for robust std. errors with weighted linear regression?

    Reply
    • Wytek,
      Weight is one of the independent variables and not an intercept.
      Currently there is no robust errors option for the Weighted Linear Regression data analysis tool. There is a robust errors option for the Multiple Regression data analysis tool.
      Charles

      Reply
      • I am still confused. Figure 3 shows that the independent variables (X) are D4:D12 while the Weights are H4:H12 and are required to be assigned to a separate input box.

        I have imported the excel data into another tool (gretl) and the output indicates that the Weights is indeed the intercept.

        Here are the results:
        coefficient std. error t-ratio p-value
        ——————————————————–
        const −100.846 53.2965 −1.892 0.1073
        Lnmean 126.845 11.8291 10.72 3.88e-05 ***

        Reply
        • Wytek,
          The way that I implemented weighted regression I turned the problem into one with linear regression without intercept. I guess gretl implemented this in a different way. From what I understand, the results are the same.
          Charles

          Reply
          • Charles,
            Without intercept gretl produces this:
            coefficient std. error t-ratio p-value
            ——————————————————–
            Lnmean 105.860 4.81325 21.99 1.01e-07 ***

            You may want to add an intercept checkbox to the dialog box above to see the difference.

            As you have correctly indicated above, the weights are a multiplier matrix W. Replace this W matrix with the identity matrix I and you get plain OLS.

  9. When I run Weighted Linear regression I get R and Rsq values that exceed 9.5. When I run linear regression, I get R values of the order of .3 . I would expect similar R values since when I run weighted correlation coefficients and unweighted correlation coefficients there is a small difference.

    Reply
        • Robert,
          When you press Alt-TI, do you see RealStats and Solver on the list of addins with check marks next to them?
          What do you see when you enter the formula =VER() in any cell?
          Charles

          Reply
          • Charles,
            Yes, RealStats and Solver are present and ticked on the list of addins, and when I enter =VER() in any cell I get the #NAME? after pressing enter.
            Robert

          • Robert,
            Since =VER() is not working, Excel doesn’t recognize the RealStats software. Are you using Excel 2007? If so, check to make sure that you have stored the Real Statistics file in the correct place. See the instructions on the webpage from which you downloaded the Real Statistics file.
            Charles

          • I thought it would be something like that, but I’m slightly confused. As both RealStats and Solver are on the Active Application Add-ins list. Also, the location I’ve used is the one stated on the webpage.

          • Robert,
            This is strange. If you have tried all the troubleshooting steps, then all I can suggest is that you delete the RealStats file and start all over. Close Excel. Open Excel and remove RealStats from the list of addins (to do this press Alt-TI and uncheck RealStats. If you get a message to delete RealStats from the list of addins, agree. If not recheck RealStats and hopefully you will get this message.
            Once you have completed these steps download the software from the website and repeat the installation

Leave a Comment