Regression Forecasts with Seasonality

We now turn our attention to the situation where we use regression with seasonal data: hourly, weekly, monthly, quarterly, etc. For hours we have 24 periods in a day, for months we have 12 periods in a year, etc. In particular, we are concerned with cases where the seasons influence the trend of the data (e.g. annual sales revenues are increasing, but revenues in June are lower than in September).

Example

Example 1: Company XYZ’s quarterly revenues for 2012 through 2015 are shown in column C of Figure 1. We would like to forecast the quarterly revenues for 2016 based on a linear regression model.

As we see from the blue curve in Figure 2, although the annual trend of the revenues may be linear, the graph is certainly not linear due to seasonal fluctuations. We need a way to handle these seasonal fluctuations.

Seasonal forecasting regression Excel

Figure 1 – Seasonal forecasting

Seasonal Approach

The approach we use is to add categorical variables to represent the four seasons (Q1, Q2, Q3, Q4). Three dummy variables are required (one fewer than the number of periods). The coding based on these variables is shown in columns E, F, and G of Figure 1. Column E contains a 1 for revenue data in Q1 and a 0 for revenue data not in Q1. Similarly, column F contains a 1 for data in Q2 and a 0 for data not in Q2. Column G codes for data in Q3. Revenue data in Q4 will have a 0 in columns E, F, and G.

We also include a variable t in column D which simply lists the time periods sequentially ignoring the quarter.

Seasonal trends regression

Figure 2 – Seasonal Trends

We now construct a multiple linear regression model using the data in range D3:G19 as our X values and range C3:C19 as our Y values. This analysis is shown in Figure 3.

Regression analysis seasonality

Figure 3 – Regression Analysis with Seasonality

We can use this model to create predictions for the historical data in 2012-2015 as well as for 2016 (future forecast). These predictions are shown in column H of Figure 1 using the array formula

=TREND(C4:C19,D4:G19,D4:G23)

This is the red curve in Figure 2. E.g. the prediction for Q1 of 2012 is $10,812,500 (cell H4), which is fairly close to the actual revenue of $10,500,000 (cell C4).

The forecasted values for each quarter in 2016 are shown in range H20:H23 of Figure 1.

70 thoughts on “Regression Forecasts with Seasonality”

    • Hello Roman,
      You can set the intercept to zero by using Regression through the Origin (aka regression with zero intercept), but is there a reason why you want to do this?
      Charles

      Reply
      • Hello Charles,
        Thank you for the reply. I simply saw such a way on this webpage (https://asmquantmacro.com/2015/06/29/testing-for-seasonality-in-excel/). I guess the author did so in order to select all months (rather than 11 months). If we estimate the model using for instance feb to dec with intercept then the coefficient estimates how much higher or lower returns are in the other months OVER the jan returns. But I guess the goal of the author was to see the how coefficients explain each month’s effect on returns not comparing to jan. So i just wanted to ask if this is also a valid way of capturing seasonality.

        Reply
        • Hello Roman,
          The regression analysis on this site uses all 12 months but omits an intercept term. This is equivalent to 11 months plus an intercept term.
          Charles

          Reply
  1. Thanks Charles,

    For demand forecasting for a certain products with 5 years historical data 2017-2021 ( the data is recorded by the order from the customer, not daily or monthly)

    The question, is it better to do the forecast quarterly or monthly? I am using Holt-Winter and Regression to compare with them after I finished

    Important note: demand on Jan 2019 is missing

    Reply
  2. Hi can you help me understand how to run regression in this scenario.
    In case of a retail chain with seasonality trend in data that can be handled the same way but supposingly there are other factors contributing to the trend like number of stores in each quarter, discounts offered.

    So how to handle the data with seasonality as well as other factors. Please help

    Reply
  3. Is it possible to apply the Prais-Wintsen method to control autocorrelation in situations with seasonality?
    Also, how do I match the equation in your example to this generic equation: Yi = b0 + b1 * Xi + b2 * sen (2Ï€Xi / L) + b3 * cos (2Ï€Xi / L)
    Where,
    Xi is the temporal trend factor
    b1 is the slope of the trend,
    b2 and b3 the seasonality coefficients,
    L is the periodicity constant (for example, months).
    This equation would be for a seasonality for a harmonic, as your example.
    Thanks a lot

    Reply
    • Real Statistics doesn’t yet support the Prais-Winsten method.
      You could treat the equation as the linear equation Yi = b0 + b1 * Xi + b2 * Ui + b3 * Vi where Ui = sin(2Ï€Xi/L) and Vi = cos(2Ï€Xi/L), but probably you would want to use nonlinear regression.
      Charles

      Reply
  4. Hi. How would you interpret the coefficients of the Q1, Q2, Q3 in the output table. e.g. for every increase in Q1, revenue decrease by 3.98?

    Reply
    • Shane,
      Suppose the dummy variables are q1, q2, q3 for quarters Q1, Q2, Q3m with Q4 acting as the base. For Q1 you need to consider the case where q1 = 1, q2 = 0 and q3 = 0. In this case, the coefficient for the q1 variable is simply added to the intercept coefficient. For Q2 you need to consider the case where q1 = 0, q2 = 1 and q3 = 0. In this case, the coefficient for the q2 variable is simply added to the intercept coefficient. The situation is similar for Q3. For Q4 ou need to consider the case where q1 = 0, q2 = 0 and q3 = 0. In this case, the intercept coefficient captures the impact of Q4. Thus we see that the q1, q2 and q3 coefficients capture the displacement from the intercept, which acts as the Q4 coefficient
      Charles

      Reply
  5. Is an F-test a one-sided hypothesis test because one only rejects the null hypothesis if the calculated statistic is greater than the tabulated critical value?

    Reply
  6. What will be the quadratic time trend and how to set seasonal dummies for monthly?
    Please also guide me how to formulate the equation as well

    Reply
    • Hi Steven,
      Use the same approach as described on the webpage, but now you will need 51 dummy variables, one for each week in the year minus one. If, instead, you mean days in the week, then you need 6 dummy variables, one for each day in the week minus one.
      Charles

      Reply
      • Hi Charles,

        If looking at weeks, is it possible to have 51 dummy variables as there is a limit of 16 variables in Excel.

        Thanks in advance!

        Reply
          • Hi Charles,

            Thanks for this, I have 51 (n-1) weeks nd 3 other variables. I’m getting an error “Input x must have at least two more rows of data than columns.” What can be causing this? When I only include 18 weeks I dont get the same error.

          • Kathrine,
            For any linear regression analysis if you have k independent variables you need to have at least k+2 rows of data. The dummy variables used to model the weeks count towards this value of k.
            Charles

  7. Hi Charles!

    I’ve got a question about your regression model, y = a+ b * X + error, do we have 4 (one for each Q) or 5 (one for each Q + 1 for ‘quaternumber’).

    Thank you very much for this great website!
    Ella

    Reply
    • Ella,
      It depends on what independent variables you put in the model. If say you have 4 seasons, then you need 4-1 = 3 dummy variables to model the seasons. In the example given on the webpage, another variable was used to model the trend (t for time or trend). You might have other independent variables that are not about seasons or time.
      Charles

      Reply
  8. I am using linear and log linear regression to review a large amount of products for which I have weekly sales(split into months for the analysis) and the aim is to assign a suitable seasonal pattern for each item. However, I am unclear which of the statistical output of the regression would guide me on whether there is seasonality or not on my data? Should I use an average of the sales index or forecast index produced by the regression or how could I tell if there is no seasonality and I should assume index=1 across all 52 periods?
    I wanted to clarify that I have used the Linest function in my data so that I can run Macros on the analysis so I have not got the p values(significance of each variable) shown. I am unsure how to get the p value also in excel as part of that analysis. If you could also help on that it would be great. Many thanks

    Reply
    • Hi Dimi,
      1. You can create a plot of your data to determine whether there is seasonality. If there is seasonality, you should be able to spot the pattern (monthly, quarterly, etc.).
      2. To calculate the p-value for each coefficient, you take the slope value (row 1 from LINEST) and divide it by the standard error (row 2 from LINEST). This yields the t statistic. You can now calculate the p-value for that coefficient using =T.DIST.2T(t,n-2) where n = the sample size.
      Charles

      Reply
      • Great many thanks for the formula. That will help a lot. Would though a high p(low significance) show that there is no consistent impact for March let’s say if the p for the dummy March variant would be more thank 0.05?Or else what does a high p for that variable (one of the 11 months) would mean I should do different with this month?
        Finally would there be an easy way to exclude the impact of the trend from my data to get to an accurate index?

        Many thanks again
        Dimitra

        Reply
  9. To capture seasonality, we use dummy variables. What is the rational behind using one less number of variables than the number of periods. For example, for a quarterly data, we add three variables, not four, and for monthly data, we add 11, not 12 variables.

    Reply
  10. QUESTION FIVE

    Sales of article B (’000 units)
    QI Q2 Q3 Q4 (Q = quarter)
    2015 24.8 36.3 38.1 47.5
    2016 31.2 42.3 43.4 55.9
    2017 40.0 48.8 54.0 69.1
    2018 54.7 57.8 60.3 68.9

    (a) Look at the data. What sort of trend and seasonal pattern do you expect to emerge from the analysis of this data? (2 marks)
    (b) Derive a regression equation from the data and forecast the trend in sales for the four quarters of 2019 (2 marks)
    (c) Discuss the usefulness of this method of forecasting.
    (1 mark)

    Reply
  11. How to get the values in M6:N8; that is AIC/AICc and SBC. I dont get these as output while using excel built in function inside Data Analysis–Regression.

    Reply
  12. Hi Please help me to answer this question
    Given the following table, forecast for the production of shirt using REGRESSION method for 2019, 2020 and 2021.Show your solution.
    YEAR PRODUCTION SALES
    2012 343.5 PIECES P12,789 M
    2013 381.2 PIECES P4567 M
    2014 437.5 PIECES P 12,345 M
    2015 323.6 PIECES P 18965 M
    2016 231.4 PIECES P76239 M
    2017 443.5 PIECES P34897M
    2018 525.2 PIECES P23419 M

    Reply
    • There are a few approaches, but the simplest is to use the method described on this webpage. If there is no trend then this will be apparent from the regression coefficients.
      Charles

      Reply
  13. My only question is should I use “t” in the regression equation? To me T represents Quarter 4. However, I have heard Quarter 4 is represented in the Y intercept? I am confused.

    Should I use the x variable t in the regression equation or leave it out?

    Thank you.

    Reply

Leave a Comment