Regression Forecasts with Seasonality

We now turn our attention to the situation where we use regression with seasonal data: hourly, weekly, monthly, quarterly, etc. For hours we have 24 periods in a day, for months we have 12 periods in a year, etc. In particular, we are concerned with cases where the seasons influence the trend of the data (e.g. annual sales revenues are increasing, but revenues in June are lower than in September).

Example

Example 1: Company XYZ’s quarterly revenues for 2012 through 2015 are shown in column C of Figure 1. We would like to forecast the quarterly revenues for 2016 based on a linear regression model.

As we see from the blue curve in Figure 2, although the annual trend of the revenues may be linear, the graph is certainly not linear due to seasonal fluctuations. We need a way to handle these seasonal fluctuations.

Figure 1 – Seasonal forecasting

Seasonal Approach

The approach we use is to add categorical variables to represent the four seasons (Q1, Q2, Q3, Q4). Three dummy variables are required (one fewer than the number of periods). The coding based on these variables is shown in columns E, F, and G of Figure 1. Column E contains a 1 for revenue data in Q1 and a 0 for revenue data not in Q1. Similarly, column F contains a 1 for data in Q2 and a 0 for data not in Q2. Column G codes for data in Q3. Revenue data in Q4 will have a 0 in columns E, F, and G.

We also include a variable t in column D which simply lists the time periods sequentially ignoring the quarter.

Figure 2 – Seasonal Trends

We now construct a multiple linear regression model using the data in range D3:G19 as our X values and range C3:C19 as our Y values. This analysis is shown in Figure 3.

Figure 3 – Regression Analysis with Seasonality

We can use this model to create predictions for the historical data in 2012-2015 as well as for 2016 (future forecast). These predictions are shown in column H of Figure 1 using the array formula

=TREND(C4:C19,D4:G19,D4:G23)

This is the red curve in Figure 2. E.g. the prediction for Q1 of 2012 is $10,812,500 (cell H4), which is fairly close to the actual revenue of $10,500,000 (cell C4).

The forecasted values for each quarter in 2016 are shown in range H20:H23 of Figure 1.

70 thoughts on “Regression Forecasts with Seasonality”

Roman

May 21, 2023 at 10:51 pm

What if we make intercept = 0. Would that be wrong?
Reply
- Charles
  
  May 22, 2023 at 9:01 pm
  
  Hello Roman,
  You can set the intercept to zero by using Regression through the Origin (aka regression with zero intercept), but is there a reason why you want to do this?
  Charles
  Reply
  - Roman
    
    May 28, 2023 at 12:09 pm
    
    Hello Charles,
    Thank you for the reply. I simply saw such a way on this webpage (https://asmquantmacro.com/2015/06/29/testing-for-seasonality-in-excel/). I guess the author did so in order to select all months (rather than 11 months). If we estimate the model using for instance feb to dec with intercept then the coefficient estimates how much higher or lower returns are in the other months OVER the jan returns. But I guess the goal of the author was to see the how coefficients explain each month’s effect on returns not comparing to jan. So i just wanted to ask if this is also a valid way of capturing seasonality.
    Reply
    - Charles
      
      May 29, 2023 at 4:04 pm
      
      Hello Roman,
      The regression analysis on this site uses all 12 months but omits an intercept term. This is equivalent to 11 months plus an intercept term.
      Charles
      Reply
Mer

September 30, 2022 at 5:01 pm

Hi all,

How can I make a forecast that is not only based on seasonality but also macro variables?

Thank you!
Reply
- Charles
  
  October 1, 2022 at 10:46 am
  
  Hello,
  There are a number of approaches. You can use the approach described on this webpage and just include the macro variables. This is ok provided these variables also vary by season. Another approach is to use Panel Analysis. See
  https://www.real-statistics.com/panel-data-models/
  Charles
  Reply
Mo.

April 24, 2022 at 4:29 pm

Thanks Charles,

For demand forecasting for a certain products with 5 years historical data 2017-2021 ( the data is recorded by the order from the customer, not daily or monthly)

The question, is it better to do the forecast quarterly or monthly? I am using Holt-Winter and Regression to compare with them after I finished

Important note: demand on Jan 2019 is missing
Reply
- Charles
  
  April 24, 2022 at 8:09 pm
  
  Hi Mo,
  Presumably, you have a lot of historical order data you have. This data could be combined to provide daily, weekly, monthly, quarterly, etc. data (assuming that you have enough data). Whether to forecast quarterly or monthly depends on what your needs are.
  Charles
  Reply
- Charles
  
  April 24, 2022 at 8:16 pm
  
  Regarding the missing month of data, you can use interpolation as described at
  https://www.real-statistics.com/time-series-analysis/stochastic-processes/handling-missing-time-series-data/
  Charles
  Reply
Matteo

February 5, 2022 at 11:19 pm

If there is missing data, for example 10% of missing (non-continuous) daily data out of 365 annual data, how can this be done?
Reply
- Charles
  
  February 6, 2022 at 2:04 pm
  
  Hello Matteo,
  It depends on the details, but see the following describing various techniques for handling missing data.
  https://www.real-statistics.com/handling-missing-data/
  Charles
  Reply
Preksha

November 17, 2021 at 3:25 pm

Hi can you help me understand how to run regression in this scenario.
In case of a retail chain with seasonality trend in data that can be handled the same way but supposingly there are other factors contributing to the trend like number of stores in each quarter, discounts offered.

So how to handle the data with seasonality as well as other factors. Please help
Reply
- Charles
  
  November 17, 2021 at 6:11 pm
  
  Just add the data for the other variables (one added column for each variable) as you would for any other regression model.
  Charles
  Reply
doug

August 13, 2021 at 8:09 pm

Why must you account for seasonality when using multiple linear regression to forecast revenue for a particular quarter?
Reply
- Charles
  
  August 14, 2021 at 12:52 pm
  
  Doug,
  You don’t have to account for seasonality, but the resulting forecast will be more accurate if you do (when there is seasonality). Remember that linear regression is based on linearity and seasonality introduces a non-linear component.
  harles
  Reply
  - Mo.
    
    April 24, 2022 at 4:17 pm
    
    Thanks Charles,
    
    For demand forecasting with 5 years historical data ( the data is recorded by the order even if more than one order in a day)
    
    So, is it better to do the forecast quarterly or monthly?
    Reply
    - Charles
      
      April 24, 2022 at 8:10 pm
      
      See my previous email response.
      Charles
      Reply
  - Mo.
    
    April 24, 2022 at 4:19 pm
    
    Thanks Charles,
    
    For demand forecasting with 5 years historical data ( the data is recorded by the order, not daily or monthly)
    Reply
    - Charles
      
      April 24, 2022 at 8:10 pm
      
      See my previous email response.
      Charles
      Reply
JOHNNATAS MIKAEL LOPES

May 6, 2020 at 9:45 pm

Is it possible to apply the Prais-Wintsen method to control autocorrelation in situations with seasonality?
Also, how do I match the equation in your example to this generic equation: Yi = b0 + b1 * Xi + b2 * sen (2πXi / L) + b3 * cos (2πXi / L)
Where,
Xi is the temporal trend factor
b1 is the slope of the trend,
b2 and b3 the seasonality coefficients,
L is the periodicity constant (for example, months).
This equation would be for a seasonality for a harmonic, as your example.
Thanks a lot
Reply
- Charles
  
  May 19, 2020 at 7:30 pm
  
  Real Statistics doesn’t yet support the Prais-Winsten method.
  You could treat the equation as the linear equation Yi = b0 + b1 * Xi + b2 * Ui + b3 * Vi where Ui = sin(2πXi/L) and Vi = cos(2πXi/L), but probably you would want to use nonlinear regression.
  Charles
  Reply
Shane Pow

April 29, 2020 at 11:43 am

Hi. How would you interpret the coefficients of the Q1, Q2, Q3 in the output table. e.g. for every increase in Q1, revenue decrease by 3.98?
Reply
- Charles
  
  May 7, 2020 at 6:46 pm
  
  Shane,
  Suppose the dummy variables are q1, q2, q3 for quarters Q1, Q2, Q3m with Q4 acting as the base. For Q1 you need to consider the case where q1 = 1, q2 = 0 and q3 = 0. In this case, the coefficient for the q1 variable is simply added to the intercept coefficient. For Q2 you need to consider the case where q1 = 0, q2 = 1 and q3 = 0. In this case, the coefficient for the q2 variable is simply added to the intercept coefficient. The situation is similar for Q3. For Q4 ou need to consider the case where q1 = 0, q2 = 0 and q3 = 0. In this case, the intercept coefficient captures the impact of Q4. Thus we see that the q1, q2 and q3 coefficients capture the displacement from the intercept, which acts as the Q4 coefficient
  Charles
  Reply
Roopam

April 11, 2020 at 2:57 pm

Is an F-test a one-sided hypothesis test because one only rejects the null hypothesis if the calculated statistic is greater than the tabulated critical value?
Reply
- Charles
  
  April 11, 2020 at 6:33 pm
  
  Yes
  Reply
  - Roopam
    
    April 11, 2020 at 7:33 pm
    
    Thanks Charles,
    Can you please provide a brief justification for the same. It’d be really helpful.
    Reply
    - Charles
      
      April 12, 2020 at 9:06 pm
      
      Not sure what you want me to justify, but see https://real-statistics.com/multiple-regression/multiple-regression-analysis/
      Charles
      Reply
Priya

April 9, 2020 at 7:29 pm

What will be the quadratic time trend and how to set seasonal dummies for monthly?
Please also guide me how to formulate the equation as well
Reply
- Charles
  
  April 10, 2020 at 9:35 am
  
  Hello Priya,
  Instead of using variables Q1, Q2, Q3 use M1, M2, …, M11 for the months. The coding is as explained at
  Categorical Coding for Regression
  You can also add the term t^2 to the regression model. Its values are the squares of the t values.
  Charles
  Reply
  - Priya
    
    April 10, 2020 at 6:23 pm
    
    If I am setting up montly dummies, then which month should be excluded, January or December?
    Will there be variance in the regression output?
    Reply
    - Charles
      
      April 10, 2020 at 7:56 pm
      
      It doesn’t matter which month you exclude. Some aspects of the regression model will be the same and some different.
      Charles
      Reply
      - Priya
        
        April 12, 2020 at 7:18 am
        
        Thank you so much
Steven

March 10, 2020 at 4:16 pm

What if the data is provided in weeks, how would you transform it?
Reply
- Charles
  
  March 10, 2020 at 6:40 pm
  
  Hi Steven,
  Use the same approach as described on the webpage, but now you will need 51 dummy variables, one for each week in the year minus one. If, instead, you mean days in the week, then you need 6 dummy variables, one for each day in the week minus one.
  Charles
  Reply
  - Kathrine
    
    January 10, 2021 at 10:47 pm
    
    Hi Charles,
    
    If looking at weeks, is it possible to have 51 dummy variables as there is a limit of 16 variables in Excel.
    
    Thanks in advance!
    Reply
    - Charles
      
      January 11, 2021 at 10:06 am
      
      Katherine,
      You can have at least 64 variables using the Real Statistics Regression tool (actually more with some options).
      Charles
      Reply
      - Kathrine
        
        January 12, 2021 at 2:53 am
        
        Hi Charles,
        
        Thanks for this, I have 51 (n-1) weeks nd 3 other variables. I’m getting an error “Input x must have at least two more rows of data than columns.” What can be causing this? When I only include 18 weeks I dont get the same error.
      - Charles
        
        January 12, 2021 at 11:24 am
        
        Kathrine,
        For any linear regression analysis if you have k independent variables you need to have at least k+2 rows of data. The dummy variables used to model the weeks count towards this value of k.
        Charles
Ella Espling

March 8, 2020 at 10:02 am

Hi Charles!

I’ve got a question about your regression model, y = a+ b * X + error, do we have 4 (one for each Q) or 5 (one for each Q + 1 for ‘quaternumber’).

Thank you very much for this great website!
Ella
Reply
- Charles
  
  March 9, 2020 at 8:12 am
  
  Ella,
  It depends on what independent variables you put in the model. If say you have 4 seasons, then you need 4-1 = 3 dummy variables to model the seasons. In the example given on the webpage, another variable was used to model the trend (t for time or trend). You might have other independent variables that are not about seasons or time.
  Charles
  Reply
Hanna Torany

February 28, 2020 at 1:19 pm

Hi,

How can we calculate the standard deviation of our forecast ?

Thank you,
Reply
- Charles
  
  March 1, 2020 at 11:17 pm
  
  Hello Hanna,
  See https://real-statistics.com/multiple-regression/confidence-and-prediction-intervals/
  Charles
  Reply
Dimi

February 18, 2020 at 11:29 pm

I am using linear and log linear regression to review a large amount of products for which I have weekly sales(split into months for the analysis) and the aim is to assign a suitable seasonal pattern for each item. However, I am unclear which of the statistical output of the regression would guide me on whether there is seasonality or not on my data? Should I use an average of the sales index or forecast index produced by the regression or how could I tell if there is no seasonality and I should assume index=1 across all 52 periods?
I wanted to clarify that I have used the Linest function in my data so that I can run Macros on the analysis so I have not got the p values(significance of each variable) shown. I am unsure how to get the p value also in excel as part of that analysis. If you could also help on that it would be great. Many thanks
Reply
- Charles
  
  February 19, 2020 at 10:03 am
  
  Hi Dimi,
  1. You can create a plot of your data to determine whether there is seasonality. If there is seasonality, you should be able to spot the pattern (monthly, quarterly, etc.).
  2. To calculate the p-value for each coefficient, you take the slope value (row 1 from LINEST) and divide it by the standard error (row 2 from LINEST). This yields the t statistic. You can now calculate the p-value for that coefficient using =T.DIST.2T(t,n-2) where n = the sample size.
  Charles
  Reply
  - Dimi
    
    February 20, 2020 at 9:47 pm
    
    Great many thanks for the formula. That will help a lot. Would though a high p(low significance) show that there is no consistent impact for March let’s say if the p for the dummy March variant would be more thank 0.05?Or else what does a high p for that variable (one of the 11 months) would mean I should do different with this month?
    Finally would there be an easy way to exclude the impact of the trend from my data to get to an accurate index?
    
    Many thanks again
    Dimitra
    Reply
    - Charles
      
      March 14, 2020 at 7:18 pm
      
      Hello Dimi,
      1. The actual interpretation of each of the coefficients is complicated by the fact that there are also non-seasonality variables. If there are no other variables, the situation is similar to that described at
      https://real-statistics.com/multiple-regression/anova-using-regression/
      If you want to determine whether seasonality is significant, then you can compare the regression model without the seasonality dummy variables with the model that includes all the seasonality variables by using the approach described at
      https://real-statistics.com/multiple-regression/testing-significance-extra-variables-regression-model/
      2. Yes, just omit the trend variable from the regression model.
      Charles
      Reply
Chaman L. Jain

January 12, 2020 at 1:20 pm

To capture seasonality, we use dummy variables. What is the rational behind using one less number of variables than the number of periods. For example, for a quarterly data, we add three variables, not four, and for monthly data, we add 11, not 12 variables.
Reply
- Charles
  
  January 12, 2020 at 9:30 pm
  
  Hello Chaman,
  If you use one dummy variable per period, then the values for this variable will be a linear combination of the other variables and the regression will not succeed due to multicollinarity. You can learn more about the coding at
  https://real-statistics.com/multiple-regression/multiple-regression-analysis/categorical-coding-regression/
  Charles
  Reply
  - Shawky
    
    February 29, 2020 at 10:28 pm
    
    Hi, can you please mention a detailed article that describes why we use n-1 variables, I still don’t get it
    Thanks in advance
    Reply
    - Charles
      
      March 1, 2020 at 9:42 am
      
      Hi Shawky,
      If you use n variables, then one of the variables will be a linear combination of the others, in which case you will have multicollinearity and the regression model won’t work. I suggest that you try using n variables and see what happens. Also, see
      Collinearity
      Charles
      Reply
Musa

December 2, 2019 at 7:52 pm

My question is that how to predict future after applying multiple regression technique we didn’t able to predict
Reply
- Charles
  
  December 3, 2019 at 8:53 am
  
  Musa,
  This webpage shows how to predict future events.
  Charles
  Reply
Abdirahman Osman

November 30, 2019 at 9:23 am

QUESTION FIVE

Sales of article B (’000 units)
QI Q2 Q3 Q4 (Q = quarter)
2015 24.8 36.3 38.1 47.5
2016 31.2 42.3 43.4 55.9
2017 40.0 48.8 54.0 69.1
2018 54.7 57.8 60.3 68.9

(a) Look at the data. What sort of trend and seasonal pattern do you expect to emerge from the analysis of this data? (2 marks)
(b) Derive a regression equation from the data and forecast the trend in sales for the four quarters of 2019 (2 marks)
(c) Discuss the usefulness of this method of forecasting.
(1 mark)
Reply
- Charles
  
  December 1, 2019 at 10:01 am
  
  What sort of help do you need?
  Charles
  Reply
Gulshan

August 27, 2019 at 9:24 am

How to get the values in M6:N8; that is AIC/AICc and SBC. I dont get these as output while using excel built in function inside Data Analysis–Regression.
Reply
- Charles
  
  August 28, 2019 at 7:51 am
  
  Hello Gulshan,
  Yes, these values are not available from Excel’s data analysis tool. These values are available from the Real Statistics Regression data analysis tool. You can download the Real Statistics software for free at
  Real Statistics Resource Pack
  Charles
  Reply
lea

June 10, 2019 at 5:34 am

Hi Please help me to answer this question
Given the following table, forecast for the production of shirt using REGRESSION method for 2019, 2020 and 2021.Show your solution.
YEAR PRODUCTION SALES
2012 343.5 PIECES P12,789 M
2013 381.2 PIECES P4567 M
2014 437.5 PIECES P 12,345 M
2015 323.6 PIECES P 18965 M
2016 231.4 PIECES P76239 M
2017 443.5 PIECES P34897M
2018 525.2 PIECES P23419 M
Reply
- Charles
  
  June 10, 2019 at 2:39 pm
  
  Hello Lea,
  What is your question?
  Charles
  Reply
julybug

March 28, 2019 at 5:19 am

How would I build the model if there were seasonality but not trend?
Reply
- Charles
  
  March 28, 2019 at 3:44 pm
  
  There are a few approaches, but the simplest is to use the method described on this webpage. If there is no trend then this will be apparent from the regression coefficients.
  Charles
  Reply
- Dr. Herbert Barber
  
  May 21, 2019 at 12:04 am
  
  Julybug, time series forecasting includes seasonality, trend, and noise. Thus, you your data must reflect some form of trend component to use uni-variate time series analysis.
  Reply
Shah

November 19, 2018 at 3:32 am

My only question is should I use “t” in the regression equation? To me T represents Quarter 4. However, I have heard Quarter 4 is represented in the Y intercept? I am confused.

Should I use the x variable t in the regression equation or leave it out?

Thank you.
Reply
- Charles
  
  November 19, 2018 at 11:53 am
  
  Shah,
  You need to include the t variable. You can call it whatever you want, but it isn’t Quarter 4.
  Charles
  Reply
TPM

July 27, 2018 at 2:41 am

What’s the advantage of including “t” as a variable in the model?
Reply
- Charles
  
  July 27, 2018 at 9:41 am
  
  In addition to the seasonality, there is an upward trend to the data (see the chart). This is captured by t.
  Charles
  Reply
- GLJ
  
  August 29, 2018 at 11:24 am
  
  It’s important to account for a possible time trend.
  Since you’re checking for seasonality “t” needs to be included
  Reply
LH

July 1, 2018 at 7:47 am

The main thing the forecast formula is not explained. Doesnt make sense. Can you write the formulaS for forecast cells?
Reply
- Charles
  
  July 1, 2018 at 3:51 pm
  
  LH,
  This webpage builds on concepts described in other webpages. The main concept is to use dummy coding to capture the quarters. This is explained at:
  https://real-statistics.com/multiple-regression/multiple-regression-analysis/categorical-coding-regression/
  The other concept is regression forecasting using the TREND function. This is explained at:
  https://real-statistics.com/multiple-regression/multiple-regression-analysis/multiple-regression-analysis-excel/
  Charles
  Reply
Steven

October 13, 2017 at 5:40 am

What is the prediction formula for the second period, etc?
Thank you
Reply
- Charles
  
  October 13, 2017 at 9:23 pm
  
  Steven,
  You can use the TREND function as described on the website. E.g. the formula in cell H21 is =TREND(C4:C19,D4:G19,D21:G21)
  Charles
  Reply

Example

Seasonal Approach

70 thoughts on “Regression Forecasts with Seasonality”

Leave a Comment Cancel reply