In Exponential Regression and Power Regression we reviewed four types of log transformation for regression models with one independent variable. We now briefly examine the multiple regression counterparts to these four types of log transformations:

Level-level regression is the normal multiple regression we have studied in Least Squares for Multiple Regression and Multiple Regression Analysis. Log-level regression is the multivariate counterpart to exponential regression examined in Exponential Regression. Namely, by taking the exponential of each side of the equation shown above we get the equivalent form

Similarly, the log-log regression model is the multivariate counterpart to the power regression model examined in Power Regression. We see this by taking the exponential of both sides of the equation shown above and simplifying to get

Since any positive constant *c* can be expressed as *e*^{ln c}, we can re-express this equation by

(where clearly the coefficients are not the same, and where we included negative values for as well).
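For reference, a common way to write the four model types is sketched below. The notation is assumed here rather than taken from the original figures: $\beta_j$ for the coefficients, $k$ for the number of independent variables and $\varepsilon$ for the error term.

$$
\begin{aligned}
\text{level-level:}\quad & y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \\
\text{log-level:}\quad & \ln y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \\
\text{level-log:}\quad & y = \beta_0 + \beta_1 \ln x_1 + \cdots + \beta_k \ln x_k + \varepsilon \\
\text{log-log:}\quad & \ln y = \beta_0 + \beta_1 \ln x_1 + \cdots + \beta_k \ln x_k + \varepsilon
\end{aligned}
$$

Taking the exponential of both sides of the log-level and log-log forms gives, in the same assumed notation,

$$
y = e^{\beta_0} e^{\beta_1 x_1} \cdots e^{\beta_k x_k} e^{\varepsilon}
\qquad\text{and}\qquad
y = e^{\beta_0} x_1^{\beta_1} \cdots x_k^{\beta_k} e^{\varepsilon}.
$$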

We now give an example of where the log-level regression model is a good fit for some data.

**Example 1**: Repeat Example 1 of Least Squares for Multiple Regression using the data on the left side of Figure 1.

**Figure 1 – Log-level transformation**

The right side of the figure shows the log transformation of the price: e.g. cell G6 contains the formula =LN(C6). We next run regression data analysis on the log transformed data. We could use the Excel **Regression** tool, although here we use the Real Statistics **Linear Regression** data analysis tool (as described in Multiple Regression Analysis) on the X input in range E5:F16 and Y input in range G5:G16. The output is shown in Figure 2.

**Figure 2 – Regression on log-level transformed data**

The high value for R-Square shows that the log-level transformed data is a good fit for the linear regression model. Since zero is not in the 95% confidence intervals for Color or Quality, the corresponding coefficients are significantly different from zero.
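For readers working outside Excel, the same log-level fit can be sketched in Python with NumPy. This is only an illustration: the data below are hypothetical (they are not the values in Figure 1) and are generated from an exact log-level relationship so that the recovered coefficients can be checked. The variable names are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical data (NOT the values from Figure 1): two predictors and a price
# generated from an exact log-level relationship so the fit can be checked.
color = np.array([7, 3, 5, 8, 9, 5, 4, 2, 8, 6, 9], dtype=float)
quality = np.array([5, 7, 8, 1, 3, 4, 0, 6, 7, 4, 2], dtype=float)
true_b0, true_b1, true_b2 = 3.0, 0.25, 0.15
price = np.exp(true_b0 + true_b1 * color + true_b2 * quality)

# Log-level model: regress ln(price) on the untransformed predictors.
ln_price = np.log(price)  # plays the role of =LN(C6) in the worksheet
X = np.column_stack([np.ones_like(color), color, quality])  # intercept column first
coef, *_ = np.linalg.lstsq(X, ln_price, rcond=None)

# R-square of the fit, measured on the log scale.
fitted = X @ coef
ss_res = np.sum((ln_price - fitted) ** 2)
ss_tot = np.sum((ln_price - ln_price.mean()) ** 2)
r_square = 1 - ss_res / ss_tot

print("intercept, color, quality:", coef)
print("R-square:", r_square)
```

Note that the R-square computed this way describes the fit to ln(price), not to price itself.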

We could also use the array formula =LOGEST(C6:C16,A6:B16,TRUE,TRUE) to obtain the following output (the labels have been manually added):

**Figure 3 – Use of LOGEST function**

Note that the slope/intercept values in row 7 of Figure 3 are the exponential of the linear coefficients calculated in Figure 2: e.g. the value of cell R7 is equal to EXP(J23) and the value of cell T7 is equal to EXP(J21).

We can also use the regression model to predict the price of a given diamond. For example, suppose a diamond has Color = 4 and Quality = 5 or Color = 7 and Quality = 7, then the following three approaches show how to predict the Price based on the regression model:

**Figure 4 – Forecasting using the log-level model**
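As a rough illustration of how such forecasts work, the sketch below computes the prediction in two algebraically equivalent ways. These may not correspond one-to-one with the approaches in Figure 4. The coefficient values are hypothetical, not those of Figure 2.

```python
import numpy as np

# Hypothetical fitted coefficients of ln(price) = b0 + b1*color + b2*quality
# (illustrative values, not those of Figure 2).
b0, b1, b2 = 3.0, 0.25, 0.15

# The two diamonds from the text: (Color, Quality) = (4, 5) and (7, 7).
new_x = np.array([[4.0, 5.0], [7.0, 7.0]])

# Approach 1: predict ln(price) linearly, then take the exponential.
pred_via_log = np.exp(b0 + new_x @ np.array([b1, b2]))

# Approach 2: multiplicative form m0 * m1^color * m2^quality, where each m is
# the exponential of a linear coefficient (the form that LOGEST reports).
m0, m1, m2 = np.exp(b0), np.exp(b1), np.exp(b2)
pred_multiplicative = m0 * m1 ** new_x[:, 0] * m2 ** new_x[:, 1]

print(pred_via_log)
print(pred_multiplicative)
```

Both arrays agree up to floating-point error, which is the same relationship as the one between Figure 2 and Figure 3.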

**Example 2**: Repeat Example 1 using the data on the left side of Figure 5.

**Figure 5 – Log-log transformation**

The right side of the figure shows the log transformation of the color, quality and price. We next run the regression data analysis tool on the log transformed data, i.e. with range E5:F16 as Input X and range G5:G16 as Input Y. The output is shown in Figure 6.

**Figure 6 – Regression on log-log transformed data**

As in the previous example, we see from Figure 6 that the model is a good fit for the data. We can also use the regression model for forecasting. Note that there are no LOGEST or GROWTH functions for the log-log transformed models, but we still have the following two approaches for forecasting:

**Figure 7 – Forecasting using the log-log model**
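A minimal sketch of log-log forecasting, assuming hypothetical coefficients (not those of Figure 6), shows that three ways of writing the prediction give the same number. The third mirrors the worksheet-style formula EXP(b0)*EXP(b1)^LN(x1)*EXP(b2)^LN(x2).

```python
import numpy as np

# Hypothetical coefficients of ln(price) = b0 + b1*ln(color) + b2*ln(quality)
# (illustrative values, not those of Figure 6).
b0, b1, b2 = 2.0, 0.8, 0.5

color = np.array([4.0, 7.0])
quality = np.array([5.0, 7.0])

# Form 1: predict on the log scale, then take the exponential.
pred_via_log = np.exp(b0 + b1 * np.log(color) + b2 * np.log(quality))

# Form 2: power form e^b0 * color^b1 * quality^b2.
pred_power = np.exp(b0) * color ** b1 * quality ** b2

# Form 3: EXP(b0) * EXP(b1)^LN(color) * EXP(b2)^LN(quality).
pred_exp_form = np.exp(b0) * np.exp(b1) ** np.log(color) * np.exp(b2) ** np.log(quality)

print(pred_via_log, pred_power, pred_exp_form)
```

The inputs must be strictly positive, since the log of zero or of a negative number is undefined.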

I am trying to build a forecasting model using multiple regression. Can you have a look at it and tell me if I am doing it right?

I would appreciate any help on this.

Regards,

Anil

Anil,

I don’t generally do this sort of thing since it can be very time-consuming. If you send me your model I will take a quick look, but I won’t be able to try to decipher things.

Charles

If I am doing a multivariate regression, but on my left hand side of the equation some of my independent variables have values in thousands which are much higher compared to the others having absolute values in range of, say, 0 to 100, should I use log for all of them or just for the ones with the high values so they can be put on the same level? Also, if some of my variables are in percentages, is it ok if I still apply the log on to them?

Thank you very much!

Daniel,

You can use log for these, but you also might be better off not doing so. The important thing is not that absolute values be on the same scale, but that the assumptions for multiple regression be satisfied (linearity, normality, homogeneity of variances). If using the log contributes to this then using the log can be a good idea, otherwise it is better not to use the log. You can use log for some variables but not others.

Charles

Daniel,

If I only had one independent variable I could do a scatter plot against the dependent variable to visually determine whether the relationship is linear, and if not, whether a transformation (log, ln, 1/x, etc.) is appropriate. But when I have multiple independent variables (say 3 or 4) in a multiple regression, what’s the best way to test for linearity, and what if some are linear and others are curved (e.g. exponential)? Thanks

Adeeb,

I tend to simply perform the multiple regression analysis and see if I have a good fit (based on the value of R-square and the significance of the regression coefficients). You can compare different transformations in this way as well.

Charles
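The comparison described in this reply can be sketched as follows. The data are hypothetical and the helper `r_square` is a name invented for this illustration. Both fits use the same untransformed y, so their R-square values are directly comparable. If y itself were log-transformed, the two R-square values would be measured on different scales.

```python
import numpy as np

def r_square(X, y):
    """R-square of an OLS fit of y on X (an intercept column is added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical data: y depends on ln(x1) and linearly on x2.
x1 = np.array([1, 2, 4, 8, 16, 32, 64, 128], dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=float)
y = 2.0 + 1.5 * np.log(x1) + 0.5 * x2

r2_level = r_square(np.column_stack([x1, x2]), y)          # no transformation
r2_logx1 = r_square(np.column_stack([np.log(x1), x2]), y)  # log of x1 only

print("R-square, untransformed:", r2_level)
print("R-square, ln(x1):       ", r2_logx1)
```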

Hello Sir, I am implementing a log transformation on an OLS regression, i.e. a log transformation on multiple regression. But among the 3 types of log transformations, namely log-level, level-log and log-log, which transformation should I go with? Is log-level similar to the Box-Cox transformation?

There are many types of transformations in addition to the ones you have referenced. The specific transformation depends on your data. Usually you are picking the transformation that achieves some objective (e.g. making the data more linear or making the data better fit the normal distribution).

The log-linear (i.e. log-level) transformation is one of the transformations in the Box-Cox family of transformations.

Charles

This post helped me work through issues I was having with a log regression. Love your site, your posts and examples are detailed and easy to implement. Thank you!

Hi Sir,

Could I ask for a bit of help? What kind of fitting should I use if I have a log-log plot of two independent variables (x and y have been measured with error)?

Thanks so much

Sorry, but I don’t understand your question.

Charles

Hello Charles,

I am running two OLS models. Model 1 is a linear model and in model 2 I log my outcome variable y.

When I ran the regression using model 1 it shows that my explanatory variable x has a positive and significant effect on y. Then when I ran the model using the log form (i.e., ln(y)) my explanatory variable becomes negative and insignificant.

I am having a hard time understanding why, in model 2, my x variable becomes negative and insignificant. I am not sure what the correct interpretation is here. Any suggestions?

Thank you!

It is difficult for me to answer your question without seeing the data. Perhaps the assumptions for an OLS model are not being met with one of these scenarios. If you send me an Excel file with the data, I will try to see why this is occurring.

Charles

Can I know whether log-log regressions make more statistical sense or business sense?

Angela,

I don’t really know how to answer this question. They have practical application and are an interesting subject in statistics.

Charles

Hello Charles

If there is a zero value in the independent variable, how can we go ahead with the log transformation in the log-log model?

Thanks

Use log(x+a) instead of log(x) where a is a constant big enough so that x+a is always positive (for the values of x that you are considering).

Charles
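A minimal sketch of this shift, assuming a hypothetical predictor that contains a zero and the shift constant a = 1 (the choice of a is an assumption and affects the fitted coefficients):

```python
import numpy as np

# Hypothetical predictor containing a zero, for which log(x) is undefined.
x = np.array([0.0, 1.0, 5.0, 20.0])

# Shift constant chosen so that x + a is positive for every value considered.
a = 1.0
shifted_log = np.log(x + a)

# For a = 1, NumPy's log1p computes log(1 + x) directly.
same_via_log1p = np.log1p(x)

print(shifted_log)
```

The shifted values can then be used in place of ln(x) in the regression. Any forecast must apply the same shift to new x values.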

I am using 2 stage least squares and seemingly unrelated regression, where I have 12 independent variables. I am planning to use the log value for the dependent variable and for only two of the 12 independent variables. It is not fully like a log-log regression. Would you please tell me whether I can do this and, if I can, what this type of model is called?

I would appreciate any help on this.

Shampa,

I don’t yet support 2 stage least squares, and so I don’t have any advice about this topic at this time.

Charles

Sir, what if the data is still not normal even after taking the log? How can I then make the data normal? I am having a hard time, so please let me know. Can I take the log of an already logged series? Is that OK for making the data normal?

Ruchi,

See the Box-Cox approach on the following webpage

Box-Cox Transformation

Charles

Thanks for this information.

Wondering about your cell references in Figure 7. Is it possible that the references to cells J56, J57, J58 in Figure 7 should actually refer to the coefficients in cells J51, J52, J53 in Figure 6?

Yes, you are correct. Actually, these formulas refer to the exponential of the values in cells J51, J52, J53 of Figure 6.

Thanks very much for catching this mistake. I have now updated the referenced figure to reflect the change.

Charles

Ok, all makes sense now. Thanks!

Charles, thank you so much for your knowledge sharing!

Got a question, I have gone over this article and tried to come up with a level-log equation like you did with $T$7*$S$7^W14*$R$7^X14 (Fig. 4) for log-level and with EXP($J$51)*EXP($J$52)^LN(W38)*EXP($J$53)^LN(X38) (Fig. 7) for log-log.

Any suggestions would be highly appreciated.

Further info: I am using a linest ((price);ln(color;quality)) type of regression.

Found it!

it is y=a+b·ln(x)+c·ln(y)

Once again thanks for your knowledge sharing Charles

Is it possible to apply logs to the regressand but not to all the regressors? Some of the regressors are negative, and the log of a negative number is not defined. For example: log(exchange_rate) = B + log(oilprices) + interest_rates.

Is the model above correct or not?

Chanda,

Yes, you can do this.

Charles

Hello, Charles.

Adding to the previous question: in a log-log regression, if you apply the logarithm to only 2 of the independent variables, how do you read the results?

I mean, are the coefficients of the variables with the logarithm in percentages and the coefficients of the variables without the log in monetary units?

Martha,

Sorry, but what is the previous question?

Charles

For the log transformation of time series data, which function should we use in Excel:

LN or LOG?

LN(x) is the natural log of x and LOG(x,b) is the log of x base b. Note that LN(x) = LOG(x,EXP(1)).

Generally the natural log is used, although you could really use log to any base.

Charles
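The same identity can be checked outside Excel. In the sketch below, Python's `math.log` stands in for LN and LOG, and the value of x is arbitrary:

```python
import math

x = 50.0  # arbitrary positive value

natural = math.log(x)        # counterpart of LN(x)
base10 = math.log(x, 10)     # counterpart of LOG(x, 10)
via_e = math.log(x, math.e)  # counterpart of LOG(x, EXP(1))

# Change of base: a log to any base is the natural log divided by a constant.
rescaled = natural / math.log(10)

print(natural, via_e, base10, rescaled)
```

Because changing the base only divides the logged values by a constant, it rescales the affected regression coefficients without changing the quality of the fit.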

hello,

do you know how to interpret the figures?

my equation using log is:

log(Y) = c + log(x1) +log(x2) +log(x3)

I ran the regression in EViews but I do not know how to interpret my coefficients. Please help. Thank you.

Shahanah,

I presume the regression model is log(Y) = c + b1*log(x1) + b2*log(x2) + b3*log(x3)

Taking the exponential of both sides of the equation yields y = e^c * x1^b1 * x2^b2 * x3^b3.

E.g. if you double x1, then the new y will be 2^b1 times the previous value of y.

Charles
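A small numerical check of this interpretation, using hypothetical coefficient values (c, b1, b2 and b3 below are made up for illustration):

```python
import math

# Hypothetical coefficients of log(Y) = c + b1*log(x1) + b2*log(x2) + b3*log(x3).
c, b1, b2, b3 = 0.5, 0.7, -0.3, 1.2

def predict(x1, x2, x3):
    """Prediction on the original scale: y = e^c * x1^b1 * x2^b2 * x3^b3."""
    return math.exp(c) * x1 ** b1 * x2 ** b2 * x3 ** b3

y_before = predict(10.0, 4.0, 2.0)
y_after = predict(20.0, 4.0, 2.0)  # x1 doubled, x2 and x3 unchanged

ratio = y_after / y_before
print(ratio, 2 ** b1)  # the two values agree
```

The ratio does not depend on the starting values of x1, x2 or x3, which is why log-log coefficients are often read as elasticities.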

hello,

How do I determine the coefficients (c, b1, b2 and b3) for this regression model: log(Y) = c + b1*log(x1) + b2*log(x2) + b3*log(x3)?

thanks

Kossi,

See the following webpage:

Power Regression

Charles

Hello,

I am working on a level-log model. But I wonder how to obtain the related prediction equation from the slope and intercept, as in AA38 or AA39 in Fig. 7? After tests, the equation y=a+b·ln(x)+c·ln(y) proposed by a reader in the comments was not working. Could you show me the prediction equation from the slope and intercept for a level-log model?

Thank you very much,

James

James,

Since your regression takes the form y = b * ln x + a, you can view this as the simple linear regression y = b * z + a where z = ln x. You can use Excel’s TREND to predict the value of y based on any given value of z. Suppose you want to predict y for the x value x0. All you need to do is use TREND to predict the value of y when z = ln x0; i.e. first take the log of your x0 value and then use TREND to forecast y.

Charles
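The substitution z = ln x can be sketched in Python, with `np.polyfit` playing a role similar to TREND. The data and the value x0 are hypothetical:

```python
import numpy as np

# Hypothetical data following an exact level-log relationship y = a + b*ln(x).
x = np.array([1, 2, 5, 10, 20, 50], dtype=float)
y = 4.0 + 3.0 * np.log(x)

# Treat z = ln(x) as the predictor and fit a straight line y = b*z + a.
z = np.log(x)
b, a = np.polyfit(z, y, 1)  # polyfit returns the slope first, then the intercept

# Forecast for a new x value: take its log first, then apply the line.
x0 = 15.0
y0 = a + b * np.log(x0)

print(a, b, y0)
```

With several predictors the prediction takes the form y = b0 + b1·ln(x1) + … + bk·ln(xk). Unlike log-log forecasts, which are exponentials and therefore always positive, level-log forecasts can turn negative when the x values are small.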

Charles,

Finally, I calculated y by y=b0 + b1*ln x1 + b2*ln x2 + b3*ln x3 +b4*ln x4 + b5*ln x5. I got a better fitting from the level-log model than the log-log model. Then I applied the prediction equations of these two models to another data for prediction. Somehow I got many negative numbers in prediction in the level-log model that is very different from the log-log model. The maximum in the level-log model is much smaller than the log-log model. The prediction of the level-log model in new data is worse than the log-log model. I wonder why this result is possible?

Thanks much,

James

James, sorry, but I don’t have enough information to be able to speculate as to why this has happened.

Charles

Charles,

I expected a better model obtained from data A would have a better prediction in data B with the same equation. Do you think this is a correct concept? I got a completely different result so I raised the question.

Thanks,

James

James,

I don’t have enough information to determine this.

Charles

Good evening

I have a question about my multiple regression analysis. Can someone help me with that?

I’m predicting the GDP of a country using different factors. I’m planning to use the following model (highest R-square).

log(y) = β0 + β1 * log(Xi1) + β2 * Xi2 + … + βn * Xn + ε; in other words: is it possible to do a log – log transformation without transforming all the independent variables?

Jeroen

Jeroen,

Yes, you can perform such a transformation on some variables, but not others.

Charles

Hi,

My data contains one dependent variable and 10 independent variables (n=720) from an experimentally designed plot. My data are all positive. The value of the dependent variable ranges between 0.17 and 0.89. The values of the independent variables vary:

1) Continuous X1 (20-70%), X2 (0.11-1.4), X3 (3-30%), X4 (9-18), X5 (6-60%), X6 (1-5), X7(5-46), 2) Categorical X8 (4 levels), X9 (3 levels), X10 (5 levels)

I tried model with some variables and found deviation from normality assumption of multiple regression.

Question 1:

What type of transformation is good for my dataset, given that my independent variables contain both continuous and categorical data? In other words, for which variables do I need to use a log transformation, given the different ranges and scales of the data values?

I tried to fit the model with all independent variables and found that 6 of them are not significant. Through model selection, I eliminated the non-significant variables and finally got a model with 4 significant variables (two continuous and two categorical). I diagnosed that final model and found that it did not satisfy linearity, normality and homogeneity of variances. Now I would like to make an appropriate transformation for my model.

Question 2:

When shall I perform the transformation: at the beginning (before elimination of non-significant variables), for the model that contains all the variables, or at the end (after elimination of non-significant variables), for the model that contains only significant variables?

I would appreciate any suggestions on these two questions. Thank you !

Regards,

Mkc

Mkc,

1. Which transformation to use depends on a number of factors. This is more art than science. See the following webpages for some suggestions:

http://www.real-statistics.com/descriptive-statistics/data-transformations/

http://www.real-statistics.com/correlation/box-cox-transformation/

Also depending on what you are going to do with your data, you may or may not need to satisfy some of the stated assumptions: linearity, normality, homogeneity of variances.

2. You can do this before or after the transformation, but what you shouldn’t do is mix the results. You should do one or the other. Based on your comments, you probably want to eliminate variables after making the transformations.

Charles

Hi, I would like to know if it is possible to use a Log10 transformation on the independent variable (in my case ocean depth from 0 to 2000) to explain growth rates (% body weight/day going from 0.1 to 6). I did the regression using the values (G = b-a*Depth) and I had a weak relationship, R²=0.18. But when I do the regression (G = b-a*Log10[depth]) my R²=0.67. Is it acceptable, and how can I explain why I did the transformation?

Thank you very much for this blog!!

Richard,

It is common to make such a transformation. I would graph the transformed data values (as a scatter plot) to make sure that the points more or less line up on a line.

Charles

Hi Charles,

I made Lambda scaled power transformation of my dependent variable (y) and fitted model in different functional form with the independent variable (x), and found best fitted (p0. In my case, some values of y are already in negative by the lambda scaled power transformation, so it provided the model summary accounting only positive values of y. Therefore, what could be the solution to appropriately find r-squared value in my case? or how can I make this model into a linear function with accounting all values of the dataset?

Note: both y and x are continuous variables, values range from 0.1-0.9 (gram) and 5-25 (degrees C) respectively. The reason I applied initial Lambda scaled power transformation is as it best satisfied with all the normality and homogeneity residual variance assumptions than with original and other transformations.

Thanks !

I don’t understand the transformations you made. If you send me an Excel file with your data and the transformations you made I will try to answer your question.

Charles

Dear Charles,

I ran a multiple regression with dependent variable as Electricity Sale (Y) and Independent Variables as GDP(G), Electricity Price(P) and the Lag of the Electricity Sales (L) with Log transformation on both sides. As my equation is

Ln(Y) = C + a*Ln(G) + b*Ln(P) + c*Ln(L).

Now after finding the coefficients, a, b, c, I’m given an equation for forecasting in the following form

Yt = Yt-1 * ((1+Growth Rate G)^a) * ((1+Growth Rate P)^b) * ((1+Growth Rate L)^c)

Where:

Yt=Electricity Sale of current year

Yt-1=Electricity Sale of previous year

Growth Rate is given as a decimal fraction, like 0.05

I don’t know how this equation is derived? and why is that constant term C omitted?

Abrar,

The equation Ln(Y) = C+aLn(G)+bLn(P)+cLn(L) is equivalent to Y = e^C * G^a * P^b * L^c.

I don’t know how to derive the other equation.

Charles

Good afternoon Professor,

I’m building a model explaining how different factors affect GDP like this: log GDP = u+ B1*log X1+B2*X2+B3*X3+…+Bn*Xn

Is there any difference in the robustness test for this model compared to that for other linear models?

Thank you.

Thu Tra,

Which robust test are you referring to?

Charles

Hi Charles,

Thank you very much for the prompt response.

By surfing here and there on the internet, I have made derivation for my forecast equations as follows. Kindly have a look at it and let me know if it makes sense.

“In our regression model, both the dependent and independent variables are log transformed and our regression equation is of the following form

Ln (Y) = C + b*Ln(G)+c*Ln(P)+d*Ln(L) (3.10.1)

Where:

Y= Electricity Sale

C= Constant

G= GDP

P= Average Electricity Price per kWh

L= Lag of the Electricity Sale (Y)

b,c,d= Elasticities of GDP, Price and Lag respectively

“In order to derive a general equation for forecasting, we will analyze the impact of one predictor (independent variable) on the response (dependent variable) at a time, keeping the other predictors at a constant value.

Constant (C):

The constant also known as the Y intercept is the value at which the fitted line crosses the Y axis. Mathematically, it is described as the mean response value when all the predictors are set to zero. However, a zero setting for all the predictors is often an impossible/nonsensical combination.

In our regression equations having predictors as GDP, Price and Lag of Sale, the intercept values are not economically meaningful. However, we are not particularly interested in what would happen if all the independent variables were simultaneously zero, therefore, we have left the constant in the model regardless of its statistical significance.

In order to simplify the forecast equation, we have ignored the constant as its magnitude is very small (less than 1).

Impact of Predictors (GDP, Price & Lag of Sale):

In order to see the impact of GDP (G) on the electricity sale (Y), we take two values of G (G1 and G2) and hold the other predictors at fixed values. The above equation 3.10.1 then yields

Ln (Y2)-Ln(Y1)=b*(Ln(G2)-Ln(G1)) 3.10.2

By simplifying

Ln (Y2/Y1)=b* (Ln(G2/G1)) 3.10.3

By taking inverse transform

Y2/Y1= (G2/G1)^b 3.10.4

Now growth rate is defined as:

Growth Rate = (Final Value-Initial Value)/Initial Value

So we can define the growth rate of GDP as

GR of G = (G2-G1)/G1

Or

G2/G1=1+GR of G 3.10.5

Now putting the value of G2/G1 from equation 3.10.5 in equation 3.10.4 and replacing the Y2 with Yt and Y1 with Yt-1, Equation 3.10.4 can be written as:

Yt/Yt-1=(1+GR of G)^b

OR

Yt=Yt-1*(1+GR of G)^b 3.10.6

Similarly the impact of Price (P) and Lag of Sales(L) can be derived as

Yt=Yt-1*(1+GR of P)^c 3.10.7

Yt=Yt-1*(1+GR of L)^d 3.10.8

As all our predictors are independent of each other, so we can combine the impact of all the three variables in a single equation.

Yt= Yt-1*(1+GR of G)^b + Yt-1*(1+GR of P)^c + Yt-1*(1+GR of L)^d

OR

Yt = Yt-1 * ((1+GR of G)^b) * ((1+GR of P)^c) * ((1+GR of L)^d) 3.10.9

The above equation is known as the general forecast equation.”

Hi Abrar,

I have to trust your research into this formula. I am sorry to say that I don’t have the time to investigate it further. Perhaps someone else in the community can comment.

Charles
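One way to see where a growth-rate forecast equation of this kind can come from is sketched below. It assumes that the fitted equation (3.10.1) holds with the same coefficients in two consecutive years and that the error terms are ignored. The subscripts $t$ and $t-1$ and the symbol $g_X$ for the growth rate of a variable $X$ are notation introduced for this sketch.

$$
\begin{aligned}
\ln Y_t &= C + b\ln G_t + c\ln P_t + d\ln L_t \\
\ln Y_{t-1} &= C + b\ln G_{t-1} + c\ln P_{t-1} + d\ln L_{t-1}
\end{aligned}
$$

Subtracting the second equation from the first removes $C$:

$$
\ln\frac{Y_t}{Y_{t-1}} = b\ln\frac{G_t}{G_{t-1}} + c\ln\frac{P_t}{P_{t-1}} + d\ln\frac{L_t}{L_{t-1}}
$$

Writing $X_t / X_{t-1} = 1 + g_X$ and taking the exponential of both sides gives

$$
Y_t = Y_{t-1}\,(1+g_G)^{b}\,(1+g_P)^{c}\,(1+g_L)^{d}.
$$

Under these assumptions the constant cancels in the subtraction whatever its magnitude, and the three effects combine as a product. Summing the three single-variable equations would not give an equivalent expression.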

Hello..help me solve this.

You have been provided with the following information in table form

Input (L) 1 2 3 4 5

Output(Q) 0.58 1.1 1.2 1.3 1.95

Fit a Cobb-Douglas function of the form Q = a*L^b*e^u to the data and solve for the variance of the regression model.

Have you tried to perform a log transformation?

Charles

Yes.

This is what I found

The cobb Douglas dissociates to a linear equation as follows:

Lin Q=lin a+b lin L+u

The default regression model is :Y=a+bX+e

I get confused when I now have to substitute the given data into the linear form of the Cobb-Douglas function.

How do I go about it? Will the Lin of Q represent the dependent variable Y, and the Lin of L represent X?

Usually, we know that output (Q in our case) is a dependent variable, and input is an independent variable.

That’s where the confusion kicks in. Help me determine whether Lin Q will represent X or Y from the data given.

What you are calling Lin Q is usually referred to as LN(Q), namely the natural log of Q (and similarly for the other variables). This means that you replace the data for the dependent and independent variables by the natural log of these data values. You then perform ordinary multiple linear regression to find the coefficients. The dependent variable Y is LN(Q) and LN(L) is the independent variable X (these can also be multiple variables).

Charles
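The steps in this reply can be sketched in Python using the five data points from the question. The use of `np.polyfit` and the n − 2 divisor for the residual variance are choices made for this illustration, and the variance is measured on the log scale.

```python
import numpy as np

# Data from the question: input L and output Q.
L = np.array([1, 2, 3, 4, 5], dtype=float)
Q = np.array([0.58, 1.1, 1.2, 1.3, 1.95])

# Log-transform both variables: Y = LN(Q) is dependent, X = LN(L) is independent.
X = np.log(L)
Y = np.log(Q)

# Fit Y = ln(a) + b*X; polyfit returns the slope first, then the intercept.
b, ln_a = np.polyfit(X, Y, 1)
a = np.exp(ln_a)  # back-transform the intercept to get a in Q = a * L^b

# Residual variance on the log scale, with n - 2 degrees of freedom
# (two estimated parameters: ln(a) and b).
resid = Y - (ln_a + b * X)
n = len(Y)
resid_var = np.sum(resid ** 2) / (n - 2)

print("a =", a, "b =", b, "residual variance =", resid_var)
```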