When a sample is not normally distributed, and is not even symmetric, it can sometimes be useful to transform the data so that the transformed data is closer to normal, or at least roughly symmetric. We touch upon the subject in Transformations and explore the concept a bit further in this section.

When data is skewed to the right (positively skewed), transformations such as *f*(*x*) = log *x* (either base 10 or base *e*) and *f*(*x*) = √*x* will tend to correct some of the skew, since larger values are compressed more than smaller ones. Neither transformation accepts negative numbers (and log *x* is also undefined at *x* = 0), so the transformations *f*(*x*) = log(*x* + *a*) or *f*(*x*) = √(*x* + *a*) may need to be used instead, where *a* is a constant large enough that *x* + *a* is positive for all the data elements. We now show how to use a log transformation via an example.
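A minimal Python sketch of the two transformations, including the shift by a constant *a*. The helper names and the rule for choosing *a* (leave positive data alone, otherwise move the smallest value up to 1 for the log or up to 0 for the square root, which also accepts 0) are illustrative assumptions; any *a* that keeps every *x* + *a* in the function's domain would do.

```python
import math


def shift_constant(data, floor=1.0):
    """Return a shift a that puts every x + a in the transform's domain.

    Hypothetical rule for illustration: positive data is left unshifted (a = 0);
    otherwise the smallest value is moved up to `floor`.
    """
    smallest = min(data)
    return 0.0 if smallest > 0 else floor - smallest


def log_transform(data, base=10.0):
    """f(x) = log(x + a); base 10 by default, pass math.e for the natural log."""
    a = shift_constant(data, floor=1.0)  # log needs x + a > 0
    return [math.log(x + a, base) for x in data]


def sqrt_transform(data):
    """f(x) = sqrt(x + a); the square root also accepts 0, so floor=0 suffices."""
    a = shift_constant(data, floor=0.0)
    return [math.sqrt(x + a) for x in data]
```

For example, `log_transform([1, 10, 100, 1000])` returns values of approximately 0, 1, 2 and 3: gaps of 9, 90 and 900 shrink to equal steps of 1, which is the compression of large values described above. `sqrt_transform([-4, 0, 5])` shifts the data by *a* = 4 and returns `[0.0, 2.0, 3.0]`.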

**Example 1**: We consider the raw data in Figure 1.

**Figure 1 – Use of a log transformation to create symmetry**

If we create a QQ plot as described in Graphical Tests for Normality and Symmetry, we see that the data is not very normal (Figure 2). We now apply a log transformation. We choose log base 10, although the result would be essentially the same if we had chosen log base *e* (i.e. the natural log), since the two logarithms differ only by a constant factor.

**Figure 2 – QQ plots of data before and after log transformation**

As we can see from Figure 2, the transformed data fits a normal distribution somewhat better. Note also the change in skewness and kurtosis (Figure 3): the log-transformed data has values closer to those expected from a normal distribution (see Analysis of Skewness and Kurtosis).
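The comparison in Figure 3 can be reproduced outside Excel with a short sketch. It assumes the values in Figure 3 come from Excel's SKEW and KURT functions and uses the same sample formulas; KURT reports excess kurtosis, so a normal population has both skewness and (excess) kurtosis equal to 0. The sample data is illustrative, not the data from Figure 1.

```python
import math
import statistics


def skewness(data):
    """Sample skewness, same formula as Excel's SKEW:
    n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3), with s the sample sd.
    """
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def excess_kurtosis(data):
    """Sample excess kurtosis, same formula as Excel's KURT (0 for a normal population)."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Illustrative right-skewed data (not the data in Figure 1).
raw = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
logged = [math.log10(x) for x in raw]
for label, values in (("raw", raw), ("log10", logged)):
    print(label, round(skewness(values), 3), round(excess_kurtosis(values), 3))
```

For this sample the skewness of the raw data is well above 1, while the skewness of the log-transformed data is much closer to 0.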

**Figure 3 – Skewness and kurtosis before and after the log transform**

Hello Charles. I’m trying to understand the whole concept of “normality” and how the data is transformed by some statistical tests.

My question is: what is the best process to analyze normality, and why? (a symmetrical, unskewed Gaussian distribution)

I’ve been dealing with the KS test, the AD test and the Ryan-Joiner test. Some teachers have told me that these tests tend to transform the data, and I can’t understand how and why they do that. I’m working with samples (°C temperature readings from several stations in the rainy and non-rainy seasons) whose normality I need to check first, for both the rainy and the non-rainy season, before testing for differences between the two seasons.

Do you think that the analysis of skewness and kurtosis is the best method to test whether a sample comes from a Gaussian distribution?

Thanks for your webpage.

Aldo,

Usually the best test for normality is Shapiro-Wilk’s test. This is explained on the following webpage and is supported by the Real Statistics software.

Shapiro-Wilk Test

Charles

Hi Charles,

Just wanted to say thanks for the info provided on your site. This is at least the 3rd time I’ve used it to figure out something in stats. I’m working on my MS in Predictive Analytics, and some concepts are just plain tough to get.

The textbooks in general circulation are often lacking in accessibility. It’s great when things are presented well in plain English. You are a good writer, and explain things well for those of us trying to climb this sometimes rather steep learning curve.

Again, thanks for your site.

Best regards,

-John Ryle

Thanks John. I appreciate hearing that the website is easy to understand and is helping you.

Charles