When a sample is not normally distributed, and is not even symmetric, it can sometimes be useful to transform the data so that the transformed data is closer to normal, or at least roughly symmetric. We touched upon this subject in Transformations, and explore it a bit further in this section.
When data is skewed to the right (positively skewed), transformations such as f(x) = log x (either base 10 or base e) and f(x) = √x will tend to correct some of the skew, since larger values are compressed more than smaller ones. The log transformation is undefined for zero and negative numbers, and the square root transformation is undefined for negative numbers, so the transformations f(x) = log (x+a) or f(x) = √(x+a) may need to be used instead, where a is a constant large enough that x + a is positive for all the data elements. We now show how to use a log transformation via an example.
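The shifted transformations can be sketched in Python as follows. This is a minimal illustration, not part of the original example (which works with figures rather than code); the function names log_transform and sqrt_transform, and the rule used to pick a automatically (the smallest whole number that makes every x + a strictly positive), are our own assumptions.

```python
import math

def log_transform(data, base=10, a=None):
    """Return log_base(x + a) for each x in data.

    Assumed rule (hypothetical): if a is None, use a = 0 when all data are
    already positive, otherwise the smallest whole number making x + a > 0.
    """
    if a is None:
        lo = min(data)
        a = 0 if lo > 0 else math.floor(-lo) + 1
    if any(x + a <= 0 for x in data):
        raise ValueError("x + a must be positive for every data element")
    return [math.log(x + a, base) for x in data]

def sqrt_transform(data, a=0):
    """Return sqrt(x + a) for each x; requires x + a >= 0 for every element."""
    if any(x + a < 0 for x in data):
        raise ValueError("x + a must be non-negative for every data element")
    return [math.sqrt(x + a) for x in data]

# Usage with made-up values: the minimum is -2.5, so a = 3 is chosen
print(log_transform([-2.5, 0, 7]))
print(sqrt_transform([0, 4, 9]))
```

Raising an error rather than silently shifting again keeps the choice of a visible, since different values of a change how strongly the transformation compresses the data.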
Example 1: We consider the raw data in Figure 1.
Figure 1 – Use of a log transformation to create symmetry
If we create a QQ plot as described in Graphical Tests for Normality and Symmetry, we see that the data is not very close to normal (Figure 2). We now apply a log transformation. We choose log base 10, although the result would be the same apart from a change of scale if we had chosen log base e (i.e. a natural log), since ln x = ln 10 · log10 x.
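A QQ plot pairs the sorted data with quantiles of the standard normal distribution. The Python sketch below assumes the plotting position (i - 0.5)/n, which is one common convention and may differ from the one used in Graphical Tests for Normality and Symmetry. The helper names are hypothetical, and summarizing the plot by the correlation between its two coordinates is our own addition. The sample is synthetic (it is not the Figure 1 data): it is built so that its base-10 logs are exactly normal quantiles.

```python
import math
from statistics import NormalDist

def qq_points(data):
    """Return (normal quantiles, sorted data) for a normal QQ plot.

    Assumes plotting positions (i - 0.5)/n for i = 1..n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return zs, xs

def qq_correlation(data):
    """Pearson correlation between the QQ plot coordinates; values closer
    to 1 mean the points lie closer to a straight line."""
    zs, xs = qq_points(data)
    mz, mx = sum(zs) / len(zs), sum(xs) / len(xs)
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / math.sqrt(szz * sxx)

# Synthetic right-skewed sample (hypothetical, not the Figure 1 data)
n = 20
sample = [10 ** NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
logged = [math.log10(x) for x in sample]

print(round(qq_correlation(sample), 4))  # noticeably below 1
print(round(qq_correlation(logged), 4))  # essentially 1
```

Plotting zs against xs gives the QQ plot itself; the correlation is only a one-number summary of how straight that plot is.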
Figure 2 – QQ plots of data before and after log transformation
As we can see from Figure 2, the transformed data is a somewhat better fit for a normal distribution. Also notice the change in skewness and kurtosis (Figure 3): for the log-transformed data, these values are closer to what is expected from a normal distribution (see Analysis of Skewness and Kurtosis).
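Skewness and kurtosis can also be computed directly. The Python sketch below uses the bias-adjusted sample formulas that Excel's SKEW and KURT functions are documented to use (KURT reports excess kurtosis, which is 0 for a normal distribution); we assume, but have not confirmed, that these are the statistics shown in Figure 3. The function names are hypothetical and the sample is synthetic, not the Figure 1 data.

```python
import math
from statistics import NormalDist

def _mean_sd(data):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(data)
    m = sum(data) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))

def sample_skew(data):
    """Sample skewness, same adjustment as Excel's SKEW (needs n >= 3)."""
    n = len(data)
    m, s = _mean_sd(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def sample_kurt(data):
    """Sample excess kurtosis, same adjustment as Excel's KURT (needs n >= 4)."""
    n = len(data)
    m, s = _mean_sd(data)
    z4 = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Synthetic right-skewed sample (hypothetical, not the Figure 1 data)
n = 20
sample = [10 ** NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
logged = [math.log10(x) for x in sample]

print(sample_skew(sample), sample_kurt(sample))  # both clearly positive
print(sample_skew(logged), sample_kurt(logged))  # skewness essentially 0
```

Because changing the log base only rescales the data, these two statistics come out identical whether base 10 or base e is used.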
Figure 3 – Skewness and kurtosis before and after the log transform