The chi-square goodness of fit test can be used to test the hypothesis that data comes from a normal distribution. In particular, we can use Theorem 2 of Goodness of Fit to test the null hypothesis:

H_{0}: data are sampled from a normal distribution.

**Example 1**: 90 people were put on a weight gain program. The following frequency table shows the weight gain (in kilograms). Test whether the data is normally distributed with mean 4 kg and standard deviation of 2.5 kg.

**Figure 1 – Frequency table and histogram for Example 1**

We begin by calculating the probability that *x* < *b* for *b* = 0, 1, …, 8, assuming a normal distribution with mean 4 and standard deviation 2.5. This probability is NORMDIST(*b*, 4, 2.5, TRUE). The probability that *x* is in the interval (*a*, *b*] is then NORMDIST(*b*, 4, 2.5, TRUE) – NORMDIST(*a*, 4, 2.5, TRUE), where *a* is the endpoint immediately before *b* in the frequency table. Multiplying these probabilities by the sample size of 90 gives us the expected frequencies.
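The expected-frequency calculation can also be sketched outside Excel. The following is a minimal Python illustration, not the worksheet from Figure 2: the function name `expected_frequencies` is made up for this example, and it assumes the ten intervals (-∞,0], (0,1], …, (7,8], (8,∞) implied by *b* = 0, 1, …, 8.

```python
from statistics import NormalDist


def expected_frequencies(mu, sigma, n, upper_bounds):
    """Expected counts for the intervals (-inf, b0], (b0, b1], ..., (b_last, +inf).

    Each cumulative probability dist.cdf(b) plays the role of
    NORMDIST(b, mu, sigma, TRUE) in the text.
    """
    dist = NormalDist(mu, sigma)
    # Cumulative probabilities at the interval edges, with F(-inf) = 0 and F(+inf) = 1.
    edges = [0.0] + [dist.cdf(b) for b in upper_bounds] + [1.0]
    # P(a < x <= b) = F(b) - F(a), scaled by the sample size.
    return [n * (hi - lo) for lo, hi in zip(edges, edges[1:])]


expected = expected_frequencies(mu=4, sigma=2.5, n=90, upper_bounds=range(0, 9))
for i, e in enumerate(expected):
    print(f"interval {i + 1}: expected frequency {e:.2f}")
```

The first and last expected frequencies come out slightly below 5, which is why the end intervals are merged before the test is run.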

**Figure 2 – Chi-square test based on known mean and standard deviation**

We now perform the chi-square goodness of fit test. Since the observed and expected frequencies of the first and last intervals are less than 5, it is better to combine the 1^{st} and 2^{nd} intervals as well as the last and second-to-last intervals, leaving 8 intervals. The chi-square test statistic is 4.47, which is less than the critical value of CHIINV(.05,7) = 14.07, and so we cannot reject the null hypothesis; i.e. the normal distribution is a good fit for the data. Note that *df* = number of intervals – 1 = 8 – 1 = 7 since the mean and standard deviation are given.
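As a rough sketch of the same steps in Python: merge the end cells, compute the statistic, and compare it with the critical value. The observed counts below are HYPOTHETICAL placeholders (the actual counts are in Figure 1, which is not reproduced here), so the statistic will not equal 4.47. The helper names are also invented for this illustration.

```python
from statistics import NormalDist


def merge_end_cells(values):
    """Combine the first two cells and the last two cells of a frequency list."""
    values = list(values)
    return [values[0] + values[1]] + values[2:-2] + [values[-2] + values[-1]]


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL observed counts for the 10 intervals (-inf,0], (0,1], ..., (8,inf).
observed = [3, 6, 9, 13, 16, 15, 12, 8, 5, 3]
n = sum(observed)  # 90, matching the sample size in Example 1

# Expected counts under N(4, 2.5), as described in the text.
dist = NormalDist(4, 2.5)
edges = [0.0] + [dist.cdf(b) for b in range(0, 9)] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(edges, edges[1:])]

# Merge end cells so that every cell has a frequency of at least 5.
obs8 = merge_end_cells(observed)
exp8 = merge_end_cells(expected)

stat = chi_square_statistic(obs8, exp8)
df = len(obs8) - 1   # no parameters estimated from the data
critical = 14.067    # CHIINV(.05, 7), taken from the text

print(f"chi-square = {stat:.3f}, df = {df}, critical = {critical}")
print("reject H0" if stat > critical else "do not reject H0")
```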

**Example 2**: In the above example, the population mean and variance were known. This is usually not the case. This time we will simply ask whether the above data comes from a normal population.

We first calculate the sample mean and variance as described in Frequency Tables using the midpoint of each interval, although for the first and last intervals (-∞,0] and (8,∞) we need to guess at acceptable representative values, which we take as -1 (i.e. a weight loss of 1 kg) and 9 respectively.
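The grouped-data calculation can be sketched as follows. This is an illustration under stated assumptions: the counts are HYPOTHETICAL (not the Figure 1 data), the function name is invented, and the variance uses the *n* – 1 divisor, which may or may not match the formula on the Frequency Tables page.

```python
def grouped_mean_variance(values, counts):
    """Mean and sample variance of grouped data.

    values: one representative value per interval (midpoints here)
    counts: the observed frequency of each interval
    Assumption: the variance divides by n - 1 (sample variance).
    """
    n = sum(counts)
    mean = sum(f * x for x, f in zip(values, counts)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, counts)) / (n - 1)
    return mean, var


# Representative values: -1 for (-inf,0], midpoints 0.5 ... 7.5, and 9 for (8,inf).
representatives = [-1] + [b + 0.5 for b in range(0, 8)] + [9]

# HYPOTHETICAL frequencies for the 10 intervals.
counts = [3, 6, 9, 13, 16, 15, 12, 8, 5, 3]

mean, var = grouped_mean_variance(representatives, counts)
print(f"mean = {mean:.3f}, variance = {var:.3f}, std dev = {var ** 0.5:.3f}")
```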

**Figure 3 – Calculation of mean and standard deviation for Example 1**

We next test the null hypothesis that the data is normally distributed using the sample mean and variance (3.74 and 4.84 respectively, as seen in Figure 3) as estimates for the population mean and variance. As in Example 1, we combine the first two and the last two intervals so that all frequencies are at least 5. Once again we use a chi-square goodness of fit test based on 8 intervals, but this time, since the mean and variance are estimated parameters, per Theorem 3 of Goodness of Fit, we use *df* = 8 – 1 – 2 = 5.

**Figure 4 – Chi-square test based on estimated mean and standard deviation**

Since χ^{2} = 1.35 < 11.071 = χ^{2}_{crit} = CHIINV(.05, 5), we again retain the null hypothesis that the data is normally distributed.
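The same decision can be expressed as a p-value, the counterpart of CHIDIST(1.35, 5) in Excel. Below is a minimal sketch using only the Python standard library. The function `chi2_sf` is written for this illustration and relies on the standard finite-series forms of the chi-square right-tail probability for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


k = 8             # intervals after merging the end cells
m = 2             # parameters estimated from the data (mean and variance)
df = k - 1 - m    # degrees of freedom, per Theorem 3
stat = 1.35       # test statistic reported in the text

p_value = chi2_sf(stat, df)
print(f"df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

The p-value comes out at roughly 0.93, far above α = .05, which agrees with the critical-value comparison.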

Hello Charles,

first let me say ” great web-site” !!!

I’ve a question regarding observed freq resp. classes to calculate Chi-square value. In example one you combine classes to get more than five samples for each class! Is this number “five” specific? Isn’t there “room for manipulating the chi-square value” by adapting the number of classes?

Thanks and best regards,

Gerd

Hello Gerd,

Generally you would like at least 5 sample items for each cell in the contingency table. With large contingency tables, a small percentage of cells with fewer than 5 items can be acceptable. Even with smaller contingency tables this may not cause big problems, but it is probably a better choice to use Fisher Exact Test in this case. In any case, you want to avoid using chi-square for contingency tables with an expected frequency of less than 1 in any cell.

Charles

Hi, in example two, you say “…using the sample mean and variance (3.74 and 2.20 respectively as see in Figure 3)…” but Figure 3 shows 2.20 as the standard deviation, not the variance- should “variance” or “2.20” be changed in this example?

Hi Jen,

Thanks for identifying this typo. The 2.20 value is the standard deviation not the variance. I have now used the correct value for the variance (i.e. the value that appears in Figure 3). Thanks for catching this error.

Charles

hello sir, i have asked for lognormal distribution problem in K-S test for which you replied. thank you very much.

if possible please explain one problem of Log normal distribution in chi square test. it’ll be great helpful people like me, who are new to statistics.

thank you sir

Hello Sandeep,

To replicate Example 1 and 2 on the referenced page with the log normal distribution instead of the normal distribution, just replace formulas of the form =NORMDIST(x,mean,stdev,TRUE) by =LOGNORMDIST(x,mean,stdev) or LOGNORM.DIST(x,mean,stdev,TRUE).

Charles
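For readers working outside Excel, the log-normal substitution described above might look like the following Python sketch. The parameters `mu` and `sigma` are the mean and standard deviation of ln(*x*), as in LOGNORM.DIST. The interval endpoints, the parameter values, and the function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for a log-normal variable; analogue of LOGNORM.DIST(x, mu, sigma, TRUE)."""
    if x <= 0:
        return 0.0
    # X is log-normal exactly when ln(X) is normal with mean mu and std dev sigma.
    return NormalDist(mu, sigma).cdf(math.log(x))


def lognormal_expected(n, upper_bounds, mu, sigma):
    """Expected counts for the intervals (0, b0], (b0, b1], ..., (b_last, +inf)."""
    edges = [0.0] + [lognormal_cdf(b, mu, sigma) for b in upper_bounds] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(edges, edges[1:])]


# HYPOTHETICAL endpoints and parameters, for illustration only.
expected = lognormal_expected(n=90, upper_bounds=[1, 2, 3, 4, 5, 6, 7, 8], mu=1.2, sigma=0.5)
for i, e in enumerate(expected):
    print(f"interval {i + 1}: expected frequency {e:.2f}")
```

The rest of the test (merging small cells, computing the statistic, choosing *df*) proceeds as in Examples 1 and 2.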

sir. How to calculate for forecasting

Sorry, but I don’t understand your comment.

Charles

Hi Charles,

Thank you for the great article.

I’m confused. In example 2 you use a df of 5 (k-m-1 = 8-2-1). 2 since mean and variance are unknown but what causes the -1? I can see that you refer to Theorem 3 but according to wiki:

http://en.wikipedia.org/wiki/Goodness_of_fit

“where \nu is the number of degrees of freedom, usually given by N-n-1, where N is the number of observations, and n is the number of fitted parameters, ASSUMING THAT THE MEAN VALUE IS AN ADDITIONAL FITTED PARAMETER. ”

I guess the “-1” is due to the mean and the “n” is the additionally fitted parameters.

So for your example 2 it should be 8-1-1 = 6 as ONLY variance is an additional parameter?

Please correct me if i’m wrong.

Best Regards

Gustav

Gustav,

I believe that in the example given in Wikipedia the population mean is unknown (and is estimated by the sample mean) and the population variance is known. Thus df = N-n-1 = N-1-1 = N-2. Here N = number of observations and n = number of fitted parameters = 1 in this case. If N were 8, then df = N-2 = 6.

In Example 2 of the referenced webpage, both the population mean and the population variance are unknown, and so n = 2. Since N = 8, we have df = N-n-1 = 8-2-1 = 5.

Charles

Thanks!

Hello Sir.

Im just wondering why do we need to combine classes if the expected frequency is less than 5. Why 5 but not other values? And how does it affect our results if we do not combine classes with expected frequency which is less than 5?

The value 5 is a requirement for the use of the chi-square test. With larger contingency tables you can have some cells with fewer than 5 elements, but with smaller tables (e.g. 2 x 2) cells should have at least 5 elements. With fewer than 5 elements the results of the test won’t be reliable. See webpage http://www.real-statistics.com/chi-square-and-f-distributions/independence-testing/ for more details about this requirement.

Charles

Can I use this procedure to test whether a sample data set came from a chi-square distribution? If not how do I to test for the chi-square distribution?

Yes, you can use this procedure to test whether sample data fits a chi-square distribution. See http://www.real-statistics.com/chi-square-and-f-distributions/goodness-of-fit/, especially Theorem 3 and Example 4.

Charles

Thank you for your fast reply. I need a little further clarification. I wish to test a column of computed chi square values that is 10,000 entries long. Applying Theorem 3 in example 4, as you suggested, I would use CHIDIST(chi square, df) = CHIDIST(?,9999). What would be entered into the chi square portion? I want to test the whole column, not just a single number as in example 4, so would I just enter the column in which the data is in? A histogram of the data leads me to believe that it does indeed fit the chi square distribution. I just need a p-value to confirm it.

I have an unrelated question. I looked through the comments above and thought I would ask my question. I am performing a goodness of fit test and the mean and SD were given to me as percentages. I am not sure what to do with these values or how to convert them into a number usable for my expected values.

Many thanks

Jared,

It really depends on what these percentages represent, but the likely answer is that you simply multiply the percentages by the sample size.

Charles

I’m looking for a non-traditional way to explain GOF.

In Example 2 with df=5 and Chi^2=1.35 is there about a 7% probability that we would be correct if we said the data were not normally distributed?

Does that imply that there is a 93% probability that the data are normally distributed?

Alternatively, if we try fits for several types of distributions we can say that there is a 7% chance that we are wrong if we reject normal; an x% chance that we are wrong if we reject uniform, etc.

Do we need to make the negative statement or can we make a positive statement?

John,

No, this is not correct. Actually you need to look at the conditional probabilities given that the null hypothesis is true.

“Suppose we perform a statistical test of the null hypothesis with α = .05 and obtain a p-value of p = .04, thereby rejecting the null hypothesis. This does not mean that there is a 4% probability of the null hypothesis being true, i.e. P(H0)=.04. What we have shown instead is that assuming the null hypothesis is true, the conditional probability that the sample data exhibits the obtained test statistic is 0.04; i.e. the probability of D given that H0 is true = P(D|H0)=.04 where D = the event that the sample data exhibits the observed test statistic.”

Charles

Where do you get the values of ‘a’

The value of a is simply the value of x prior to b in the frequency table. For this example, if b = 3 then a = 2.

Charles

Hello Mr.Charles,

It is my understanding that using Chi-Square test, I can check goodness of fit of my data. So, I can check for example, if my data follows binomial distibution with some probability of success.

Now, suppose I believe my data follows a Chi-Square distribution then how would I check it? Hope it is not an absurd question, in which my apologies.

Hello,

This is certainly not an absurd question. You can use the chi-square goodness of fit test as described on the webpage

Goodness of Fit

Charles

Excellent guidance. Congratulations!

Andres Rubio

Finance Professor and Consultant

How can I test normality for a sample of 36 monthly returns in percentage for a stock?

Is N = 36 a large enough sample to reasonably test normality or should I increase N to say, 48 or 60…?

Thanks Charles!

N = 36 should be a big enough sample. I suggest that you use a test like Shapiro-Wilk instead of Chi-square to test for normality.

Charles

Hey, great example!

I’m trying to use the Chi-Squared Goodness of Fit test to see if I can assume normality for further tests on my 2 samples of data. Basically I recorded battery drain times for 2 popular brands of batteries, 20 samples per brand. I want to see if I can assume normality for the 2 samples. What would my Ho and Ha be?

Thanks for the help!

As stated on the referenced webpage, H0: data are sampled from a normal distribution, and so Ha: data are not sampled from a normal distribution.

Charles

Dear Charles,

I would like to thank you for this extremely useful resource !

I have a question regarding normality check via Chi-Square testing and sample size. I am applying your calculation to a case in which measurement of dust is involved. This means that there is a very large sample size. Since the dust grains measured are not really counted, but only weighted its amount in classified sizes, the results of frequency are given in percentage. Thus, I assume a sample size of 100, but I get extremely large X2 values, that, compared to an independent to the sample size X2 critical, and thus constant, make my conclusion always NOT NORMAL distribution. I fulfill all the criteria for the tests (more than 5 classes, larger frequency than 5, or grouped frequencies, etc) I’ve cross checked some of this distributions with Shapiro-W test and they are normally distributed.

I tried to lower and to increase the population number for the Chi2-testing keeping the % fractions but still, I either get too low frequencies or too large X2…

To rule out that I have overseen something, I took your example “Norm Chi-sq 1” and multiplied by 10 or 100 the given frequencies and the same effect occurs. Is there any explanation to this phenomena? Am I overseeing something? What would be your recommendation to proceed?

Thanks.

Dear Juan,

I don’t completely understand the problem that you are having with the chi-square test, but this is not really a great test for normality. Shapiro-Wilk is usually one of the best tests for normality. I would also create a graph (e.g. Q-Q plot) to make sure.

Charles

Thank you for your quick answer !

In short what I mean is that the tests seems very sensitive for sample size: if sample size goes up, the X2 calculated goes up very much and it is then very easy to be out of normality… If you take your example and keep the ratios between frequencies (imagine that they would be given as percentages) and you increase the “n”, the test changes drastically… is that a known effect?

Juan,

Most statistical tests are sensitive to sample size. With very big samples it is often easier to find a significant effect.

Charles
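The effect Juan describes follows directly from the formula: if every observed and expected frequency is multiplied by a factor *c* while the proportions stay the same, then each term (cO – cE)²/(cE) equals c·(O – E)²/E, so the statistic grows by the factor *c* while the critical value, which depends only on *df*, stays fixed. A small Python sketch with HYPOTHETICAL frequencies (not the workbook data) illustrates this:

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL frequencies, e.g. percentages treated as counts out of 100.
observed = [12, 18, 25, 22, 13, 10]
expected = [10.0, 20.0, 24.0, 21.0, 15.0, 10.0]

base = chi_square_statistic(observed, expected)
critical = 11.0705  # right-tail 5% critical value for df = 5

for c in (1, 10, 100):
    scaled_obs = [c * o for o in observed]
    scaled_exp = [c * e for e in expected]
    stat = chi_square_statistic(scaled_obs, scaled_exp)
    verdict = "reject H0" if stat > critical else "do not reject H0"
    print(f"scale x{c}: chi-square = {stat:.3f} ({verdict})")
```

This is why percentages cannot simply be treated as counts out of 100: the test needs the actual number of independent observations, and with a very large sample even tiny departures from normality become statistically significant.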

Please, next time, indicate how the mean, variance in figure 3 are computed. Or, better yet, show the formulas for every equation so that we wouldn’t have to make guesses as to how they were computed.

Gelo,

As stated on the referenced webpage, the calculation is described on the webpage Frequency Table. You can also get the formulas from the Examples Workbook, which contains all the spreadsheets shown on the website. You can download this for free.

Charles

Charles

I was studied that we use normality test to test our data normal distributed or not. And it will decide the method we use for hypothesis testing: parametric or non-parametric test. For testing in 1 sample as your example, we can easily to conclude. However, how about use normality test for 2 more samples in a problem? We use normality test for each 2 sample, right? So if other sample is not normal distribution, how we can conclude it? I confuse this when we have 2 more samples to decide the method for hypothesis testing.

Thank you so much.

Laura,

For tests such as ANOVA you need to test each group sample for normality. In a 3 x 3 design, this means that you need to test each of the 9 groups for normality. Remember though that ANOVA and many other tests are pretty robust for departures from normality. Happy New Year.

Charles

Happy New Year, sir

In the case of 2 samples: if the population is normally distributed, I will use a test of 2 means for hypothesis testing. However, if the population is not normally distributed, my hypothesis test will be the Mann-Whitney U test for independent samples or the Wilcoxon Matched-Pairs Signed-Rank test for dependent samples.

So, my question is: what if 1 sample is normally distributed in the normality test and the other is not? Can this case happen? The same question applies for 3 or more means when deciding between ANOVA and the Kruskal-Wallis test. I’m so confused about when to use a parametric test and when to use a non-parametric test.

Thank you so much, sir.

Laura,

When comparing two samples, each sample should be normal. If one is normal and the other is not, then the test may not be valid. Even so, a t test is pretty robust to violations of normality. Generally, a problem occurs when one or both samples are far from symmetric. If both samples are skewed to the right, then you are probably better off using a nonparametric test (Mann-Whitney).

Charles

Hello Sir!

Happy new year!

Thank you for your help in advance,

I have a question, , it is about motivation of students in learning English,

How I could know the normality and homogeneity in order to compare who is more motivated girls or boys? Motivation is consists of integrative and instrumental motivation, but I should do it manually, my question is how I could do this? Do you think I have to use chi_ square or another way…?

There are many tests for normality. In general, I suggest that you use the Shapiro-Wilk test. You should test both the boys sample and the girls sample for normality (separately). See the following webpage:

Shapiro-Wilk

There are also many tests for homogeneity of variances. I suggest that you use Levene’s test. See the webpage

Levene’s Test

If you use the t-test with unequal variances, then you don’t need to check for homogeneity of variances. See the webpage

t test with unequal variances

Charles

Hi Charles

Great site, and v useful pages. Just wondering why you recommend SW over ChiSq (which is easy to implement and well-recognised). Is it because of better power ?

Thanks

Better power and more accurate.

Charles

Forget to write that there are 50 participants, 18 boys n 32 girls. I want to compare them. Are they normal and homogene or not..? If it was not normal then for comparing what I should use…?

Dear Charles

I would like to ask how I can check the normality or the distribution of my data in Prism or Excel for biological data (e.g. western blotting data) in order to decide whether to use ANOVA or a nonparametric test.

Sam,

See the Testing for Normality webpage.

Generally I use the Shapiro-Wilk test plus visual inspection of the data (or QQ Plot).

Charles

Hi Charles,

In example 1, when you say: “The probability that x is in the interval (a, b] is then NORMDIST(b, 4, 2.5, TRUE) – NORMDIST(a, 4, 2.5, TRUE)” can you please tell me what is the meaning of “a”?. I have tried to do the calculations taking “a” as the frequency or fx or fx^2 but none of those work. Thanks

John,

Here I am referring to cumulative probability, i.e. F(x). F(a) = the probability that the outcome is less than a. Thus, the probability that the outcome is between a and b is F(b) – F(a).

Charles

Hi Charles,

Could you tell me value of “a” in this example. In a general case, how to choose value for “a”?

Thanks.

Minh,

a can take any value. In fact, in Figure 2 you can see that a takes a variety of values.

Charles

Hi, I’m a bit of a noob in stats and I’m stuck with the Chi squared methods at the moment. I need to use it to test the normality of some data I’ve been supplied with (sample size of 40, sorted into 8 groups of 5), I’ve sorted it into ascending order, found the average values at the boundary of each group, and then used these to find the value i need to use to compare to a normal distribution curve, however I’m stuck trying to find out how to do this in excel? any help would be great thank you 🙂

Ben,

Is there some reason why you are testing for normality in this way? Why can’t you simply test normality on all 40 elements? (although for some tests — e.g. Anova — you need to check each group for normality) Also, generally chi-square is not the best test for normality. Shapiro-Wilk is usually a better test.

Charles

Hey

Good page

But, could you put Excel formula in order to calculate cell O15

Thanks

Ramon,

The critical value is CHIINV(alpha,df) = CHIINV(.05,7) using Excel 2007 or CHISQ.INV.RT(alpha,df) = CHISQ.INV.RT(.05,7) using more recent versions of Excel.

Charles

That is =CHIINV(0.95, 7)

It is not =CHIINV(0.05, 7)

CHIINV is the right-tailed inverse (equivalent to CHISQ.INV.RT) and so I think it is CHIINV(.05,7).

Charles

If you put =CHIINV(0.05,7) in Excel you get 2.16734991

If you put =CHIINV(0.95,7) in Excel you get 14.0671404

So, correct formula is the second

Best regards

Ramon,

That’s interesting; when I enter =CHIINV(.05,7) on my computer I get 14.067… If I enter =CHISQ.INV(.05,7) I get 2.167… If I enter =CHISQ.INV.RT(.05,7) I get 14.067…

Charles
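The discrepancy in this exchange is the difference between a right-tailed and a left-tailed inverse. The following Python sketch shows one way to reproduce both numbers. The functions `chi2_sf` and `chi2_inv_rt` are written for this illustration (they are not Excel's implementation), and the inverse is found by simple bisection on the right-tail probability.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi2_inv_rt(p, df, tol=1e-10):
    """Value x with P(X > x) = p; a right-tailed inverse like CHISQ.INV.RT(p, df)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:   # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:         # bisection: chi2_sf is decreasing in x
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


right_tail_05 = chi2_inv_rt(0.05, 7)   # 5% of the distribution lies above this value
right_tail_95 = chi2_inv_rt(0.95, 7)   # 95% lies above, i.e. 5% lies below
print(f"right-tail inverse at .05: {right_tail_05:.4f}")
print(f"right-tail inverse at .95: {right_tail_95:.4f}")
```

The first value is close to 14.067 and the second close to 2.167, matching the two numbers quoted above. A right-tailed inverse at .05 and a left-tailed inverse at .95 give the same critical value, so a result of 2.167 for an input of .05 suggests that a left-tailed function such as CHISQ.INV was evaluated.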

It’s interesting

Maybe Excel configuration

I will check it

Hi Charles,

Thanks for the excellent web page, extremely useful!

I am getting slightly confused when using different significance levels and whether or not we would accept the null hypothesis.

In your example, the test statistic is 1.35 and as this is less than the critical region CHIINV(0.05,5)=11.07 then we accept.

Imagine our test statistic was 12. Under a 5% significance level we would reject H0. But if we used a 1% significance level the critical region would be CHIINV(0.01,5)=15.09. This would mean we would reject the null hypothesis under 5% but accept under 1%.

However I thought using a smaller significance level is ‘more reliable’. So I am confused that a sample could exhibit less Normal qualities i.e. higher test statistic and still pass a ‘more robust test’.

Thanks!

Chris,

Sorry for the delay in answering your question.

The smaller the value of alpha, the smaller the critical region, i.e. the region where the null hypothesis is rejected. This means that for a lower alpha value, it is less likely that the null hypothesis would be rejected. This is consistent with your example. At 5% the data is not consistent with a normal population, while for 1% the data is consistent (enough) with a normal population.

Perhaps another way of looking at this is that at 1% the acceptance region is larger than the acceptance region at 5%. Also, at 5% we can afford to be wrong 1 out of 20 times, while at 1% we can afford to be wrong only 1 out of 100 times.

Charles

Hello Charles,

Can we use the Chi-Squared test for normality when we have actual sample data ? I see the two cases you presented are:

1)When data is presented in terms of frequency tables

2)When we are testing against a specific pair (mean, st.dev)

Now , I have 120 sample data points. Can I test whether these points come from a normal population by calculating the sample mean and sample deviation ( S.E ?) and applying method 2?

Thanks in Advance.

Guero,

Yes, but in this case I suggest that you use the Lilliefors test since you will get more accurate answers. See the webpage

Lilliefors Test

In general, I find that the Shapiro Wilk test for normality is more accurate than the chi-square approach. See the following webpage for information about Shapiro-Wilk

Charles

Thanks, Charles, another one, please:

Say we have a multilinear regression:

Y ~a1X1+a2X2+a3X3

We want the residuals (Y- (a1X1+a2X2+a3X3)) ,

to be normally distributed. If they are,

does it follow that the residuals Y|X1, Y|X2, Y|X3 ; Y|Xi

means Y restricted to Xi

( i.e., we regress Y against X1, holding X2=X3=0) are also

normally -distributed?

I guess this is equivalent to asking whether the residuals

for Y~a1X1+a2X2+a3X3 are jointly normal?

Hope I didn’t make this confusing and thanks again.

Guero.

Hi Charles,

I have data of students with age , gender , IQ scores and thumb size. I want to test normality can you guide how to proceed.

Tanmay,

I suggest that you use the Shapiro-Wilk test and the QQ plots. These are described on the Real Statistics website.

Charles

Hello, i wanna ask, if the data is normally distributed which means it is parametric, can i use chi square test, which actually for the parametric test?

Thank you.

Nur,

It depends on what you want to use the chi square test for.

Charles

Now , I have 4 column (4 categories), each catergories have 10 points. Can we use the Chi-Squared test for normality and how can I do it? (Using only Chi Square Test). Since my lecturer only taught Chi-squared Test, I can not apply another method such as Lilliefors Test,……

David,

Sorry, but I don’t understand the first sentence of your comment.

Note that the Lilliefors test is the same as the Kolmogorov-Smirnov test using a different table of critical values. You should use the Lilliefors test when you are estimating the mean and standard deviation from the data and the Kolmogorov-Smirnov test when the mean and standard deviation are known.

Charles

For example, I have a biostatistics problem like this:

A scientist determined the effectiveness of segmental wire fixation in athletes with spondyolysis. Between 1993 and 2000, 20 athletes (6 women and 14 men) with lumbar spondyolysis were treated surgically with the technique. The following table gives the Japanese Orthopedics Association evaluation score for lower back pain syndrome for men and women prior to the surgery. The lower score indicates less pain.

Gender JOA scores

Female 14,13,24,21,20,21

Male 21,26,24,24,22,23,18,24,13,22,25,23,21,25

Give conclusion for the evaluation of the segmental wire fixation treatment between male and female?

So, this is the question. To solve this problem, I have to do 3 steps:

– test the variance (F -test)

– Normality test (Chi- square distribution) to determine the population is normally distributed or not.

– After using the normality test and depending on the condition’s question to apply ANOVA or kinds of non-parametric test.

I get stuck in question 2. Can I gather all data points in one group and use chi square test to find the population is normally distributed or not ?

David,

Each group should be tested for normality. I suggest that you use the Shapiro-Wilk test instead of the chi-square test. If you use the chi-square test, I suggest that you use the Lilliefors version of the test.

Charles

Hi,

i have financial data for 80 firms for 10 years,2007-16,,,,with 3 explanatory and 1 moderating variable.

1- how can i check the normality of my data?

2- my R2 is very low (0.04), even i creased the sample size from 60 firms to 80, but still result the same….while P.value is less than 0.01.

Moin,

1. I suggest that you use the Shapiro-Wilk test to check for normality.

2. It may be that the data is not a good fit for the regression that you are conducting. If you are conducting multiple linear regression, then you should draw scatter plots (e.g. each independent variable vs. dependent variable). If these don’t look linear, then you may have a problem.

Charles

Hello Charles,

Could you please provide some insights or point to reference work that would explain why midpoint are used in Example 2. I understand we use them because the population’s mean and stdev are unknown, but I’d like to be able to understand the mathematical intuition behind this? Thanks very much for your time and your awesome contribution to statistics on the WWW!

Martin,

You need to pick some value and the midpoint seems a reasonable choice. If the data is heavily skewed, you might actually pick a different, more representative point.

I am pleased that you appreciate my contribution to statistics on the web. I am trying to do my part.

Charles