As we can see throughout this website, most of the statistical tests we perform are based on a set of assumptions. When these assumptions are violated the results of the analysis can be misleading or completely erroneous.

Typical assumptions are:

- **Normality**: Data have a normal distribution (or at least are symmetric)
- **Homogeneity of variances**: Data from multiple groups have the same variance
- **Linearity**: Data have a linear relationship
- **Independence**: Data are independent

We explore in detail what it means for data to be normally distributed in Normal Distribution, but in general it means that the graph of the data has the shape of a bell curve. Such data is symmetric around its mean and has excess kurtosis equal to zero (i.e. kurtosis measured relative to the normal distribution). In Testing for Normality and Symmetry we provide tests to determine whether data meet this assumption.

Some tests (e.g. ANOVA) require that the groups of data being studied have the same variance. In Homogeneity of Variances we provide some tests for determining whether groups of data have the same variance.

Some tests (e.g. Regression) require that there be a linear correlation between the dependent and independent variables. Generally linearity can be tested graphically using scatter diagrams or via other techniques explored in Correlation, Regression and Multiple Regression.
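As a rough illustration of why linearity matters, the sketch below computes the Pearson correlation coefficient for an exactly linear relationship and for a curved one. The data and the helper `pearson_r` are made up for this example; the point is only that the coefficient picks up linear association and can miss a strong non-linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: x runs from -5 to 5
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]  # exactly linear in x
curved = [x ** 2 for x in xs]     # strongly related to x, but not linearly

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(f"linear relationship: r = {r_linear:.3f}")
print(f"curved relationship: r = {r_curved:.3f}")
```

Here the linear data give a coefficient of essentially 1, while the curved data give a coefficient of essentially 0 even though y is completely determined by x. This is one reason a scatter diagram is worth inspecting before relying on a test that assumes linearity.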

We touch on the notion of independence in Definition 3 of Basic Probability Concepts. Roughly speaking, data are independent when knowing one value tells you nothing about another; in particular, independent data are uncorrelated (see Correlation). Many tests require that data be randomly sampled with each data element selected independently of data previously selected. E.g. if we measure the monthly weight of 10 people over the course of 5 months, these 50 observations are not independent since repeated measurements from the same people are not independent. Similarly, the IQ scores of 20 married couples don’t constitute 40 independent observations.

Almost all of the most commonly used statistical tests rely on adherence to some distribution function (such as the normal distribution). Such tests are called **parametric** tests. Sometimes when one of the key assumptions of such a test is violated, a **non-parametric** test can be used instead. Such tests don’t rely on a specific probability distribution function (see Non-parametric Tests).

Another approach for addressing problems with assumptions is by transforming the data (see Transformations).
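A minimal sketch of the transformation idea, assuming made-up, right-skewed data and a log transformation (just one of several possible transformations). The helper `skewness` is written for this example and uses the simple population formula: the third central moment divided by the cube of the standard deviation.

```python
import math
import statistics

def skewness(data):
    """Population skewness: third central moment divided by sd cubed."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed sample (each value is double the previous one)
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log(x) for x in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before log transform: {skew_raw:.3f}")
print(f"skewness after log transform:  {skew_logged:.3f}")
```

For this artificial sample the raw values are clearly right-skewed, while the logged values are evenly spaced and therefore symmetric. Real data will rarely behave this neatly, so the transformed data should still be checked against the assumptions of the test.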

Dear Mr. Zaiontz,

Thank you for providing such a great website. I am currently taking a statistics course and I often wonder about the applicability of statistical inference tools to the real world considering the assumptions that are required. As far as I understand, the assumptions guarantee the validity of the conclusions, since when the assumptions are met the validity of the conclusions can be proved mathematically. But a mathematical proof uses clear-cut logic which assumes that those assumptions are perfectly met, which in the real world is probably rarely if ever the case. So my question is whether most statistical inference tools, such as hypothesis tests or confidence intervals, still give appropriate conclusions even if, for example, an underlying normality assumption is met only approximately rather than perfectly, or if my data is not perfectly symmetric (as assumed, for example, in the Wilcoxon Rank Test)?

Sebastian,

Glad you like the website.

You pose a great question. Statistics is not just a mathematical discipline; it is also supposed to provide practical application to real-world problems. In general, statistical tests are reasonably robust to small departures from the assumptions. Robust means that if you are testing at a significance level of, say, .05, the actual type I error rate really is close to .05 (and not, say, .08). Also, some tests are more sensitive to violations of some assumptions than of others. E.g. ANOVA requires that the data be normally distributed and that the variances of all the groups be equal. The test is quite robust to violations of the first assumption. Even when the data are not so normally distributed (especially if the data are reasonably symmetric), the test still gives reliable results. ANOVA is much more sensitive to violations of the second assumption, especially when the group sizes are different. If you have very different group sizes, you probably want to use a different test. But even here, if the group sizes are the same and the largest group variance is no more than 3 or 4 times the smallest group variance, then the test is likely to be quite reliable.

Charles
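The rule of thumb at the end of the reply above can be checked with a few lines of code. This is only a sketch: the three groups are made-up data, `variance_ratio` is a helper invented for this example, and the cutoff of 4 is simply the upper end of the "3 or 4 times" guideline.

```python
import statistics

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Made-up data for three groups of equal size
groups = [
    [12, 15, 14, 10, 13, 16],
    [22, 25, 19, 28, 24, 20],
    [31, 30, 35, 27, 33, 36],
]

ratio = variance_ratio(groups)
equal_sizes = len({len(g) for g in groups}) == 1

print(f"largest/smallest variance ratio: {ratio:.2f}")
print(f"equal group sizes: {equal_sizes}")
# Cutoff of 4 is an assumption taken from the "3 or 4 times" guideline
print("within rule of thumb:", equal_sizes and ratio <= 4)
```

A check like this is informal and does not replace a formal test of homogeneity of variances such as Levene's test.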

Thank you for the fast response

HELLO Mr Charles

I am a 200L student at Anchor University Lagos, Nigeria. I was given an assignment in this course, which I am taking for the first time, and I was asked: what should you do with your data if the parametric assumptions are not met?

My delight,

Tomi.

Tomi,

The two main approaches are:

1. Use a data transformation

2. Use a non-parametric test instead

The Real Statistics website describes both approaches.

Charles

Hello Charles Zaiontz,

Firstly, thank you for this educative and enlightening post.

Please, if my data violate the four statistical assumptions (I am particularly interested in the normality assumption), what remedies are at my disposal?

Best regards,

Emmanuel

Emmanuel,

It depends on what hypothesis you are trying to test.

Charles

Please can you represent these assumptions in a graph for me?

You need to look at the assumptions for the specific tests.

Charles

Good day. I am researching the topic ‘Home and School Factors as Determinants of Secondary School Students’ Enrollment in Financial Accounting’. The School variables are: 1. school type (Public and Private), 2. school location (Urban and Rural), 3. teachers’ qualifications, 4. teachers’ methods of teaching.

The Home variables are: 1. Parents’ occupations, 2. Parent’s educational background, 3. Parents’ aspirations.

The dependent variable is enrollment in Financial Accounting.

Please, what statistical analysis instrument can I use to analyse my data? Please help me, because I have been having a serious headache over this. Please, it’s urgent, sir.

Thanks a lot

Sorry, but you haven’t provided the type of information necessary for me to answer your question.

What hypothesis are you trying to test?

Charles

thanks so much.

however, my dilemma is on the issue of skewness. any more info about it?

See

http://www.real-statistics.com/tests-normality-and-symmetry/analysis-skewness-kurtosis/

http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/dagostino-pearson-test/

Charles

Which are the assumptions of Non-parametric tests ?

Aumi,

It depends on the nonparametric test, but usually there are fewer assumptions than for a corresponding parametric test.

Charles

I am trying to understand the true meaning behind kurtosis. Can you define it and explain its overall purpose for a layman like me, please? Thanks.

Rudy,

The kurtosis is the fourth moment about the mean divided by the variance squared, nothing more and nothing less, although this is not exactly a layman’s description. We used to view kurtosis as a measure of the flatness of the distribution, but apparently that is not accurate. Now I look at it as one means of determining whether data is normally distributed. If the kurtosis of the data is not sufficiently similar to the kurtosis of a normal distribution, then we have evidence against that data coming from a normal distribution.

Charles
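The definition in the reply above translates directly into code. The following is a minimal sketch using made-up data and the plain population formula; statistical software often reports a bias-adjusted sample version instead, so its output may differ somewhat from this value.

```python
def kurtosis(data):
    """Fourth central moment divided by the squared (population) variance."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth moment about the mean
    return m4 / m2 ** 2

# Made-up sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]

k = kurtosis(sample)
# A normal distribution has kurtosis 3 under this definition,
# so "excess kurtosis" subtracts 3 to make the normal value 0.
excess = k - 3

print(f"kurtosis: {k:.5f}")
print(f"excess kurtosis: {excess:.5f}")
```

Comparing the excess kurtosis with 0 (or the raw kurtosis with 3) is the comparison against the normal distribution that the reply describes.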

Thank you for this. In your opinion, what are the assumptions underlying the use of parametric tests? I am trying to understand this method.

Rudy,

The assumptions depend on the specific parametric test (t test, ANOVA, etc.). For each test, the website will describe the specific assumptions for that test.

Charles

Kurtosis is simply concentration of data around the mean:

1. Leptokurtic : Data is more clustered around the mean, kurtosis value is large positive, standard deviation (deviation of values from the mean) is low.

2. Platykurtic : Data is uniformly distributed about the mean.

3. Mesokurtic : Data is normally distributed but doesn’t mean it’s a standard normal distribution, standard deviation is high.

Good day, my name is Akeem. I am testing for normality and independence in multivariate data, but I am confused about which test must be done first. My question is: should the normality test come before the independence test?

Hello,

I’m a PhD student and I want to analyze my results. I have 5 independent factors (each one has three levels) and one dependent variable. I want to select a suitable statistical analysis. I have only checked normality, and the data showed a normal distribution. I feel confused by the number of tests. Could you please help me with that?

Many Thanks,

Anwer,

You need to determine what sort of hypothesis you want to test before you can decide what is the suitable statistical analysis.

Charles

Charles,

Thanks for reply.

I want to see the relationship between 5 independent variables (4 numerical and 1 categorical) and the dependent variable, and find the optimum values. In other words, I want the effects of the parameters on the output and which one is the most significant.

Thanks,

Anwer

Anwer,

This sounds like a regression-type scenario. I suggest that you start by looking at the Regression part of the website.

Charles

hi,

I’m a student doing research on the relationship between communication factors and job satisfaction among PB staff.

My sample size is 56 because the population is very small.

The normality test I ran indicates that the data are not normal.

My question is: if I use non-parametric tests, does that mean I don’t have to carry out the hypothesis tests, correlation and regression analysis (which are usually done with parametric methods)?

thank you 🙂

Hani,

Two observations:

1. Just because data isn’t normal doesn’t necessarily mean that you can’t use a parametric test. It usually depends on how far from normal the data is. You can sometimes apply a transformation which makes the data normal.

2. Nonparametric tests can often perform very similar analyses as parametric tests; it depends on the type of analysis you want to perform.

Charles

So, if I used the one-sample K-S test for normality,

I can still continue the other analyses using parametric tests, can’t I? But it depends on how far from normal the data are?

Hani,

It also depends on what other analyses you want to do.

Charles

Hi, I’m doing a lab report right now, and my data meet two of the three assumptions for a parametric test such as an ANOVA or linear regression.

The data are normally distributed and independent. However, the variances are not equal: Levene’s test gave a significance value of 0.039.

So can i still use an ANOVA or regression, and if so, how do i justify this?

Thank you

Katy,

If the homogeneity of variance assumption is not met, Welch’s ANOVA is a commonly used substitute.

For linear regression you can use robust standard errors.

Both of these approaches are covered on the website and are included in the Real Statistics Resource Pack.

Charles
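For readers who want to see what Welch’s ANOVA computes, here is a rough sketch of the test statistic and its degrees of freedom using the standard textbook formulas. The function `welch_anova` and the data are made up for illustration and are not the Real Statistics implementation. Obtaining a p-value additionally requires the F distribution, which is not in the Python standard library (for example, `scipy.stats.f.sf(F, df1, df2)` could be used if SciPy is available).

```python
import statistics

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]

    # Weight each group by n / s^2, so noisier groups count for less
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    between = sum(w * (m - grand_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2

# Made-up groups with visibly different spreads and sizes
groups = [
    [10, 11, 12, 11, 10, 12],
    [14, 20, 9, 25, 17],
    [30, 18, 41, 12, 35, 26, 22],
]

f_stat, df1, df2 = welch_anova(groups)
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.2f}")
```

With only two groups, this statistic reduces to the square of Welch’s t statistic, which gives a convenient sanity check on the formulas.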

Hi

I have some data (the x axis represents fouling resistance and the y axis represents organics) and I simply calculated a correlation between x and y using Excel. A reviewer wanted to know what assumption was made regarding the normality of the data distribution. Can you please give an answer?

Kind regards

Biplob

You don’t need to assume normality to calculate a correlation coefficient. Depending on which statistical test you use, you may need the normality assumption when you test whether this correlation is significantly different from zero. See the following webpage for details

Correlation

Charles

Please, I am working on an immunological assessment of HIV and Hepatitis B in pregnant women. What kind of assumptions and statistical study should I employ? Is my research retrospective, prospective or cross-sectional? What statistical analysis am I expected to use: ANOVA, t test, z test, correlation or regression?

Thanks in Advance.

Sorry, but it is not possible for me to answer your question without more details.

Charles

My study is about innovations of SPED teachers in inclusive education, and it is descriptive. I am confused about the statistical tool for my assumptions, such as factors, best innovations and challenges, which are to be described. Kindly give me an idea of what statistical tool to use, sir. Thank you.

Sorry, but you need to be more specific. You need to explain what specifically you are trying to accomplish before anyone can suggest tools to use.

Charles

Do all of these tests have an assumption of independence?

T test

Paired t test

CRD ANOVA

Mario,

For the two sample t test or CRD ANOVA, the group samples must be independently drawn.

For the paired t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.

For

what are the there statistical assumptions made about the population when testing a hypothesis?

It depends on the test.

Charles

hello i am sahibzadi from pakistan

Kindly tell me: when we say that observations should be independent in a parametric test, is that possible in a repeated measures t test?

For the paired / repeated measures t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.

Charles
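One way to see the point of the reply above is that the paired t test works on the within-pair differences, and it is these differences (one per pair) that need to be independent of each other. A minimal sketch, with made-up before/after measurements and a helper `paired_t` invented for this example:

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for a paired-samples t test."""
    # One difference per pair: the test treats these as the independent sample
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1

# Made-up measurements on the same five subjects
before = [72, 80, 65, 90, 77]
after = [70, 77, 66, 85, 74]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.4f}, df = {df}")
```

The two measurements inside each pair are expected to be related, since they come from the same subject. That dependence is removed by taking the difference, which is why the paired design is appropriate here and a two sample t test is not.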

State the assumptions for testing the difference between two means. If those assumptions are met or not met, what tests are used in multivariate data analysis?

Please answer this question.

You can find this information by looking at the webpages on the t test. If the assumptions are not met, then the usual substitutes are the Mann-Whitney and Wilcoxon Signed Ranks tests (or occasionally the Sign test). These tests are also described on the website. Enter the appropriate test in the Search box.

Charles

What statistical assumptions are made for descriptive statistics or measures of dispersion?

Thanks in advance.

None that I can think of except dividing by zero.

Charles

Thanks!

I am not sure if a variable is creating an endogeneity bias in a regression. I collected the residuals from the estimated regression and there is no correlation between the potential endogenous variable and the errors. Is this an adequate test?

Jerry,

This seems like a reasonable approach to me. Having said that, I know that this issue has been studied and other tests such as Hausman’s Test can be used as well as instrumental variables. The following is a paper which may be useful to you.

www-2.dc.uba.ar/alio/io/pdf/claio98/paper-12.pdf

Charles

Hi Charles,

If you’re researching 2 ways of working by comparing 2 factors (say costs and duration) with each other from data of 80+ projects (half being projects done by the new way of working, half done the traditional way), should you use a z-test, or always add ANOVA and Pearson/Spearman to the analysis?

Thank you in advance!

Rick,

If you want to take the interaction of cost and duration into account, you should probably use ANOVA. If the interaction is not important, then two t tests seem to be a reasonable way to go. In either case, you need to make sure that you satisfy the assumptions for that test.

Charles

Thank you!

Assumptions of the following statistic or statistical tool:

Classify whether parametric or non-parametric.

• z-test of mean difference

• t-test of mean difference

• z-test of correlated means

• t-test of correlated means

• Pearson Product-Moment correlation Coefficient

• Spearman Rank Correlation Coefficient(rho)

• Chi-square goodness-of-fit

• Chi-square of Independence

• One Way ANOVA (Analysis of Variance)

The first 4 are parametric. The 5th is not a test, but the usual tests are parametric. The next 3 are non-parametric and the last is considered to be parametric.

Charles

hi..i just wanna ask u. Is it right to test for significant difference or (parametric test) in convenience samples?

thanks in advance 🙂

You can use all the usual statistical tests with a convenience sample, but you should be cautious about your conclusions since the nature of the sampling technique introduces all sorts of biases in comparison to random sampling.

Charles

hello 🙂

i am master student from malaysia.

my advisor asked me to include the assumptions in my thesis.

can you help me which chapter should i include the assumptions?

is it under the research methodology or is it under findings?

thanks in advance 🙂

Probably under research methodology, but this depends on the organization of your thesis.

Charles

Are there any other statistical assumptions to be aware of?

Pete,

I have listed the principal types of assumptions for statistical tests on the referenced webpage. Not all tests use all these assumptions. Other assumptions are made for certain tests (e.g. sphericity for repeated measures ANOVA and equal covariance for MANOVA). For each test covered in the website you will find a list of assumptions for that test.

Charles

What do assumptions mean in statistics? What do they provide?

Soniya,

Many statistical tests give valid results only when certain assumptions are met. E.g. the data must be normally distributed or the variances of the data are equal.

Charles

Hello

My name is Bahram, from Iran. I am now a PhD student in watershed management in Malaysia.

About my thesis, my supervisory committee has a question:

– Explain the reason for using ANOVA. Do the data you collected meet the parametric statistical assumptions?

Thank you

Hello Bahram,

The reason for using ANOVA is given on the webpage http://www.real-statistics.com/one-way-analysis-of-variance-anova/

The assumptions for ANOVA are given on the webpage http://www.real-statistics.com/one-way-analysis-of-variance-anova/assumptions-anova/

Charles