Assumptions for Statistical Tests

As we can see throughout this website, most of the statistical tests we perform are based on a set of assumptions. When these assumptions are violated the results of the analysis can be misleading or completely erroneous.

Typical assumptions are:

  • Normality: Data have a normal distribution (or at least are symmetric)
  • Homogeneity of variances: Data from multiple groups have the same variance
  • Linearity: Data have a linear relationship
  • Independence: Data are independent

We explore in detail what it means for data to be normally distributed in Normal Distribution, but in general it means that the graph of the data has the shape of a bell curve. Such data are symmetric around the mean and have kurtosis equal to zero (where kurtosis is measured as excess kurtosis, i.e. relative to the normal distribution). In Testing for Normality and Symmetry we provide tests to determine whether data meet this assumption.
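
As a rough illustration of the symmetry and kurtosis ideas above, the following minimal Python sketch computes skewness and excess kurtosis from central moments. The function names and the sample data are made up for this example, and the formulas are the simple population versions (dividing by n); statistical software often reports bias-corrected versions that give slightly different values.

```python
def central_moment(data, k):
    """k-th central moment, population form (divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment over variance squared, minus 3 (the normal value)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

# Hypothetical samples: one symmetric, one with a long right tail
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # zero for perfectly symmetric data
print(skewness(right_skewed))      # positive for a long right tail
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```

A skewness near zero indicates symmetry, and an excess kurtosis near zero is consistent with (but does not prove) normality.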

Some tests (e.g. ANOVA) require that the groups of data being studied have the same variance. In Homogeneity of Variances we provide some tests for determining whether groups of data have the same variance.
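
To make the idea of comparing group variances concrete, here is a minimal Python sketch of the W statistic used by Levene's test, based on absolute deviations from each group mean. The function name and the data are assumptions for this example, and the sketch stops at the statistic; in practice W is compared with an F distribution with k − 1 and N − k degrees of freedom to obtain a p-value.

```python
from statistics import mean

def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(x - g_mean) for x in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group variation of the absolute deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: a and b have the same spread, c is far more spread out
a = [1, 2, 3, 4, 5]
b = [11, 12, 13, 14, 15]
c = [1, 5, 9, 13, 17]

w_equal = levene_w([a, b])    # groups differ only in location
w_unequal = levene_w([a, c])  # groups differ in spread
print(w_equal, w_unequal)
```

A larger W is stronger evidence against equal variances. Replacing the group mean with the group median gives the Brown-Forsythe variant, which is less sensitive to non-normal data.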

Some tests (e.g. Regression) require that there be a linear relationship between the dependent and independent variables. Generally linearity can be checked graphically using scatter diagrams or via other techniques explored in Correlation, Regression and Multiple Regression.
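
The following Python sketch suggests why a graphical check is worthwhile: the Pearson correlation coefficient measures only linear association, so a perfectly curved relationship can produce a correlation of zero. The helper function and the data are made up for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]  # exact straight line
curved = [v ** 2 for v in x]     # exact parabola

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear)  # essentially 1
print(r_curved)  # essentially 0, although y is fully determined by x
```

A scatter diagram of the second data set would reveal the curve immediately, which the correlation coefficient alone does not.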

We touch on the notion of independence in Definition 3 of Basic Probability Concepts. In general, independent data show no correlation with each other (see Correlation), although a lack of correlation does not by itself guarantee independence. Many tests require that data be randomly sampled, with each data element selected independently of the data previously selected. For example, if we measure the monthly weight of 10 people over the course of 5 months, the resulting 50 observations are not independent, since repeated measurements from the same person are related to each other. Similarly, the IQ scores of 20 married couples don’t constitute 40 independent observations.

Almost all of the most commonly used statistical tests rely on adherence to some distribution function (such as the normal distribution). Such tests are called parametric tests. Sometimes when one of the key assumptions of such a test is violated, a non-parametric test can be used instead. Such tests don’t rely on a specific probability distribution function (see Non-parametric Tests).
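
As an illustration of what not relying on a specific distribution can look like, here is a minimal Python sketch of a rank-based statistic, the Mann-Whitney U. The function name and the data are assumptions for this example, and no p-value is computed. Because U depends only on the ordering of the values, it is unchanged by any strictly increasing transformation of the data.

```python
import math

def mann_whitney_u(a, b):
    """U statistic for sample a: number of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical samples
a = [1.2, 3.4, 2.2, 5.0]
b = [2.9, 6.1, 7.3, 4.8, 8.0]

u_raw = mann_whitney_u(a, b)
# Exponentiating changes the shape of the data but not the ordering
u_exp = mann_whitney_u([math.exp(v) for v in a], [math.exp(v) for v in b])
print(u_raw, u_exp)
```

The two U values for a pair of samples always add up to the product of the sample sizes, so a U close to zero or close to that product suggests that one group tends to have larger values than the other.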

Another approach for addressing problems with assumptions is to transform the data (see Transformations).
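
As a small sketch of how a transformation can help, the Python example below applies a log transformation to right-skewed data and compares the skewness before and after. The data are made up and deliberately chosen so that the logged values are evenly spaced; real data will rarely become this symmetric.

```python
import math

def skewness(data):
    """Population skewness: third central moment over variance to the power 3/2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (each value doubles the previous one)
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

print(skewness(raw))     # clearly positive
print(skewness(logged))  # essentially zero
```

Note that after a transformation, the test results apply to the transformed values, so conclusions need to be interpreted on that scale.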

54 Responses to Assumptions for Statistical Tests

  1. richard says:

    thanks so much.
    however, my dilemma is on the issue of skewness. any more info about it?

  2. Aumi says:

    Which are the assumptions of Non-parametric tests ?

    • Charles says:

      Aumi,
      It depends on the nonparametric test, but usually there are fewer assumptions than for a corresponding parametric test.
      Charles

  3. Rudy says:

    I am trying to understand the true meaning behind Kurtosis ? Can you define and explain its overall purposes for layman like me, please, thanks.

    • Charles says:

      Rudy,
      The kurtosis is the fourth moment about the mean divided by the variance squared, nothing more and nothing less, although not exactly a layman’s description. We used to view that kurtosis was a measure of flatness of the distribution, but apparently that is not true. Now I look at it as one means of determining whether data is normally distributed. If the kurtosis of the data is not sufficiently similar to the kurtosis of a normal distribution then we have evidence against that data coming from a normal distribution.
      Charles

      • Rudy says:

        Thank you for this. In your opinion, what are the assumptions underlying the use of parametric tests? I am trying to understand this method.

        • Charles says:

          Rudy,
          The assumptions depend on the specific parametric test (t test, ANOVA, etc.). For each test, the website will describe the specific assumptions for that test.
          Charles

    • RIYA EDWINA says:

      Kurtosis is simply concentration of data around the mean:

      1. Leptokurtic : Data is more clustered around the mean, kurtosis value is large positive, standard deviation (deviation of values from the mean) is low.

      2. Platykurtic : Data is uniformly distributed about the mean.

      3. Mesokurtic : Data is normally distributed but doesn’t mean it’s a standard normal distribution, standard deviation is high.

  4. akeem says:

    Good day, my name is Akeem, I am testing for normality and independency in a multivariate data. but I am confused about the test that must be done before the other. my question is should normality test come before the independent test?

  5. Anwer says:

    Hello,
    I’m a PhD student and I want to analysis my results. I have 5 independent factors (each one has three levels) and one dependent. I want to select a suitable statistical analysis. I check only the normality and it showed a normal distribution. I fell confused from the number of tests. Could you please helm me in that?

    Many Thanks,

    • Charles says:

      Anwer,
      You need to determine what sort of hypothesis you want to test before you can decide what is the suitable statistical analysis.
      Charles

      • Anwer says:

        Charles,
        Thanks for reply.
        I want to see the relationship between 5 independents ( 4numerical and 1 categorical ) and dependent value and find the optimum values. in other words, the effects of parameters on output and which one the most significant.
        Thanks,

        Anwer

        • Charles says:

          Anwer,
          This sounds like a regression-type scenario. I suggest that you start by looking at the Regression part of the website.
          Charles

  6. hani says:

    hi,

    i’m a student and doing a research on the relationship between communication factors and job satisfaction among PB staff.
    my sample size is 56 because of the population are very small.
    the normality test i’ve done is not normal.
    my question is if i used non parametric, does it mean i don’t have to analyze the hypotheses test, correlation, regression analysis (where parametric usually analyze) ?

    thank you 🙂

    • Charles says:

      Hani,
      Two observations:
      1. Just because data isn’t normal doesn’t necessarily mean that you can’t use a parametric test. It usually depends on how far from normal the data is. You can sometimes apply a transformation which makes the data normal.
      2. Nonparametric tests can often perform very similar analyses as parametric tests; it depends on the type of analysis you want to perform.
      Charles

      • hani says:

        so, if i used 1-sample k-s test for normal distribution.
        i can still continue the other analysis using parametric test, isn’t it? but it depends on how far from the normal data?

  7. katy says:

    Hi, i’m doing a lab report right now, and for my data they meet two of the three assumptions for a parametric test such as an ANOVA or linear regression?
    The data is normally distributed and there’s independent data. However, there’s no equal variance. The levene’s test gave a significance value of 0.039.

    So can i still use an ANOVA or regression, and if so, how do i justify this?

    Thank you

    • Charles says:

      Katy,

      If the homogeneity of variance assumption is not met, Welch’s ANOVA is a commonly used substitute.
      For linear regression you can use robust standard errors.

      Both of these approaches are covered on the website and are included in the Real Statistics Resource Pack.

      Charles

  8. Biplob Kumar Pramanik says:

    Hi

    I have some data (x axis represnts fouling resistance and y axis represents organics) and I simply made a correlation using excel between x and y axis. Reviewer wanted to know what assumption was made regarding the normality of data distribution. Can you please give an answer?

    Kind regards
    Biplob

    • Charles says:

      You don’t need to assume normality to calculate a correlation coefficient. Depending on which statistical test you use, you may need the normality assumption when you test whether this correlation is significantly different from zero. See the following webpage for details:
      Correlation
      Charles

  9. Abraham says:

    pls am Woking on Immunological assessment of Hiv and Hepatitis B in pregnant women, pls wot kind of assumption and statistical study I wl employ. is my research a retrospective, prospective or cross section. What statistical analysis am expected to use .ANOVA, t test, z test, correlation or regression
    Thanks in Advance.

  10. Rhodora Ruiz says:

    My study is about innovations of sped teachers in.inclusive education and it descriptive. I am.confuse with the stat tool of my assumptions such as factors, best innovations, challenges, ate to be described..kindly give idea of what stat tool.sir..thank you

    • Charles says:

      Sorry, but you need to be more specific. You need to explain what specifically you are trying to accomplish before anyone can suggest tools to use.
      Charles

  11. mario says:

    Does all of these test have an assumption of independence?
    T test
    Paired t test
    CRD ANOVA

    • Charles says:

      Mario,

      For the two sample t test or CRD ANOVA, the group samples must be independently drawn

      For the paired t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.

      For

  12. denise says:

    what are the there statistical assumptions made about the population when testing a hypothesis?

  13. sahibzadi says:

    hello i am sahibzadi from pakistan
    kindly tell me when we say that observation should be independent in parametric test then is it possible in repeated measure t test

    • Charles says:

      For the paired / repeated measures t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.
      Charles

  14. Arsalan says:

    state the assumptions for testing the difference between two means .If those assumptions are met or not met what test are use in Multivarient data anaylysis

    plz ans this question……………….

    • Charles says:

      You can find this information by looking at the webpages on the t test. If the assumptions are not met, then the usual substitutes are the Mann-Whitney and Wilcoxon Signed Ranks tests (or occasionally the Sign test). These tests are also described on the website. Enter the name of the appropriate test in the Search box.
      Charles

  15. Powei says:

    What statistical assumptions are made for descriptive statistics or measures of dispersion?
    Thanks in advance.

  16. Jerry Stevens says:

    I am not sure if a variable is creating an endogeneity bias in a regression. I collected the residuals from the estimated regression and there is no correlation between the potential endogenous variable and the errors. Is this an adequate test?

    • Charles says:

      Jerry,

      This seems like a reasonable approach to me. Having said that, I know that this issue has been studied and other tests such as Hausman’s Test can be used as well as instrumental variables. The following is a paper which may be useful to you.

      www-2.dc.uba.ar/alio/io/pdf/claio98/paper-12.pdf

      Charles

  17. Rick says:

    Hi Charles,

    If your researching 2 ways of working by comparing 2 factors (say costs and duration) with each other from data of 80+ projects (half being projects done by the new way of working, half done by traditional way), should you use z-test, or always add ANOVA and pearson/spearman to the analysis?
    Thank you in advance!

    • Charles says:

      Rick,
      If you want to take the interaction of cost and duration into account, you should probably use ANOVA. If the interaction is not important then two t tests seem to be a reasonable way to go. In either case, you need to make sure that you satisfy the assumptions for that test.
      Charles

  18. jastine says:

    Assumptions of the following statistic or statistical tool:
    Classify whether parametric or non-parametric.
    • z-test of mean difference
    • t-test of mean difference
    • z-test of correlated means
    • t-test of correlated means
    • Pearson Product-Moment correlation Coefficient
    • Spearman Rank Correlation Coefficient(rho)
    • Chi-square goodness-of-fit
    • Chi-square of Independence
    • One Way ANOVA (Analysis of Variance)

    • Charles says:

      The first 4 are parametric. The 5th is not a test, but the usual tests are parametric. The next 3 are non-parametric and the last is considered to be parametric.
      Charles

  19. aisyah says:

    hi..i just wanna ask u. Is it right to test for significant difference or (parametric test) in convenience samples?
    thanks in advance 🙂

    • Charles says:

      You can use all the usual statistical tests with convenience sample, but you should be cautious about your conclusions since the nature of the sampling technique introduces all sorts of biases in comparison to random sampling.
      Charles

  20. fatin najihah hashim says:

    hello 🙂
    i am master student from malaysia.
    my advisor asked me to include the assumptions in my thesis.
    can you help me which chapter should i include the assumptions?
    is it under the research methodology or is it under findings?

    thanks in advance 🙂

  21. Pete says:

    Are there any other statistical assumptions to be aware of?

    • Charles says:

      Pete,
      I have listed the principal types of assumptions for statistical tests on the referenced webpage. Not all tests use all these assumptions. Other assumptions are made for certain tests (e.g. sphericity for repeated measures ANOVA and equal covariance for MANOVA). For each test covered in the website you will find a list of assumptions for that test.
      Charles

  22. soniya says:

    what do assumption mean in statistic? what do they provide?

    • Charles says:

      Soniya,
      Many statistical tests give valid results only when certain assumptions are met. E.g. the data must be normally distributed or the variances of the data are equal.
      Charles

  23. Bahram says:

    Hello
    My name Bahram from Iran. now, I am a ph.D student in watershed management in Malaysia.
    about my thesis, my supervisory committee have a question:
    – Explain the reason for using ANOVA, do you the data collected meet parametric statistical assumptions?
    Thank you
