Box’s M Test Basic Concepts

Box’s test is used to determine whether two or more covariance matrices are equal. Bartlett’s test for homogeneity of variance presented in Homogeneity of Variances is derived from Box’s test. One caution: Box’s test is sensitive to departures from normality. If the samples come from non-normal distributions, then Box’s test may simply be testing for non-normality.

Suppose that we have m independent populations and we want to test the null hypothesis that the population covariance matrices are all equal, i.e.

         H0: Σ1 = Σ2 = ⋯ = Σm

Now suppose that S1, …, Sm are sample covariance matrices from the m populations, where each Sj is based on nj independent observations, each consisting of a k × 1 column vector (or alternatively a 1 × k row vector).

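Now define S as the pooled covariance matrix

         S = \frac{1}{n-m} \sum\nolimits_{j=1}^m (n_j - 1) S_j

where n = \sum\nolimits_{j=1}^m n_j. Now define the following:

         M = (n - m) \ln|S| - \sum\nolimits_{j=1}^m (n_j - 1) \ln|S_j|

         c = \frac{2k^2 + 3k - 1}{6(k+1)(m-1)} \left( \sum\nolimits_{j=1}^m \frac{1}{n_j - 1} - \frac{1}{n-m} \right)

         df = \frac{k(k+1)(m-1)}{2}

If the null hypothesis is true and the samples come from multivariate normal populations, then M(1 – c) has approximately a chi-square distribution with df degrees of freedom.
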
The null hypothesis (of equal covariance matrices) is rejected when M(1 – c) > χ²-crit (or p-value < α).
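The chi-square version of the test can be sketched in a few lines of Python. This is a minimal illustration, not the Real Statistics implementation: it assumes the standard definitions of the pooled matrix S, M, c and df for Box's test, and the names `log_det` and `box_m_chi2` are hypothetical helpers written for this example. Only the standard library is used, so the p-value lookup is left to a chi-square table or a statistics package.

```python
import math

def log_det(matrix):
    """ln|A| by Gaussian elimination with partial pivoting.

    Raises ValueError when the determinant is not strictly positive,
    since ln|A| is then undefined.
    """
    a = [row[:] for row in matrix]            # work on a copy
    size = len(a)
    sign, total = 1.0, 0.0
    for i in range(size):
        pivot = max(range(i, size), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            raise ValueError("singular matrix: ln|S| is undefined")
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            sign = -sign                      # a row swap flips the sign
        if a[i][i] < 0:
            sign = -sign
        total += math.log(abs(a[i][i]))
        for r in range(i + 1, size):
            factor = a[r][i] / a[i][i]
            for col in range(i, size):
                a[r][col] -= factor * a[i][col]
    if sign < 0:
        raise ValueError("negative determinant: not a covariance matrix")
    return total

def box_m_chi2(cov_mats, ns):
    """Return (M, c, df, M*(1-c)) for sample covariance matrices cov_mats
    (lists of lists, each k x k) based on sample sizes ns."""
    m, k, n = len(cov_mats), len(cov_mats[0]), sum(ns)
    # Pooled covariance matrix: weights are (nj - 1) / (n - m)
    pooled = [[sum((nj - 1) * s[r][q] for s, nj in zip(cov_mats, ns)) / (n - m)
               for q in range(k)] for r in range(k)]
    M = (n - m) * log_det(pooled) - sum(
        (nj - 1) * log_det(s) for s, nj in zip(cov_mats, ns))
    c = (2 * k * k + 3 * k - 1) / (6 * (k + 1) * (m - 1)) * (
        sum(1 / (nj - 1) for nj in ns) - 1 / (n - m))
    df = k * (k + 1) * (m - 1) // 2           # k(k+1) is always even
    return M, c, df, M * (1 - c)
```

The last value returned is the statistic that is compared with the chi-square critical value for df degrees of freedom. If all the Sj are identical, M is 0 and the null hypothesis is never rejected.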

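This approximation works pretty well provided nj > 20, m ≤ 5 and k ≤ 5. A better approximation can be obtained using the F distribution by defining the following:

         c_2 = \frac{(k-1)(k+2)}{6(m-1)} \left( \sum\nolimits_{j=1}^m \frac{1}{(n_j - 1)^2} - \frac{1}{(n-m)^2} \right)

         df_2 = \frac{df + 2}{|c_2 - c^2|}

         a^+ = \frac{df}{1 - c - df/df_2} \qquad F^+ = \frac{M}{a^+}

         a^- = \frac{df_2}{1 - c + 2/df_2} \qquad F^- = \frac{df_2 \, M}{df \, (a^- - M)}

If c2 > c² define F = F+, while if c2 < c² define F = F−. Then F ~ F(df, df2). The null hypothesis is rejected if F > Fcrit.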
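The F approximation can be sketched in the same way. The function below is an illustration that assumes the standard definitions of c, c2, df2, a+ and a− for Box's F approximation. The name `box_m_f` is hypothetical, and M is taken as an input rather than computed from data.

```python
def box_m_f(M, k, ns):
    """Return (F, df, df2) for Box's F approximation.

    M  : Box's M statistic
    k  : number of variables (each covariance matrix is k x k)
    ns : list of sample sizes n_1, ..., n_m
    """
    m, n = len(ns), sum(ns)
    df = k * (k + 1) * (m - 1) / 2
    c = (2 * k * k + 3 * k - 1) / (6 * (k + 1) * (m - 1)) * (
        sum(1 / (nj - 1) for nj in ns) - 1 / (n - m))
    c2 = (k - 1) * (k + 2) / (6 * (m - 1)) * (
        sum(1 / (nj - 1) ** 2 for nj in ns) - 1 / (n - m) ** 2)
    if c2 == c * c:
        raise ValueError("c2 equals c squared, so df2 is undefined")
    df2 = (df + 2) / abs(c2 - c * c)
    if c2 > c * c:
        a_plus = df / (1 - c - df / df2)
        F = M / a_plus                          # F+ branch
    else:
        a_minus = df2 / (1 - c + 2 / df2)
        F = df2 * M / (df * (a_minus - M))      # F- branch
    return F, df, df2

# Example 1 inputs: M = 34.81, k = 5, three groups of 7 subjects
F, df, df2 = box_m_f(34.81, 5, [7, 7, 7])
print(round(F, 3), df, round(df2, 1))
```

For these inputs c2 > c², so the F+ branch is used. The resulting F is then compared with the critical value of the F(df, df2) distribution. The numbers printed here come from the formulas alone and have not been checked against Figure 2.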

Observation: If any of the Sj is not invertible then |Sj| = 0, and so ln|Sj| will be undefined. Thus M will be undefined and the test will fail.
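One common cause of a singular Sj is a group that is too small: a sample covariance matrix based on nj observations has rank at most nj − 1, so Sj cannot be invertible when nj − 1 < k. The sketch below flags such groups before the test is attempted. The helper name `groups_too_small` is hypothetical, and passing this check does not guarantee invertibility (for example, perfectly collinear variables also give |Sj| = 0).

```python
def groups_too_small(ns, k):
    """Return the 0-based indices of groups whose sample covariance matrix
    is guaranteed to be singular because n_j - 1 < k."""
    return [j for j, nj in enumerate(ns) if nj - 1 < k]

# Three groups of 7 observations on 5 variables: no group is flagged
print(groups_too_small([7, 7, 7], 5))

# A group of 5 observations on 5 variables has rank at most 4
print(groups_too_small([5, 10], 5))
```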

Example 1: Determine whether the covariance matrices for Young, Middle and Old are equal in Example 1 of ANOVA with Repeated Measures with One Between Subjects Factor and One Within Subjects Factor.

Figure 1 – Covariance matrices for Example 1

The sample covariance matrices for Young, Middle and Old are calculated (see Figure 1) using the COV supplemental array function from the data in Figure 1 of ANOVA with Repeated Measures with One Between Subjects Factor and One Within Subjects Factor. Since the nj (j = 1, 2, 3) are all equal, the pooled covariance matrix is simply the average of the Young, Middle and Old covariance matrices.
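The claim about equal sample sizes can be checked numerically. The sketch below assumes the usual pooled covariance formula with weights (nj − 1)/(n − m). The 2 × 2 matrices are made-up illustrative values, not the data from Figure 1, and the function names are hypothetical.

```python
def pooled_cov(cov_mats, ns):
    """Pooled covariance matrix with weights (n_j - 1) / (n - m)."""
    m, k, n = len(cov_mats), len(cov_mats[0]), sum(ns)
    return [[sum((nj - 1) * s[r][q] for s, nj in zip(cov_mats, ns)) / (n - m)
             for q in range(k)] for r in range(k)]

def average_cov(cov_mats):
    """Plain element-wise average of the covariance matrices."""
    m, k = len(cov_mats), len(cov_mats[0])
    return [[sum(s[r][q] for s in cov_mats) / m for q in range(k)]
            for r in range(k)]

s1 = [[2.0, 1.0], [1.0, 2.0]]   # illustrative values only
s2 = [[4.0, 0.0], [0.0, 6.0]]

print(pooled_cov([s1, s2], [7, 7]))    # equal n_j: same as the average
print(average_cov([s1, s2]))
print(pooled_cov([s1, s2], [3, 11]))   # unequal n_j: larger group dominates
```

With equal nj every weight is (nj − 1)/(n − m) = 1/m, which is why the pooled matrix collapses to the simple average. With unequal nj the two differ.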

The calculations required for Box’s test are given in Figure 2.

Figure 2 – Box’s test for Example 1

m = number of matrices = 3 (Young, Middle, Old), k = the size of each covariance matrix = 5 (each matrix is 5 × 5), n1 = n2 = n3 = number of subjects in each sample = 7 and so n = n1 + n2 + n3 = 21. In columns Q, R, S and V, nn = n1 – 1 = 6 for Young, nn = n2 – 1 = 6 for Middle, nn = n3 – 1 = 6 for Old and nn = n – m = 18 for Pooled. The other entries are as described above.

Generally we use a significance level of α = .001 for this test. From Figure 2 we see that M = 34.81 and both the chi-square test and the F test are not significant. We therefore have no reason to reject the null hypothesis that the three covariance matrices are equal.
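As a rough check on the chi-square result, the arithmetic can be worked by hand from k = 5, m = 3, n1 = n2 = n3 = 7 and M = 34.81, assuming the standard definitions of df and c for Box's test:

```latex
\begin{aligned}
df &= \frac{k(k+1)(m-1)}{2} = \frac{5 \cdot 6 \cdot 2}{2} = 30 \\
c &= \frac{2k^2+3k-1}{6(k+1)(m-1)}
     \left(\sum_{j=1}^{3}\frac{1}{n_j-1} - \frac{1}{n-m}\right)
   = \frac{64}{72}\left(\frac{3}{6}-\frac{1}{18}\right)
   = \frac{32}{81} \approx 0.395 \\
M(1-c) &\approx 34.81 \times 0.605 \approx 21.06
\end{aligned}
```

Since 21.06 is below 30, the mean of a chi-square distribution with 30 degrees of freedom, the statistic is nowhere near the upper tail, which agrees with the non-significant result reported above.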

35 Responses to Box’s M Test Basic Concepts

  1. Thomas Henry says:

    I want to examine the relationship/effect of a 2-group qualitative IV, a numeric covariate, and their interaction on 7 numeric DVs. If I do seven hierarchical regressions or seven ANCOVAs, I get a combination of significant main effects and interactions for all of them. However, if I run a MANCOVA with all 7 DVs, the Box’s test comes back at .002. According to this page, that is problematic. I have seen other pages where only p < .001 is considered problematic. What should I do in this situation? I have seen some suggestions to use the Pillai's trace result. Is this correct? How can I move forward with this situation? Thank you in advance.

    • Charles says:

      Sorry, but I would have to understand your situation in more detail to provide a credible answer. Unfortunately, I don’t have the time to delve more deeply into your situation. Perhaps someone else from the community can provide some insight.

  2. Rebecca Cockeram says:

    Hiya, I have just conducted a Multivariate General Linear Model test, and in the Box test the significance has come back at .000. I am just wondering whether this means that the finding is significant or not. I am very new to SPSS (I started using it yesterday) and have been told to do my whole dissertation on it. Please can someone help me?


    • Charles says:

      For the Box test, the usual significance level is alpha = .01 (whereas for most tests alpha = .05 is customarily used). Thus, if the p-value that you get from the test is .000 (i.e. zero to three significant digits), then clearly .000 < .01, and so you have a significant result. Charles

  3. Amber van der Wal says:

    Dear Charles,
    For my research project I want to examine whether adolescents (high versus low trait aggression and boys versus girls) differ in their preferences for different types of violent media content.
    I have the following dependent variables (gathered through a content analysis of adolescents’ favorite television programs):
    – overall aggression
    – 3 subtypes of aggression (physical, verbal and indirect aggression)
    – aggression in different contexts (graphic, realistic, humorous, rewarded, punished)

    My data is not normally distributed. Therefore, I have considered testing non-parametrically. However, I would have to conduct so many separate Mann-Whitney tests I believe the loss of power would be insurmountable. So, I have returned to my original plan of testing my hypotheses with one two-way ANOVA (with overall aggression as the dependent variable and sex and trait aggression level as between-subjects factors) and two MANOVA’s (one with the three subtypes of aggression and one with all the context variables, both with sex and trait aggression level as between-subjects factors). Does this make sense?

    If so, do I understand correctly that if Levene’s test is not significant for all dependent variables, the (M)ANOVA is robust enough to test the non-normally distributed data with? Do I have to run Levene’s test separately for the two factors? And what if Levene’s test is significant for some dependent variables? I have now tested it with both factors simultaneously (I don’t know whether that is correct), but then three of my context variables are significant.

    Thank you very much in advance.

    Kind regards,

  4. shiran por says:

    Hello Charles, Thank you for all the information!
    I have conducted a Manova of 10 dependent variables between 2 groups. each group n=50.
    Box’s M: F(55,31014)=2.175, p=0.000
    What does it mean? I understand that the covariance is unequal. Does this invalidate the subsequent analysis?

    Levene’s test of equality of error variances showed significance for 4 of the dependent variables.
    do you have any recommendations/suggestions for me?
    references for articles that can help?

    Thank you for your time and help!

    • Charles says:

      The significant result for Box’s M test indicates that either the normality assumption fails or the equal covariance assumption fails. This could invalidate MANOVA. If the sample sizes across the groups are the same, then this is less of a problem and you should be able to use MANOVA anyway. I would then use the Pillai Trace.

      When you say “between 2 groups” do you mean that you have two independent variables? If so you can use Hotelling’s T-square test, which is a special case of MANOVA. In this case, there is a version of the test when the covariance matrices are unequal. See the following webpage:

      With so many dependent variables and so small a sample size, I would be concerned about the power of the test.


      • shiran por says:

        thank you very much for your answer!

        I meant 1 independent variable (culture) with 2 groups (usa/russia).
        is there anything else i can do for strengthening the results?

        thank again!

        • Charles says:

          Essentially this just means that you have dependent variables for USA and the same dependent variables for Russia (sic). It sounds like I am missing something. What are you trying to test?

          • shiran por says:

            I hope i can explain myself..
            i’m looking for the differences between usa/russia participants regarding several statements. i’m assuming that russia will have a lower mean across all statements.

  5. Sirinna says:

    I have received a warning message:
    Box’s Test of Equality of Covariance Matrices is not computed because there are fewer than two nonsingular cell covariance matrices.
    What is wrong and how can I deal with it?

    • Charles says:

      This sounds like an error message from SPSS. I am not familiar with the error messages from SPSS, SAS, etc. The website focuses on Excel.

  6. LanaLo says:

    Hello. I am very sorry but I don’t really understand a lot. I am doing Mixed-design ANOVA and my normality assumption is not met. And I have Levene’s test: p = 0,71 which means there is homogeneity of variances, right?
    Do I have to check this Box’s test? SPSS automatically calculated it and p = 0,013. What does that mean? Is this data important for me?

    Thank you very much!

    • Charles says:

      Yes, your interpretations of Levene’s and Box’s tests are correct. If you are conducting an ANOVA you really don’t need to use Box’s test; Levene’s test should be sufficient. Also note that Box’s test is very sensitive to departures from normality; since your data is not normal, it is not surprising that you can get a significant Box’s test even though the homogeneity of variance assumption is actually met.

  7. Danish says:

    Hello Charles,
    I am conducting research on advertising effectiveness. When studying the gender interaction on the dependent variables (during MANOVA), I found that my Box’s M test was significant. I have checked the normality of my data and found it normal. Also, I have large and unequal sample sizes, so is it allowable in my case to proceed with MANOVA even if Box’s M was significant? Further, I have also read that Pillai’s criterion should be used instead of Wilks’ if Box’s M is significant, but I was unable to find any reference. Please advise me on the matter and, if possible, provide me with a reference.

    • Charles says:


      You can proceed with MANOVA, but it is important to report the reservations you have based on Box’s M test.

      Regarding which criteria to use, see Field, A. (2009) Discovering statistics using SPSS. 3rd Ed. SAGE. Other sources are
      Olson, C.L. (1976) On choosing a test statistic in multivariate analysis of variance. Psychological Bulletin, 83, 579-586
      Olson, C.L. (1979) Practical considerations in choosing a MANOVA test statistic. Psychological Bulletin, 86, 1350-1352
      Stevens, J.P. (1980) Power of the multivariate analysis of variance tests. Psychological Bulletin, 88, 728-737.


  8. Artek says:

    Hello Charles,

    I have a design where there is one intergroup factor (2 levels) and one repeated measures factor (3 measures). I used to think that for assumptions you must check both sphericity (Mauchly’s test) and homogeneity of variances (Levene’s test). However, I was recently told that I could use Box’s M test for this. Which approach do you recommend?

    Also – having read above that M Box is sensitive to non-normal data, can one try M Box, and if significant, check sphericity + homogeneity instead?

    What about checking correlation of means*st.devs only – seems crude, is it used?

    • Charles says:

      hello Artek,
      Since Box’s M test is sensitive to non-normality, I tend not to use it. If you see that the data is normal, then I guess it is ok to use.
      Usually I check for homogeneity of variances. I do this by comparing the sample variances (or using Levene’s test if I have any doubt). I then rely on the GG and HF sphericity correction factors. I have never tried to use the method you describe in your last paragraph.

  9. Barry Williams says:

    I have conducted Shapiro-Wilk normality tests (passed) on data then used in a 2-Way Anova. I am aware that the study is underpowered, with only 7 participants each of three training groups. Box’s Test is contravened (p<0.05) with a significance of 0.034. Does this invalidate the subsequent results/analysis? Or should I be using p<0.01 for this test. If the latter should I not be using a similar p value for other tests (e.g. Levene's)?


    • Charles says:

      If you are performing 2-way Anova, I would use Levene’s test (instead of Box’s test) to check homogeneity of variances. In any case, the Box’s Test result is close to .05 and so is not too concerning.

      • Hi, could you please help? I’m doing my dissertation and I’ve performed a 2-factor MANOVA, but my Box test is showing .000 sig. What shall I do? Should I no longer use this test? I’m trying to test the difference between divorced/intact families and the duration of both marital statuses.

        • Charles says:


          Make sure that the problem isn’t due to outliers. You should check for univariate outliers and multivariate outliers (using Mahalanobis distances). If so, you could rerun the test without these outliers. In this case you should still report the existence of the outliers.

          Box’s M test is quite sensitive to violations of normality. If you believe the test is giving a significant result because the data is not multivariate normally distributed, then results from MANOVA might still be valid (although it will likely be difficult to determine whether the problem is normality or homogeneity of covariance matrices).

          Another option is not to use MANOVA, but to use separate ANOVA tests instead. Not ideal, but it might be the best you can do.


  10. Julie Bendix says:

    Hi Charles
    What does it mean when the Box’s M test is violated? Can you then perform discriminant analysis anyway and just be aware of it when you interpret your results?

    • Charles says:


      Violation of this test could mean that any results obtained from a test that depends on the assumption of equal covariance matrices are invalid. It really depends on how much the covariance matrices differ.

      A particular problem with Box’s M test is that a significant result may be due to a violation of normality and not equal covariance matrices. The specific test that you are using may be pretty robust to violations of normality but not heterogeneous covariance matrices (in fact this is usually the case).


  11. fatima says:

    Hi Charles

    I don’t know how I can interpret Box’s test….. when it is significant, does it mean that our groups are not homogeneous? am I right?

    • Charles says:

      Yes, you are correct. A significant result means that we reject the null hypothesis that the groups have equal covariance matrices.

  12. Tim says:

    Does this work for scalar observations (i.e. k = 1)?

    Also, if I want to use a similar test to check for sameness of several regression coefficients, could I do it? Here’s what I want to do:

    I have several models: “y = alpha + beta*x”, each based on n scalar observations. I want to use the above test, plugging in (k = 1) and replacing your S_j with alpha_j or beta_j (separate tests). Would that allow me to test the null hypothesis that all the true alpha’s or beta’s are the same?

    • Charles says:

      When there is only one variable (k = 1), Box’s M test is known as Bartlett’s test.
      I have no reason to believe that you can use this test to check that the alpha or beta values are the same.

  13. Dam says:

    this is nice but what is the relationship b/n sample proportion Box’s test!

  14. Liz says:

    Hi there, I have received an error message:
    Box’s test of equality of covariance is not computed because there are less than two nonempty cells.
    Levene’s test of equality of error variances is not computed because there are less than two nonempty cells.
    Can someone explain what this means, and what I should do please?

    • Charles says:

      It just means that you have too little data for these tests to be used. If you believe that you do indeed have more than two data elements then this probably means that you haven’t entered the arguments to these functions correctly.

  15. Todd says:

    In the “F-minus” equation, should the denominator read “(a-minus – df2)” instead of “(a-minus – M)”? The former agrees with the results in Fig.2 and the latter does not.

    • Charles says:

      It looks like the mistake is in Fig 2. I will fix the website and issue new versions of the software and the multivariate examples workbook.
      Thanks for catching this error.
