Author

Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).

Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.

Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.

He grew up in New York City and has lived in Indiana, Florida, Oregon, and finally Boston, before moving to Europe 36 years ago where he has lived in London, England and in northern Italy.

He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. Accomplishing this, however, required him to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.

487 thoughts on “Author”

  1. Hello Dear Dr. Charles,
    I want to calculate the aligned rank transform for my three independent variables and one dependent variable so that I can apply a three-way factorial analysis. I downloaded ARTool.exe, but I get the error "length less than zero, parameter name: length". Can you help me calculate the aligned rank transform values?
    Thanks so much.

  2. Hello Dr. Zaiontz,
    I am 51 years old and have a master’s degree in Applied Statistics from 23 years ago!! I never really had a chance to work in a related field; instead, I chose to enter the world of business. About a year ago I had a chance to go back to my college notes, pull my academic books out, and find out what statistics is really all about! I have fallen in love with these concepts and the depth and perspective that they add to one’s views.

    I appreciate very much the wealth of knowledge that you share on your site. I am working on a real project that someone posted, and it involves fitting a Poisson model to the data. The numbers along the way, however, turn out to be much larger than Excel formulas can handle. Specifically, the Exp and Fact functions run into issues. What are your best suggestions for such scenarios? Would your software overcome this issue? I’m still in the process of learning R and SQL; would you say analysis in those environments would better handle the big number issue? Thank you so much again! Roya D.

    • Hello Roya,
      1. To fit data to a Poisson distribution you need to estimate the lambda parameter. The MLE and method of moments estimates in this case are both the average of the data elements. You don’t need Exp or Fact for this calculation.
      2. Exp(x) works for x up to about x = 709.78. Fact(x) works for x up to 170. You can also use Gamma(x+1) or Exp(Gammaln(x+1)). This points the way for dealing with Fact(x) and Exp(x) for large x: if possible, work with LN(x) instead. E.g. the pdf of the Poisson distribution is f(k) = lambda^k * e^(-lambda) / k!. Thus, ln(f(k)) = k*LN(lambda) - lambda - GAMMALN(k+1), since ln(k!) = GAMMALN(k+1). Note that GAMMALN(x) works for large x up to about 1E+305 and LN(lambda) works for large lambda, including the largest numbers supported by Excel.
      Charles
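As an illustration of working on the log scale, here is a minimal Python sketch of the two points above (the function names are hypothetical, not Real Statistics functions). Lambda is estimated by the sample mean, and the Poisson log-probability is computed as k*ln(lambda) - lambda - lnΓ(k+1), with Python's math.lgamma playing the role of Excel's GAMMALN. It assumes lambda > 0.

```python
import math
from typing import Sequence


def poisson_lambda_hat(data: Sequence[float]) -> float:
    """MLE (and method of moments) estimate of lambda: the sample mean."""
    return sum(data) / len(data)


def poisson_log_pmf(k: int, lam: float) -> float:
    """ln f(k) = k*ln(lambda) - lambda - ln(k!), using ln(k!) = lgamma(k + 1).

    Assumes lam > 0. Never forms lambda**k or k! directly, so large k or
    lambda do not overflow.
    """
    return k * math.log(lam) - lam - math.lgamma(k + 1)


# k = 1000 and lambda = 950 would overflow EXP/FACT in Excel,
# but the log-probability is an ordinary-sized number.
log_p = poisson_log_pmf(1000, 950.0)
p = math.exp(log_p)  # exponentiate only at the end, once the value is small enough
```

The design choice is simply to stay in logs for as long as possible and exponentiate only the final (moderately sized) result.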

  3. Dr. Zaiontz,

    Thank you again for this amazing resource pack. I have done cluster analysis with it and want to produce a graphical representation of the clusters. Would you happen to have any suggestions for me?

    Thank you.

  4. Dear Charles,
    I find your package very powerful. I would like to go deeper and learn the details of your user-defined functions, such as ForecastError() and ARMA_SSE(). What should I do?

  5. Dear Professor,

    I have a quick question. If every value is the same except one, shouldn’t Gwet’s AC2 be greater than .7?

    The data are on a Likert scale.

  6. Dear Charles:
    What test should I use to determine differences between the following kind of data?
    The samples are all independent, and I am assuming the data are normally distributed.

    Age group   Surgery A (frequency)   Surgery B (frequency)
    0-5         200                     100
    6-10        300                     200
    11-15       500                     300
    16-20       200                     150
    21-25       160                     100

    I am not sure that Chi square is really capturing differences between surgeries for each age group.
    Thank you!!!!!
    etc

    • Hi Daniela,
      I am not sure what the best approach is, but here are my suggestions:
      1. Use the two-sample Kolmogorov-Smirnov or Anderson-Darling tests
      2. Convert the frequency tables into raw data. You could use the midpoints of each of the categories, but this would yield lots of ties. Maybe a better approach is to place the formula =5*RAND() in cell A1, highlight A1:A200, and press Ctrl-D. Then insert the formula =5+5*RAND() in cell A201, highlight A201:A500 (300 cells), and press Ctrl-D. Then insert =10+5*RAND() in cell A501, highlight A501:A1000, and press Ctrl-D. Etc. Use the same approach for the second sample in column B. Now, perform a two independent samples t-test (or a Mann-Whitney test if the normality assumption doesn’t hold).
      Charles
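The spreadsheet procedure in item 2 can be mirrored in a short Python sketch, assuming (as the RAND formulas do) that each age band is 5 units wide and starts at 0, 5, 10, and so on. The function names are hypothetical. For the comparison step the sketch computes Welch's t statistic, the unequal-variance version (similar in spirit to Excel's T.TEST with type 3); this is a swapped-in variant of the t-test, and the p-value is not computed here.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence


def expand_bands(counts: Sequence[int], width: float = 5.0, start: float = 0.0,
                 seed: Optional[int] = None) -> List[float]:
    """Band i with count n_i -> n_i values drawn uniformly from
    [start + i*width, start + (i+1)*width), the same idea as =5*i+5*RAND()."""
    rng = random.Random(seed)
    data: List[float] = []
    for i, n in enumerate(counts):
        low = start + i * width
        data.extend(low + width * rng.random() for _ in range(n))
    return data


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Frequencies from the table above (age bands 0-5, 6-10, ..., 21-25)
surgery_a = expand_bands([200, 300, 500, 200, 160], seed=1)
surgery_b = expand_bands([100, 200, 300, 150, 100], seed=2)
t = welch_t(surgery_a, surgery_b)
```

As with RAND() in Excel, the simulated values (and hence t) change with the seed, so it may be worth repeating the analysis a few times to check that the conclusion is stable.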

  7. Hi Prof,

    Good day to you. I have just installed the resource pack for Mac 2011 but encountered an error when trying to run Binary Logistic and Probit Regression. I am currently using an M1 MacBook Pro.

    The error message is as follows:
    Compile error in hidden module: ‘LogisticRegression’. This error commonly occurs when code is incompatible with the version, platform, or architecture of this application.

    May I know a workaround for this matter? Thank you!

    Best Regards,
    Aloysius

  8. Hi Charles,
    I also have Premium Solver 2022 installed on my computer.

    Right after I installed the Real Statistics package, I got a conflict with Premium Solver 2022.
    The error message is:
    Quote
    To guard against this possibility, you should avoid using any defined names beginning with “solver” in your own application.
    Unquote
    Please note that I have also installed the traditional Solver, but it had no conflict with Premium Solver 2022. The only newcomer is Real Statistics.
    I uninstalled Real Statistics according to the instructions, but the error persists.
    Ideally, I want to keep the Real Statistics package.
    Thank you for your support,
    Marian

    • Hi Marian,
      Real Statistics does not have procedures that begin with the word Solver. The only functions that begin with the word Solver are the Excel Solver VBA functions/procedures SolverReset, SolverAdd, SolverOK, SolverSolve, and SolverOptions.
      Charles

  9. Hello Charles,

    Good day!
    I need some clarification. If I want to compare the GWA (General Weighted Average) of two groups of students (one group had online classes during the pandemic; the other had face-to-face classes before the pandemic), can I use a t-test for independent samples? Note that the subjects or courses taken by the two groups are different. For the Face-to-Face group, the GWA is based on the courses they took during their 1st and 2nd years in college, while for the Online group, the GWA is based on the courses they took during their 3rd and 4th years in college. I do not have the grades in the individual courses; only the GWA is available from the records section.
    Also, if one group is normally distributed and the other one is not, should I use the Mann-Whitney U test?

    Thank you.
    Florence

    • Hi Florence,
      Yes, you can compare two independent groups using a t-test provided the samples are normally distributed. If one or both are not normally distributed, you would generally use the Mann-Whitney U test.
      The fact that the courses taken by the two groups are different needs to be stated in the hypothesis being tested.
      Charles
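The Mann-Whitney U statistic mentioned above is computed from the ranks of the pooled data. The Python sketch below is illustrative only (the function names are hypothetical): it assigns average ranks to tied values and returns U for each group. The p-value step (exact tables or a normal approximation for larger samples) is not shown.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U1, U2). U1 counts pairs with x > y, each tie counting 1/2."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])            # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1         # U1 + U2 = n1 * n2
```

For example, mann_whitney_u([1, 2, 3], [4, 5, 6]) gives U1 = 0 and U2 = 9, the most extreme separation possible for two samples of size 3.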

  10. Dear mathematician. I did some research on weighted regression. Really important to our work. I would like to know your studies on this problem in analysis of variance. Thank you and wish you good health. Best regards.

  11. Hi Dr. Zaiontz —

    Thanks for putting this amazing product online and sharing it with the public. I really enjoy it. I learned a lot from you in just a few days of reading your materials. Some topics were confusing to me when I was in school, but now they have become much clearer. Thanks a lot!!

  12. dear Charles,
    your job is just… amazing!
    but it seems that by now you practice Italian more than English
    🙂
    you have done an incredible job…
    Congratulations!!!

  13. Good morning, Dr. Zaiontz,
    Fellow Boilermaker here. Your Real Statistics has been beneficial in my Data Analytics class. But I just ran into an issue with the Correlation Test. It ran fine the first time I used it, but now it takes a very long time to produce the results or becomes “Unresponsive.” So if you have time, I can send you the data to see what I am doing wrong.

    Thank you for producing such a fantastic product.

    • Hello Frank,
      Thank you for your kind words.
      It is still nice to say that I am a Boilermaker even after all these years.
      Yes, you can email me your data and I will try to figure out what is happening.
      Charles

  14. Hi Charles,

    Our study uses 2 different methods to evaluate and rank the performance of 29 companies. We have already completed the rankings for both methods. I thought that Kendall’s W would be applicable for determining whether the two methods agree with each other with regard to the specific rankings they produced.

    I have applied Kendall’s W to my study, with the two methods as the raters and the 29 companies as the subjects being ranked. The companies are ranked from 1 to 29, from best to worst. Is Kendall’s W appropriate and suitable for my study?

    Even though I don’t yet know whether it is applicable to my study, I have already calculated Kendall’s W. The W that I got is 0.572167488, while the p-value is 0.272830274. My last question: how should these results be interpreted?

    I hope you can help me with my concerns, and I look forward to hearing from you. Thank you so much! It would really mean a lot to me.

    • Hi Christine,
      Based on the information that you have provided it does seem that Kendall’s W is appropriate.
      Since p-value > alpha = .05, we cannot reject the null hypothesis of no agreement; i.e. there isn’t significant evidence that the two approaches agree.
      Charles
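For m raters ranking the same n subjects without ties, Kendall’s W can be computed from the rank sums, and the usual significance test uses the statistic m(n-1)W, which is approximately chi-square with n-1 degrees of freedom under the null hypothesis of no agreement. The Python sketch below is illustrative (hypothetical function names, not the Real Statistics implementation) and assumes complete rankings with no ties; the p-value could then be obtained in Excel with CHISQ.DIST.RT(chi_sq, n-1). With two raters, W = (r_s + 1)/2, where r_s is Spearman’s rank correlation between the two rankings, so W = 0.572 corresponds to r_s ≈ 0.144, a weak association.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m raters (rows) each ranking the same n subjects (columns).

    Assumes untied ranks 1..n in every row.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank received by each subject across all raters
    rank_sums = [sum(row[j] for row in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def kendalls_w_chi_square(w: float, m: int, n: int) -> float:
    """Test statistic m(n-1)W, approximately chi-square with n-1 df under H0."""
    return m * (n - 1) * w


method_a = [1, 2, 3, 4, 5]
method_b = [2, 1, 3, 5, 4]
w = kendalls_w([method_a, method_b])
chi_sq = kendalls_w_chi_square(w, m=2, n=5)
```

In this toy example the two rankings differ only by two adjacent swaps, and W works out to 0.9 (Spearman’s r_s = 0.8).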

  15. Hi, professor

    I find this website very helpful. I have a question about sample size and stepwise regression. I will survey a group of employees with and without a pet. Participants are asked to self-report whether they have pets, so I will only know how many participants have pets and how many do not. My research question is: how much of the variance in life satisfaction scores for employees with pets is accounted for by demographics (education, gender, age, and income)? I used G*Power 3.1 and entered two IVs (demographics and pet), which gave a sample size of 67. Is this correct? Or should I enter 5 IVs (education, gender, age, income, and pet) to calculate the sample size? Thank you very much for your help.

    • Hi Beverly,
      Most likely, education, gender, age, income, and pet are your IVs. Keep in mind that if some of these are categorical variables, then you actually have more than 5 IVs. E.g. if age is coded as 0-20, 21-40, 41-60, 60+, then this results in 3 IVs (one less than the number of categories). See Dummy Variables.
      Charles
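To make the "one less than the number of categories" rule concrete, here is a minimal Python sketch of dummy coding (the function name is hypothetical). The first category is treated as the reference level and is represented by all zeros, so a variable with k categories becomes k-1 indicator columns, each of which counts as a separate IV.

```python
from typing import List, Sequence


def dummy_code(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as k-1 indicator columns; categories[0] is the reference."""
    non_reference = list(categories[1:])
    rows: List[List[int]] = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append([1 if v == c else 0 for c in non_reference])
    return rows


age_bands = ["0-20", "21-40", "41-60", "60+"]
coded = dummy_code(["0-20", "41-60", "60+"], age_bands)
# Each row has 3 columns (one per non-reference band), i.e. 3 IVs for age.
```

Which category serves as the reference is a choice; it changes how the coefficients are interpreted but not the number of IVs.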

    • Hello Gerardo,
      We are doing well. I hope that the same is true for you.
      I am working on a new release, but it is not ready yet. I expect to complete it this month.
      Charles

  16. I just found out about your website. Thanks very much for the work you are doing! I have a degree in statistics (M.S. from the University of Minnesota) and worked for three years as a statistical consultant at the University of Victoria (2005 – 2008). As a consultant, I primarily used R, SPSS, and SAS. The software you have created would have been SO very helpful to me as a consultant and to my clients back then. I am glad that it is here now and can help people to understand statistics better and do quality data analysis. Thanks again, you are working on a truly great idea!!!

    Nicholas Karlson (www.rcoding.org)

    • Hello Nicholas,
      Thank you for your very kind remarks.
      I welcome your support and would appreciate any suggestions that you have for how to improve the website and/or software.
      Charles

      • Greetings Charles,
        I will indeed be looking for ways to support your work at Real-Statistics. Part of my current job is to help graduate and undergraduate students adopt/use econometrics/statistics software. There are several courses and use-case scenarios that would significantly benefit from Real-Statistics. For example, I think Real-Statistics would be a great help to students studying AP Statistics.
        Kind regards,
        Nicholas

      • Hello Charles,
        I make a point of visiting your site at least twice a month, and I always end up learning something and using your tool. In fact, I learn more effectively by reading through the Basic Concepts on this site than from heavy statistics books! I’m not sure why your tool is not more widely used at US universities and companies. I used the tool while doing a Master’s program in OR last year, and I still use some of the MV analyses. Anyway, I wanted to thank you, and I’m happy to see that we have made it through Covid.

        • Hello Sutanu,
          Thank you for your very kind remarks. I sure hope we have made it through Covid, but, unfortunately, that may not be true for some people. So far so good for my family.
          Charles

          Dear Everyone: It would be nice to collaborate as a team using advanced statistical methods.
          How can we get in touch with other interested members and actually help build a community?
          I can see how great this tool is after a few moments of reading the reviews.
          Regards
          DMZ163

  17. Dear Dr. Zaiontz,

    Thanks to your excellent guidance and plug-in software, I feel confident that I don’t need to learn SPSS or R just to do some statistical tests with my data. I am currently writing a long book on publication and will be recommending your materials in it. Your explanations of the tests on the website are also really useful for someone like me who was never good at math.

  18. Hello Dr. Zaiontz,

    Hope you are doing well. I am working on analyzing some data for a research project right now. I was researching data analysis techniques online, which led me to your website. The toolpak which you’ve created is proving to be extremely useful. However, I am wondering whether the toolpak’s statistical options are affected by blanks (missing values) and whether I should address them first. Any insight will be greatly appreciated.

    Sincerely,

    Suhas

    • It depends on the specific data analysis tool. Some accept blanks (using listwise deletion of missing data). Others will give an error message when there are blanks in the data. This should be stated in the documentation on the website.
      Charles
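Listwise deletion, as mentioned above, drops every row (case) that has a blank in any column before the analysis is run. A minimal Python sketch, assuming blank cells arrive as None or as empty/whitespace-only strings (a hypothetical representation of empty cells; the function names are also hypothetical):

```python
from typing import Any, List, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and empty/whitespace-only strings as blank cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def listwise_delete(rows: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Keep only the rows with no blank in any column."""
    return [list(row) for row in rows if not any(is_missing(v) for v in row)]


data = [[1, 2], [None, 3], [4, ""], [5, 6]]
complete_cases = listwise_delete(data)
```

The trade-off is that a single blank removes the whole case, so with many scattered blanks the effective sample size can shrink considerably.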

  19. Excellent job. You really make statistics digestible. I now feel free to recommend your site to my students without any fear that they will give up after the first bunch of math symbols.

    • Hello Marian,
      Thank you very much for your words of encouragement. I have tried very hard to walk the thin line between too much mathematics and not enough. While I haven’t always succeeded I trust that I have succeeded just enough.
      Charles

  20. Hi Dr Charles,

    I would like to ask whether there is any way to include the data collection day in a statistical analysis or test. I measured the height of the plant for 14 days; how can I include this variable in a statistical test? The variable indicates the day I collected my data, e.g. day 1, day 2, day 3, etc.

    Thank you.

  21. Hi Charles,

    For Passing-Bablok, the equation for c is SQRT(n*(n-1)*(2*n+5)/18)*z-crit. Can I ask whether 5 and 18 are constants or whether they change as the dataset grows?

    Also, if the 95% confidence interval doesn’t include 0 for intercept and 1 for slope, how do I then correct this difference? Is there a factor that I can use to allow the 2 methods to become comparable?
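For reference, the 5 and 18 in this formula are fixed constants: n(n-1)(2n+5)/18 is the null variance of Kendall’s S statistic for n untied pairs, so only n (the sample size) changes as the dataset grows. A minimal Python sketch of the quantity, with a hypothetical function name and assuming a two-sided critical value z-crit at significance level alpha:

```python
import math
from statistics import NormalDist


def passing_bablok_c(n: int, alpha: float = 0.05) -> float:
    """c = z_crit * sqrt(n(n-1)(2n+5)/18); the 2n+5 and 18 never change, only n does."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z_crit * math.sqrt(n * (n - 1) * (2 * n + 5) / 18)


c_10 = passing_bablok_c(10)  # n = 10 pairs of measurements
```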

  22. Hi Charles,

    Thank you for creating this website and the software.

    I would like to know if I can apply ICC to test for test-retest reliability for my strain sensor. My strain sensor is attached to a concrete sample, and the sample is put under compressive load. I repeated the tests multiple times keeping everything constant, including the person observing the results (strain value from the sensor and compressive load on the sample). The only difference was the time when the test was conducted.

    Could I apply ICC in this case to measure the reliability of my strain sensor?

    Thank you.

  23. Hi Charles,

    In your section “Confidence Interval for one sample Cohen’s d”, in the calculation of the standard error (se), the first term in the sqrt is given as 1/n. However, Hedges & Olkin (1985, p 86) show the quantity (n1+n2)/(n1*n2), which simplifies to 2/n when n1=n2. I’m wondering if the former is a typo, or is there some other simplification for paired samples?

    Thanks very much!

    • Hi Michael,
      In the one-sample case, there aren’t n1 and n2 since there aren’t two samples, and so this can’t be the correct formula.
      Actually, the one-sample case is more like a paired sample case where one of the samples has a constant value (usually 0).
      Charles
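For comparison, the large-sample standard errors usually quoted for Cohen’s d can be written side by side; the two-sample form is the Hedges & Olkin expression cited in the question. The one-sample version is the standard analogue obtained by treating the data (or the paired differences) as a single sample of size n; whether the website uses exactly this form is an assumption here.

```latex
\mathrm{se}(d_{\text{one-sample}}) \approx \sqrt{\frac{1}{n} + \frac{d^2}{2n}},
\qquad
\mathrm{se}(d_{\text{two-sample}}) \approx \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}
```

With n1 = n2 = n the first term of the two-sample formula becomes 2/n: it reflects the sampling error of two independent means, whereas the one-sample d involves only one mean, hence 1/n.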

  24. Hello Charles,

    Thank you so much for making this website and software.

    I am trying to run a binary logit regression model. Is it possible to generate marginal effects using the software? Please help.

    Thank you.

  25. Dr. Charles,

    Thanks for so many helpful tools. May I ask you a question about ADF?

    When I did the ADF unit root test for cointegration of the time series Y and X, I got the OLS equation (Y=a+bX) and the residual series (ERR).

    I ran the ADF unit root test on this residual series (ERR) in levels, and the p-value > 0.2; but when I ran the ADF unit root test on the residual series (ERR) in 1st differences, the p-value < 0.00001.

    So can I say that the time series Y and X are cointegrated?

    Thanks.

  26. Dr. Charles,
    I cannot thank you enough for your contributions to statistics with Excel. I know you have several templates for various statistical analyses. I am wondering if you have any downloadable template for “Bland-Altman”.
    Thank you,
    Regards
    Mozammel

  27. Hello Dr. Zaiontz,

    Thank you for sharing such a broad array of content on this site!

    I am using your sample size calculator for logistic regression (with binary IVs). The example shown (and in the downloads) uses a two-level binary predictor (men vs women) with binary outcome (opioid Rx).

    I realized after trying out the tool, that the resulting sample size is equivalent to sample size calculations used for a test of two proportions (in my world, typical for A/B tests). Makes sense.

    Now, I am hoping to use this for a multivariate experiment with three predictors (each with three categorical levels) and one binary outcome. I already have a fractional factorial design that I will be using (3^(3-1) = 9 variants).

    With more than one IV in such a model (3 in this case), each with three-levels, what adjustments to the sample size calculation would be needed? I’ve always assumed there should be sample size efficiencies when running multivariate tests over “one at a time.” Are any adjustments needed?

    Thank you,

    John

    • Raihan,
      Yes, you would need to perform 13 more iterations. The Rasch data analysis tool provided by the free Real Statistics software would do all of this automatically.
      Charles

  28. Dear Charles,
    thank you very much for your good recommendations. They work perfectly. Now I only have to understand what I did and what I got.
    Best wishes
    Fritz

  29. Dear Charles,
    thank you very much for your support; it was already last year. Time goes by, but I still struggle with my statistical problems. My present problem is this: I am doing a stepwise regression analysis with Yi = b0+b1X1+b2X2+b3X3, and I monitor how R^2 increases from step to step. I notice that the R^2 increment depends on the sequence in the second step, which is either Yi = b0+b1X1+b2X2 or, alternatively, Yi = b0+b1X1+b3X3. In the next step, everything comes together again. The underlying question, however, is whether X2 or X3 correlates better with Y, as identified by the R^2 steps. I wonder whether the reason is the different correlations between X1 and X2 or between X1 and X3, respectively. X1 and X2 do not show a distinct correlation, whereas X1 and X3 do. Maybe you know a way out. Thank you very much. Fritz

  30. Dear Dr Charles
    A very interesting site. I am a retired biomedical engineer helping a cardiac surgical team to analyse their heart valve replacement patient data. One of the requirements is a Kaplan-Meier survival analysis. Since I need this only once or twice a year, the regular statistical software packages are too expensive.
    I was hoping to try out your Excel package, but I am UNABLE TO DOWNLOAD it for some odd reason. The DOWNLOAD buttons are not working; they only open the same page again. Could you kindly help? I would also like to update my rusty knowledge by checking out your example workbooks. Thanks in advance; I look forward to your reply.

  31. Hi Charles.

    I couldn’t find the test for homogeneity of variances when there are more than 2 samples in the analysis. Did I miss it?

    I want to congratulate you for developing this great tool.

  32. Hi Dr. Charles Zaiontz! We are students. We ran a two-way MANOVA in SPSS, and now we have to interpret the output. Could you help us with the interpretation?

  33. Dear Dr. Zaiontz,

    thank you so much for your extension pack and your excellent work! It helps me a lot at work and in my studies. Also, the resources and explanations you provide are very clear and easy to understand.

    I wish you all the best,
    Diana

  34. Dear Dr. Charles Zaiontz,
    I am studying principal component and factor analysis, and I need to understand how to perform these analyses step by step, starting with the original data, ‘manually’ without any software. I already understand how to calculate the nxn (n > 5) correlation and covariance matrices and their inverses, the row echelon form by Gaussian elimination, and the determinant, but I do not understand how to obtain, for example, the eigenvalues. Could you please point me to an article containing all the steps with examples?
    Thank you very much,
    Otávio

  35. I am in the process of writing a math paper for IB high school right now and would really like to know the best way to approach the raw data to determine whether or not it follows a normal distribution. I just want to know what the procedure is, which tests I could use, and how it would be done on real-statistics.com.

    Background information
    The raw data is the number of people in same-sized intervals of 100 MMR (matchmaking rank). It starts at 1100 MMR, but that should not matter, as the intervals can be put into individual ranks; e.g. rank 1 would be people between 1100 and 1200 MMR, rank 2 people between 1200 and 1300 MMR, and so on. The raw data will have about 500,000 to 1,000,000 people in it, separated into those intervals. (If needed, I can send the raw or grouped data.)

    If it’s not too much work, I just want to know the procedure and tests needed to figure out whether this data follows a normal distribution, including a final graph of it. Also, where can I find the procedure and tests as a guide to how it’s done?

    Your website has helped me tremendously on other projects in the past.
    Thanks,
    Jesper

  36. Hi,
    I have data on daily minimum temperature. From the graph I can see that it has a seasonal pattern. I have calculated the monthly average temperatures and plotted them. Can I use a seasonality index and similar techniques to forecast? And also, can I use temperature data for forecasting?

  37. Hey Dr. Zaiontz,

    I used the three-axis method to look at whether there is a statistically significant relationship between some behaviour and daily rainfall/temperature over a time series. I adopted your method, with the date as my dependent variable. I chose to look at Spearman and Pearson correlations in R. I had temperature/rainfall as Y, date as X, and duration of activity as Z. The results show R = 0.68 and p = 0.00013. So based on this, I can see a reasonably strong and statistically significant correlation between temperature and lying time over the time series. Would you agree with this, or am I barking up the wrong tree?

  38. Dear Charles,

    This is an inquiry about my last message about logistic regression.

    I would appreciate your reply to my earlier message, but even independently of the question about the software, at least I want to know whether anything was wrong with my interaction model.

    Thank you in advance for your kind response.

    Best regards,

