Author

Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).

Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at BBN, one of the most prestigious research institutes in the US, widely credited with implementing the ARPANET and playing a leading role in creating the Internet.

Dr. Zaiontz has held a number of executive management and sales management positions, including President of Genuity Europe, where he was responsible for the European operations of one of the largest global Internet providers (a spinoff from Verizon), with operations in 10 European countries and 1,000 employees.

He grew up in New York City and lived in Indiana, Florida, Oregon and finally Boston, before moving to Europe 29 years ago where he has lived in London, England and northern Italy.

He is married to Prof. Caterina Zaiontz, a clinical psychologist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used on this website.

178 Responses to Author

  1. Assad Kafe says:

    Dear Dr Charles
    Our college uses four versions of each exam in order to limit cheating between students, but the problem is that when dividing the students into four groups, the reliability and difficulty of the whole exam will be affected. My question: is there any method to rejoin the four versions into a single one?

    • Charles says:

      Assad,
      I see the difficulty that you are trying to address, but I don’t understand your question. Are you trying to find a reliability index for all four exams together?
      Charles

  2. Deval says:

    Hello,
    Is it possible to add something to the real-statistics package?
    I have been performing several Tukey HSD/Kramer tests and would like to make them less tedious. It would be helpful if the program performed all the comparisons (i.e. 1 and -1) and listed them at the bottom, including the groups that were compared, and also highlighted the significant ones. It can get annoying to perform 10 comparisons when I have 5 groups.

    Thank you.

  3. Vitor Mauad says:

    Hey Dr Zaiontz, first and foremost thanks for your help on my last post. I've come across another issue I would like some help with, if you can spare me some of your time.

    I'm doing a paper on the effect of the circadian cycle and season on acute myocardial infarction and other acute coronary syndromes.

    Let's say that out of 60 patients with acute coronary syndromes we had 23 in summer, and 31 in the time frame between 0h and 6am. How should I test the statistical significance and, if possible, the relative risk in this scenario? How do I account for exposure time differences between, for instance, those who had an episode in summer and those who had it in other seasons, considering they had 3 times the amount of exposure time?

    Once again, thanks in advance

    • Charles says:

      Vitor,
      I only partially understand the scenario you are describing (e.g. (1) 31 + 23 doesn’t add up to 60 and (2) why are you comparing times like summer with times like 0h to 6am?)
      In any case, for most tests you will need to make sure that the dependent variable values are comparable, and so three times the exposure needs to be taken into account. If whatever you are measuring is uniform in time, then you might be able to simply divide these values by exposure time.
      In any case, these are just ideas. I would need to understand your scenario better before giving any definitive advice.
      Charles

      • Vitor Mauad says:

        Yes, so sorry about it. Let's see, the first scenario is as follows: 54 patients with an acute episode, 17 in winter, 15 in summer and 11 in the other two seasons combined. How do I measure the RR ABS IC95% for winter? Should I pair it with each season or relate it to the average of the other seasons?

  4. ALI ALI says:

    Hi Charles

    Can you explain how to convert data for a log transformation in the following cases:
    1. negative data
    2. proportion data
    3. percentage data

    • Charles says:

      1. Let a be the smallest value (i.e. the most negative value). Then use the transformation log(x-a+1), since x-a+1 > 0 for every x in the data (see the example formula below)
      2. You should be able to use log(x)
      3. Same as #2
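      For case #1, as a minimal worksheet sketch (replace the hypothetical range A2:A100 with your actual data range), you could let Excel find the shift from the minimum automatically:
      =LOG10(A2-MIN($A$2:$A$100)+1)
      Copying this formula down the column applies log(x-a+1) with a equal to the smallest (most negative) value, so the argument of the log is always at least 1.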
      Charles

  5. Hector A. Quevedo says:

    Dear folks:

    I am a university professor. I have a passion for statistics with an experimental design orientation. I am writing a book on experimental design applied to environmental engineering. However, in the section on time series analysis I have some applications of temporal autocorrelation using the Durbin-Watson tables. I would like to include the D-W tables in my book for publication purposes, but I need your kind permission to do so. Could you help me with this issue?
    Thanks

    Hector A. Quevedo (Ph.D.)

    P.S. I am a graduate of the University of Oklahoma. I am an American citizen living in El Paso, Texas. I am working across the border.

    • Charles says:

      Hector,
      I don't have any problem with you using the Durbin-Watson table on my website, but note that I copied the values in the table from versions I found in a number of other places on the web.
      Charles

  6. MrPink says:

    Hello, I’d like to ask a beginner’s question about multiple regression – I’d be incredibly grateful for your time. I’ve only recently learned the basics of linear regression and I still have the following nagging doubt.

    I’d like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is ‘profit/loss’, which simply represents a sales figure for individual retail items. This is the variable I would like to forecast; (there are certain quantifiable conditions for each attempted sale of an item and these are my independent variables). My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a FIXED negative value of -100; (an item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items). A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):

    23
    55
    201
    -100
    13
    -100
    321
    124
    57
    -100
    33

    It’s my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I’d like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I’m wondering whether the fixed negative values might somehow pose a problem. I’m not trying to replicate the fixed -100 value for the losses, only trying to get to true averages so that I may accurately predict the profitability of an item’s listing for sale (and avoid unprofitable listings). Hope this all makes sense. Thank you very much.

    • Charles says:

      I don’t see any problems with this fixed negative amount as long as it truly represents the Y value.
      The real problem you have is that the data may not fit a linear regression model. You should graph the data and see if it is truly linear. You should also graph the residuals and make sure that they are randomly distributed.
      Charles

      • MrPink says:

        Charles, thank you very much for your reply. I’ll be sure to check that the data meets the various requirements of a linear model.

        Regarding your last point, the logic is that the residuals would be randomly distributed because the relationships between the variables remain constant, regardless of the profitability of a given listing. I will of course check the graphs, but it would help to know that I have the theory straight.

        May I ask then, can I take it that if an AVERAGE for Y in my case can be viewed in the same way as that for any appropriate independent variable, then duplicate/fixed values (in themselves) do not pose a problem in a linear regression analysis?

        To clarify, let’s say there were no fixed loss amounts for Y in another case, that they were free to fall anywhere on the same continuous scale (as you usually find in any textbook example). Let’s also say that the average for Y in both cases is equal. Is it then safe to say that there is no apparent cause for concern with the data I have (assuming that it is appropriate for a linear regression analysis in every other way)?

        Sorry if my limitations here are making things unclear or unnecessarily complicated! Thank you. Ben

        • Charles says:

          I don’t see any particular problems with duplicate data provided the assumptions for regression are met and the negative value can be compared with the other values and is not a conventional value (like coding all missing data as -99).
          Charles

          • MrPink says:

            Got it, thank you so much Charles. Congrats on the excellent resource pack and website here – well done! I hope you are well repaid for sharing your expertise. Ben

          • Charles says:

            Great. Glad I could help.
            Charles

  7. sanaullah says:

    Dear Dr. Charles,
    Sir, I want you to give me some guidance about how to write a research paper. Please, sir, I do not have any good teacher in the province of Balochistan in Pakistan.

  8. Sneh Saini says:

    Dear Dr. Zaiontz,

    Please guide me in interpreting the observed Tau and the critical values of the Augmented Dickey-Fuller test for stock market prediction.

    The variable considered is Open.

    Thanks
    Sneh Saini

  9. Iro O. Ada says:

    Hi, Dr. Charles Zaiontz,
    Thanks for your excellent site and relentless effort.
    I am working on a research study using a 4-point Likert scale. I decided to use chi-square to test the null hypothesis. What kind of test can I use for reliability? Is chi-square enough?

  10. Mushtaq AL Mohammed says:

    Dear Dr. Zaiontz,

    I want to conduct experiments which will be done by human subjects. The outcome is explained by 10 user predictors and 6 task predictors. I want to calculate the sample size of users and how many tasks each user should do to achieve a specific power.

    Thanks in advance
    Mushtaq

    • Charles says:

      Mushtaq,
      If you are planning to use multiple linear regression to do this, then I suggest that you look at the following webpage
      Sample Size for Regression
      Charles

      • Mushtaq AL Mohammed says:

        Thank you Dr. Zaiontz for replying.
        Does this give the sample size of users and tasks separately? Because the output variable depends on two sets of predictors: one from users and the other from tasks.
        Thanks,
        Mushtaq

        • Charles says:

          Sorry, but I don’t understand your question.
          Charles

          • Mushtaq AL Mohammed says:

            Dear Dr. Charles,

            I mean, the sample size represents the overall number of observations that I have to get in order to achieve the requirements. But this number is a combination of the number of tasks and users. For example, if the sample size is 200, then I could have 20 users by 10 tasks, 10 tasks by 20 users, or 40 users by 5 tasks and so on. This is the case if I want each user to do the same tasks as the other users. But if each user does a different task, I think the sample size = number of users = number of tasks.
            Thanks,
            Mushtaq

  11. Hanna Fritzson says:

    Hello Charles!
    I am a student at Uppsala University in Sweden, and I have unfortunately not done any statistics during my four years there, which I regret now when I’m doing my master thesis (earth science).
    I have been trying to understand Time series analysis and PCA but it seems extremely complicated. I was just wondering if you could help me answer a basic question about it?
    I have data of pore pressures at three different depths (in the ground), measured two times per day for about 5 years. The problem is that sometimes the measuring device stopped working so there are a lot of missing data, sometimes days, sometimes months. So my question is if it is even possible to make a time series analysis or a PCA with this kind of data?
    Kind regards, Hanna Fritzson

  12. Maxwell says:

    Hi Dr. Raju,

    Thank you for making this helpful Excel-Stats tool available for the public free of charge!

    The Max-Diff analysis is becoming a popular analytical method for consumers’ preference (see “https://datagame.io/maxdiff-excel-modeling-template/”). Would you consider adding the Max-Diff analysis module to the current Real Statistics Resource Pack (release 4.9, as of 9/17/2016)? Thanks in advance for your consideration (or advice if such an analytical tool is available/accessible free of charge elsewhere).

    Thanks,
    Max

    • Charles says:

      Hi Max,
      I am about to issue the next release of the Real Statistics Resource Pack, but I will consider Max-Diff analysis for a future release.
      Charles

  13. 조아라 says:

    Hi Charles,

    Thank you so much for sharing the use of excel instead of SPSS! This is exactly what I was looking for!!!! I am very glad to find your website.

    I am actually in the middle of completing a dissertation and struggling with analyzing the results I got from an online survey tool, which is very good and easy to use. However, I wanted to confirm whether it is OK to use only Excel for the analysis to complete the dissertation, and also whether I should demonstrate the reliability of each result I got from the survey.

    Please let me know if you need more information to answer my questions.
    I am very happy to have a conversation with you via email if you are free.

    Thank you so much in advance!

    • Charles says:

      Of course, my objective is to provide analysis capabilities in Excel that are accurate and just as good as those provided by tools such as SPSS. You should be aware that there is a bias against using Excel for statistical analysis. Some of this is based on errors in earlier versions of Excel and the lack of many of the most commonly used tools. Microsoft has corrected these errors in the last few releases of Excel, and I have worked hard to add capabilities that are accessible from Excel but are missing from standard Excel.
      Charles

  14. Jehvee C. Soriano says:

    Hi Sir Charles!
    Thank you so much for the information about Kappa stat. It helped me a lot in my Research.
    I'm actually conducting research about the effectiveness of a diagnostic kit (local) compared to the commercial one, using 60 samples. The commercial one is my gold standard. And my adviser told me to use Kappa. However, I'm still in the process of absorbing the information/formula. Do you have any simpler formula for my problem?
    I’m sorry for my demand but thank you in advance Charles! God bless!

  15. Vitor Augusto Queiroz Mauad says:

    Hey Charles, I've come across another problem and was hoping you may be able to help me. I have a group of patients with an expected bleeding risk of 5.48% in 3 months, and a thrombotic risk of around 10% over the same amount of time; those calculations came from big studies with over 3000 patients. Anyway, how can I compare those to see whether this difference justifies the use of an anticoagulant agent or not? Would an odds ratio or hazard ratio be useful in this scenario, to compare the chances of 2 different events over the same sample? Thanks in advance

    • Charles says:

      Vitor,
      It sounds like survival analysis (hazard ratio) might be the way to go. See the following webpage
      Survival Analysis
      Of course, you need to determine what sort of ratio is acceptable.
      Charles

  16. Leonardo Barbieri says:

    Dear Charles,

    First, I would like to congratulate you for your website.
    I wonder if you have examples of the sequential version of the Mann-Kendall statistical test, which detects change points in time series.

    • Charles says:

      Dear Leonardo,
      Thanks for your kind remarks about the website.
      Unfortunately, I don’t yet support the Mann-Kendall test.
      Charles

  17. tom holder says:

    On another matter, I have an Excel spreadsheet of 2600 university donor prospects with 20 potential predictor variables and am trying to predict those that have a higher likelihood of giving a gift of $10K+. I have experimented with the logistic regression tool and found it very difficult to use and interpret. I'm wondering if a simpler and equally effective solution would be to simply group the prospect list into groups representing all possible configurations of the predictor variables (permutations?), compute the average number of $10K gifts given by each group historically, then rank the groups from highest to lowest. Those groups with the highest number of $10K+ gifts would receive priority in future fundraising. Does this make sense?

    • Charles says:

      Tom,
      It would seem that there would be a very high number of permutations of the 20 predictor variables. Even if each predictor variable was binary you would have 2^20 possibilities, which is a number larger than one million. With only 2600 donors most of the combinations would not have any representation. I can’t say that a logistic regression model would work any better.
      Charles

  18. tom holder says:

    Charles – is there literature available on your website explaining/interpreting the output of the logistic regression tool?

  19. Marcel Burkard says:

    Dear Charles,

    I downloaded your tool today. I came across your website by accident, but I’m really glad I did. This is already proving to be of great help. I’m currently working on a master thesis about the relationship between prosperity and residential solid waste generation. Your website will be duly cited, of course. Thank you very much indeed for sharing your know-how. This is what I remember the internet used to be about!

    Best wishes,
    Marcel

  20. Uday Smith says:

    Hello Dr. Zaiontz,

    I have a question about performing an F test in Excel. It is suggested that one needs to make sure the variance of variable 1 is higher, so that the numerator value is higher when calculating F. But even if you don't make sure the variance of variable 1 is higher, the p-value remains the same in Excel. So why worry about that part? With 95% confidence (alpha = 0.05), if p is greater than 0.05 you accept the null hypothesis. We care about the p-value, which doesn't change whether variable 1 is higher or lower. Right?
    Thanks for your time

    • Charles says:

      Hello Uday,
      =FDIST(x/y,df1,df2) yields the same answer as =1-FDIST(y/x,df2,df1)
      Thus, you don’t have to make sure that the variance of variable 1 is higher, but in that case you will need to make the adjustment to the test as described by the above equality.
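      For instance (with made-up variances and degrees of freedom): if sample 1 has variance 8.4 with 11 degrees of freedom and sample 2 has variance 5.1 with 14 degrees of freedom, then
      =FDIST(8.4/5.1,11,14)
      =1-FDIST(5.1/8.4,14,11)
      return the same one-tailed p-value, so it doesn't matter which variance goes in the numerator as long as you swap the degrees of freedom and subtract from 1 accordingly.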
      Charles

  21. Carlos Ramos Cansigno says:

    Hi, Dr. Charles

    Actually I need to develop a method for detecting periodicity in a data series. Is there any way to do this completely in Excel?

    Regards

  22. Hina says:

    Dear sir, can I have your email ID? I want to email you my research proposal. Can you please suggest an econometric technique that I can apply to quantify the results?

  23. Zoran says:

    Charles,

    Thank you very much for this great learning resource. I am very, very grateful.

    Zoran

  24. Pingback: Amazing Excel Stats resource | vunderbolt

  25. Kevin says:

    Dear Charles,
    Thank you for some tips on statistics, as I want to prepare a paper for my exam using Excel. However, I am puzzled by the spelling of your surname. Based on how your surname sounds, I assume you are of Polish origin. Perhaps you have some relatives called Zając. Once again, thank you for your help.

  26. Juan says:

    Hey Charles,

    I was trying to automate a binary logistic regression analysis with the RealStats add-in using VBA; is there a way that I could do that? So I can just press a macro button to run the binary logistic regression.

  27. Himanshu says:

    Really good! I have one question that you may find simple to answer. What should be the maximum value/level/cutoff of within-group variance for conducting an ANOVA, or does it depend on the ANOVA result? I don't mind whether the means differ significantly or not. I know there are many tests for checking the suitability of data for an ANOVA, but they do not satisfy my need. I want to know what the maximum within-group variance should be in order to conclude that the sampling was correct, since many times a high within-group variance reduces the F-ratio and makes the group mean difference insignificant, despite high variation between groups.

  28. Jack Homareau says:

    Hi Charles,

    Great website! A wonderful resource for those looking to deepen and broaden their statistical knowledge.

    One question: I downloaded the Excel add-in and when I tried to run a logistic regression model I got the following message:

    “Compile error in hidden module: Logistic Regression. This error commonly occurs when code is incompatible with the version, platform or architecture of this application. Click help for information on how to correct this error.”

    How to fix this error?

    Thanks….

    • Charles says:

      Jack,
      What version of Excel are you using?
      Charles

      • Mark R says:

        Hey Charles, am also getting the same error as Jack. Am using Excel 2013

        • Charles says:

          Mark,
          The usual reason is that the Real Statistics addin has not been installed properly. A common problem is that Solver has not been installed prior to the installation of the Real Statistics addin. You can determine whether Solver was installed by pressing Alt-TI and seeing whether Solver is on the list of addins with a checkmark next to it.
          Charles

  29. Anna says:

    Please help me! I have homework on multiple regression analysis, but I don't know what to take as the dependent and independent variables (I know what they are, but I want an example, a concrete example). Then I will try to find statistics for a time series of at least 30 years. Please help me!

    • Charles says:

      Anna,

      I have a lot of information on the website about multiple linear regression. Please look at the following webpage for the list of topics about this subject. Click on any of the topics that are relevant to you. There are examples that you can use.

      Multiple Linear Regression

      If your data is time series data, then you also need to worry about autocorrelation. This topic is covered on the website, but other topics about time series data will be added shortly.

      Charles

      • Anna says:

        Thank you! But I want, for example, to take as the dependent variable loans with problems in the United Kingdom, and as independent variables the GDP rate, inflation rate, unemployment, the loan interest rate, and the pound-US dollar exchange rate. I found some statistics, but for the loans with problems I found data only for some years, not for 20 or 30 years. I don't know if you understand me; can you help me?

        • Charles says:

          Anna,
          I understand you and am happy to answer specific questions, but I don’t have a specific example of the type you describe. You can find these types of examples in references on time series or statistics for economics.
          Charles

  30. carlo says:

    Hello, I'm Carlo Mallari. How can I find the computation for the least significant studentized range distribution table of Duncan's multiple range test? What is its formula?

    • Charles says:

      Carlo,
      The Real Statistics Resource Pack doesn't support Duncan's multiple range test, since I use Tukey's HSD test instead. The resource pack does calculate the studentized range distribution as well as the critical values from that distribution. With this information, you should be able to carry out Duncan's multiple range test.
      Charles

  31. Dave T says:

    Dr. Zaiontz,
    The Paired Sample Hotelling T2 example is very similar to something a friend is working on. When she showed me her data (“scores” like data), I was hesitant to go the Hotelling / paired-T route as I was uncomfortable about the normal assumption for the data since it was more ordinal in nature. If you were to do that example in a non-parametric analysis, would you do Wilcoxon signed rank with Bonferroni like control for Type I or do you have a better suggestion?
    Thank you!

    • Charles says:

      The Wilcoxon signed-ranks test is a commonly used non-parametric test in case the assumptions for the t test don’t hold. Hotelling’s T-square test is the extension of the t test when there are multiple dependent variables. I don’t know of any non-parametric versions of this test, although I came across the following article which may be suitable:
      www3.stat.sinica.edu.tw/statistica/oldpdf/A8n310.pdf
      Charles

  32. Dear Charles:

    I have been using RealStats to calculate a principal component analysis, but there is an issue. I use 4 variables to compute the covariance matrix and then I obtain the coefficients for these variables with the eVECTORS function. However, if I change the order of the variables in the covariance matrix and put one of them in first place, I obtain the same coefficients as before but with a different sign. Why does that happen? Am I doing something wrong, or is there a problem with the variables? Thank you for your answer.

    • Charles says:

      Miguel,
      If I understand what you are saying correctly, this is completely normal. You aren’t doing anything wrong and there isn’t any problem with the variables.
      Charles

      • Thanks for your answer, Charles, but I'd better explain my doubt with numbers:
        I used variables A, B, C and D to do the PCA. I put them in the order "ABCD" in the covariance matrix function and then in the eVECTORS function. The coefficients for the first component were A: 0.5, B: 0.4, C: 0.3, D: -0.2.

        Then I changed the order of the variables to "DABC" in the covariance matrix and eVECTORS functions, and the coefficients for the first component were A: -0.5, B: -0.4, C: -0.3, D: 0.2. Therefore, the values for the first component are not the same for the two arrangements even though they are the same variables A, B, C and D. Now I want to graph the first component but I don't know which values are the correct ones.

        I hope I have clarified my doubt, and I'd appreciate your comments about it. Thank you.

        • Charles says:

          Miguel,
          If A is an eigenvector corresponding to eigenvalue c, then so is -A. Thus there is no problem when the signs get reversed. This is normal and won't affect your final result. If, for example, you have a column with say 7 negative values and 2 positive values, you can simply flip the signs so that you have more positive values, if this makes the real-world interpretation better. I believe that I did this in one of the examples I gave (although perhaps I did this in an earlier version of the website).
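          In symbols, the one-line justification is: if A v = c v, then A(-v) = -(A v) = -(c v) = c(-v), so -v is an eigenvector for the same eigenvalue c. This is why the signs returned by eVECTORS can flip when the input variables are reordered; either set of signs is equally valid.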
          Charles

  33. Dinalva Oliveira says:

    Dear Charles,

    Maybe you can help me. I am doing research on UVA/UVB treatments.
    I have done a taste test:
    18 people tasted salads from different treatments and answered the questions:
    Which salad tastes best?
    Which salad tastes most bitter?
    They don't give me a score, only a reply.
    How can I organize the data for an ANOVA analysis?

    Salad 1 = control, Salad 2 = UVA treatment, Salad 3 = UVA/UVB treatment

    Thanks in advance

    Dinalva

    I am from Brazil, but my studies are at NMBU University in Norway.

    • Charles says:

      This depends on a couple of things:
      1. Are the same 18 people answering both questions? (and are these the only questions?)
      2. What exactly are you testing? (i.e. what is your null hypothesis?)
      Charles

  34. Ramin says:

    Dear Dr. Charles;

    Hi, I hope you begin a happy new year full of health and joy. I'm Ramin and I'm from Iran. May I draw your attention to building coefficients of orthogonal contrasts? If you have only 3 means to compare, you can easily have:

    contrast C1 C2 C3
    ——————————-
    1 vs 2 1 -1 0
    1 vs 3 1 0 -1
    2 vs 3 0 1 -1
    ——————————-
    Now consider that you have (for instance) 50 means to compare. Writing a 50*50 matrix of orthogonal coefficients (as we see in the example above for 3 means) is very time consuming and boring, of course. Is there any macro available in Excel to build the coefficients of orthogonal contrasts for pairwise comparison of n means (50 for instance)?
    I do appreciate your comment beforehand.

    • Charles says:

      Ramin,
      I am not aware of any such macro, but in any case, any such set of orthogonal contrast coefficients may or may not be the set that is useful to someone in a particular situation.
      Charles

  35. samiullah says:

    Dear Charles,
    I am from Pakistan. Maybe my English is very poor because it is not our mother language. Anyhow, I want to learn logistic regression theoretically and its practical uses. Can you help me solve my problem?

    • Charles says:

      The theoretical background of logistic regression is provided on the Real Statistics website, especially the Logistic Regression webpages. If you need additional detail, please look at the Bibliography.
      Charles

  36. Dilmi says:

    Dear Sir

    I have two data sets. One (Sr concentration) comes from ICP-MS analysis along an animal shell. This has around 12,000 data points covering 9 years of shell growth. The second data set is the monthly temperature for those 9 years, which has 108 data points. Can I correlate these data to see whether Sr concentration increases with temperature? If so, what method can I use? If not, what can I do to see the relationship between these two data sets?

    Thank You

    • Charles says:

      Dear Dilmi,
      The two sets have to have the same number of data elements in order to perform a correlation. If you know the temperature for each of the 12,000 data points then you can perform a correlation. You can also use linear regression to better see what this relationship is and to make predictions.
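      As a minimal worksheet sketch of the matching step (the ranges here are hypothetical): if the Sr values are in A2:A12001, a month label for each measurement is in B2:B12001, and the 108 monthly temperatures are in a two-column lookup table named TempTable, then
      =VLOOKUP(B2,TempTable,2,FALSE)
      copied down column C attaches a temperature to each of the 12,000 rows, after which
      =CORREL(A2:A12001,C2:C12001)
      gives the correlation.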
      Charles

  37. Víctor says:

    Do you have any paper about this statistics add-in that I can cite?

  38. omid says:

    Can you help me?
    How can I test a moderation effect with regression in Excel?
    Can I do interaction plots?

    • Charles says:

      If I understand your question, then you are looking for interactions in regression. If you have two variables x and y, then you model the interaction in regression by xy. See the following webpage for more details:
      Interaction
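      As a minimal worksheet sketch (with hypothetical ranges): if the dependent variable is in A2:A51, the predictor in B2:B51 and the moderator in C2:C51, put
      =B2*C2
      in D2 and copy it down to D51, then regress column A on columns B through D, e.g. with the array formula
      =LINEST(A2:A51,B2:D51,TRUE,TRUE)
      A significant coefficient on the interaction column is evidence of a moderation effect.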
      Charles

  39. Jude says:

    Hello Dr. Zaiontz,
    Can the t-test be used in a correlational study? The nature of my study is quantitative and correlational in design and I have two groups: (1) CEOs with terminal degrees in Business disciplines, and (2) CEOs with terminal degrees in non-Business disciplines.
    Is it appropriate to use the t-test in this study? I know correlational studies examine the relationship between two or more variables, whereas comparative studies examine the similarities and differences between two or more groups; however, someone is saying that it is not appropriate to use t-test in correlational studies.
    Thank you in anticipation of your response.

    • Charles says:

      A t test can be considered as a form of correlation, as shown on the webpage
      Correlation in relationship to t test

      Provided we are using the term "correlation study" in the same way, I don't see any problem with using a t test to determine whether there is a difference between the two classes of CEOs.

      Charles

  40. JD says:

    Dear Dr. Zaiontz,
    Thank you so much for making life easy for those of us who are not statisticians/mathematicians. I need to analyze data to see the effect of a process on microbial loads (3 types of microbes checked pre- and post-process). There are around 1600 samples analyzed (not divided equally pre- and post-process) for each type of microbe (A, B, C). All these observations come from 3 different states, covering 4 seasons, and 2 years. I've used SAS (GLM, PLM) to analyze them. However, I would like to check with Excel, since I am more comfortable using the latter. What analysis can I use to check the effect of the process? How can I check if season, year, or state had any effect on the result? Also, in one category (microbe C) I get only around 6% measurable values, while the rest (94%) are below detection (0). I am using log10-transformed values of bacterial counts for all these analyses. I am assuming the original number = 1 (so log10 of 1 = 0) when I don't get a real value (in 94% of cases). How can I justify this assumption?

    Thank you in advance.

    Best regards,
    JD

  41. Alain says:

    Dear Dr. Zaiontz,
    Thank you for this very useful – and free! – package. The examples and the website are very clear.
    I use statistics almost daily and having an easy-to-use tool integrated with Excel is a must.
    Regards
    Alain

  42. Michael G. says:

    Dear Dr. Zaiontz,

    This website has been a huge help for me in my summer internship. I'm currently building a model to predict GDP and I really enjoy how you show the formulas for how everything is calculated.

    Thank you,

    Michael G.

  43. Steve Vardeman says:

    Hello Charles,

    Thanks for this material! It’s clearly carefully done and I’ve already found some of the multivariate functions in your package very helpful. I’m quite amazed at what you’ve put together and made freely available here. Bravo!

    For many reasons, I do my best to convince university students to use R for an open source (free) way to do statistical computations. But EXCEL is ubiquitous and sometimes considerations of what is most familiar clearly make using it the only real option. Your kind work makes it substantially less painful.

    Best Regards,

    Steve Vardeman

    • Charles says:

      Hello Steve,
      Thank you very much for your very kind words. This means a lot to me, especially coming from you, a person who has had a very illustrious career in statistics research, teaching and consulting.
      I am constantly updating the website and software, and I plan to add some interesting capabilities in the next few releases.
      Charles

  44. Greg says:

    Thank you so much for all of your work on this website. I’ve never had the strongest grasp of statistics but your website made everything easy to understand. Your resource pack is a lifesaver and greatly expanded my ability to perform data analysis in my research.

  45. RANJIT B NAIR says:

    Dear Sir,

    Slight error in the previous message. There are nine coefficients plus one intercept to be found. The R-square value is .94. But the p-values are very high like .74, .96 etc.

    Can I still use this equation for my prediction Sir?

    RANJIT

    • Charles says:

      Ranjit,
      You shouldn’t see a high R-square value when all the coefficients are insignificant.
      If you send me an Excel file with your data I will try to figure out what is going on. See Contact Us for my email address.
      Charles

      • RANJIT B NAIR says:

        Dear Sir,

        Thank you very much but also sorry for not responding promptly. I have sent an e-mail with the data values to your e-mail id. Please provide your valuable comments.

        RANJIT

    • Ryan says:

      Ranjit,

      By reading your comments it sounds like your model has multi-collinearity (correlations between your independent variables) which inflate their variances and produce high p-values in the parameter estimates. Or you might be over-fitting the model. These are my suggestions. Check to see if they are causing your problem.

      Ryan W.

      • RANJIT B NAIR says:

        Dear Mr. Ryan,

        Thank you Sir.

        There are correlations between the independent variables. But I didn't understand what you meant by 'overfitting'. Does this mean I am using too large a number of observations?

        RANJIT

        • Ryan says:

          Ranjit,

          Sounds like you found your problem. By over-fitting I meant creating a model with high R^2 but with low predictive ability. Now that you know your independent variables are correlated, you can apply transformations to them or remove some from the model if necessary.

          Ryan

  46. RANJIT B NAIR says:

    Dear Sir,

    Hope you are doing well.

    I am currently doing a project as part of my M-Tech course.

    I have some data with me, but when I fit a regression equation to the data, I get high p-values and low R-square values for all (or the majority) of the independent variables (both with and without interaction).

    There are three independent variables for one dependent variable.

    Can you help me understand why this is happening.

    Currently I am unable to use the equation because of the above problem.

    Please advise.

    • Charles says:

      It could be that your data does not fit with the linear regression model.
      You say that the "R-square values are low for all (or the majority) of the independent variables". Did you build the regression model using all three independent variables? Have you tried creating three scatter plots, each between one of the independent variables and the dependent variable? Do the plots look linear?
      Charles

      • RANJIT B NAIR says:

        Dear Sir,

        Yes, I built the regression model using all the three independent variables.

        There is low correlation between all the independent variables and the dependent variable.

        Alternately, is there any other method by which I can get a good fit for the data I have?

        RANJIT

        • Charles says:

          Ranjit,
          It is possible you can find a fit for a polynomial curve or an exponential curve or many others. You can get more information about these on the following webpages:
          Polynomial Regression
          Exponential Regression
          Charles

          • RANJIT B NAIR says:

            Dear Sir,

            I used quadratic regression with interactions (three independent variables). So in total there were ten coefficients to be found, including the intercept.

            I got an R-square value of 0.94. But the p-values for all the coefficients are very high, like .74, .96, etc. So can I use this equation for my prediction?

  47. Dear Dr. Zaiontz,
    Your work and generosity are much appreciated, thank you. I have just downloaded the add-in and discovered its numerous capabilities. I will use it for my "Data Analysis using Excel" course during the next academic year.
    All the best,

    Erkin

  48. Rose says:

    I have an Excel sheet with about 300 variables in 700 rows (the sample). I have random missing data and those cells are empty. I would like to put 99 in the empty cells (missing data). Do you have any suggestions, please?
    Thank you.

    • Charles says:

      Rose,
      If your 700 x 300 sample starts in cell A1 then place the formula =IF(A1="",99,A1) in cell KP1 (the 302nd column). Now highlight the range KP1:WC700 (700 rows by 300 columns) and press Ctrl-R and then press Ctrl-D. The range KP1:WC700 will now contain the data as you desire it.
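      If you would rather overwrite the blanks in place instead of building a second copy of the data, here is a minimal VBA sketch (it assumes the sample occupies A1:KN700; adjust the range to match your sheet):
      Sub FillMissingWith99()
          Dim cell As Range
          ' Loop over the 700 x 300 data range and put 99 in every empty cell
          For Each cell In Range("A1:KN700")
              If IsEmpty(cell.Value) Then cell.Value = 99
          Next cell
      End Sub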
      Charles

  49. Ashwin says:

    Great website Dr.Zaiontz. It helps me a lot when I have to rapidly apply a concept that I have a vague memory of.

    Also, as a fellow Purdue grad, Boiler-UP!

  50. reza says:

    Hi Dr., I am very weak in statistics but I decided to improve my skills. I live in Iran, and on Iranian websites I could not find many useful topics about effect size. I wish that you would help me in this field. I have a question about G*Power. My question is about finding the effect size (Cohen's d) in the one-sample case. I see in many texts that the standard effect size for a one-sample t-test is the difference between the sample mean and the null value in units of the sample standard deviation, and that we use this effect size to specify our sample size.

    Now my problem is that I cannot distinguish the "null value" from the "sample mean".
    My interpretation is: when we have not yet specified our sample size, how can we measure its mean?
    Please help me and clarify this issue for me with a simple example. Thanks a lot.

    • Charles says:

      Reza,

      It is an excellent question.

      The null hypothesis mean is based on the hypothesis that you are testing. The sample mean is measurable. Together with the estimated standard deviation you can estimate the effect size. Having said all of this, in reality you estimate the effect size based either on experience, a guess or a previous study. There are rules of thumb such as a small effect is .1, a medium effect is .3, etc., but this may not help much either without some experience in running similar tests. With time, you get a feel for what are reasonable values, but even then you may find that your initial estimate for the effect size is wrong.
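
      As a concrete illustration (with made-up numbers): for a one-sample t test of the null hypothesis that the population mean is 100, if you anticipate a sample mean of about 104 and a standard deviation of about 10, then the estimated effect size is d = (104-100)/10 = 0.4, and it is this d that you would use in the sample size calculation.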

      An important thing to note is that the smaller your estimate of the effect size, the larger your sample will need to be so that your statistical test will find it (if indeed your null hypothesis is shown not likely to be true based on your statistical test).

      Thus you may estimate the effect size to be medium and so choose a sample of size consistent with this effect size, only to find that the test shows a smaller effect size, and so you actually needed to choose a larger sample, or that the test shows a larger effect size, in which case the sample you chose was larger than it needed to be.

      There are many concrete examples on the website for how to use the effect size to specify the sample size. E.g. Sample Size Normal Distribution and Sample Size of t Test.

      Charles

  51. Raja says:

    Hi Charles,
    I really appreciate your effort in putting together very good statistics resources, especially the Excel calculations and the immediate references to linear algebra.
    All the very best!!!
    Best Regards,
    Raja

  52. Mark Farrell says:

    I wish you’d also consider publishing a book about Excel VBA and Statistics. I haven’t been able to find any good ones. Thanks for all you do.

    • Charles says:

      Mark,
      I am about to publish a book about Statistics, but not VBA and Statistics. What would you like to see in this book?
      Charles

  53. Dear Dr. Charles,
    first of all, I thank Prof. Caterina for the splendid idea she had.
    I have been on the Internet since the screen was black with white writing. The web brought the "vision of things" and, inevitably, a lot of junk and repetitiveness.
    This is a positive example of what a site should be: simplicity and content. It is superfluous to try to quantify the enormous amount of excellent content.

    If you met your wife in Italy, then (I sincerely hope) that fact is a positive one. Otherwise it is difficult to gain any advantage (above all a moral one) from coming to this country of deeply cowardly and corrupt delinquents.

    I wish your family much serenity, and I thank you for the extraordinary work made available on this site.

    sergio colautti

    • Charles says:

      Dear Sergio, thank you for your words of appreciation!! I am glad that my work is useful to you. I will continue to update it and to add new content.
      Best wishes to you too,
      Charles and Caterina Zaiontz

  54. GabSoo Han says:

    Thanks Dr Charles,
    thank you for kindly providing your very good Real Statistics resources to other people.
    You are a great statistician and a very good person.

  55. Rama says:

    Hi Dr Charles,
    Thanks for your resources and guidance on Excel statistics work; very much appreciated.
    I am an MBA student doing research in social entrepreneurship. I have developed 18 scale items for my specific research context and I want to check the validity of those items. How can I perform this? I checked your factor analysis guide but couldn't follow it. It would be appreciated if you could guide me with an Excel sheet. Thanks.

    Rama

  56. ahmed says:

    Thank you very much.
    Q:
    How can we use SPSS to test the equality of two correlation matrices?

  57. John says:

    Dear Dr. Charles Zaiontz,

    How can I do the simulation of a time series of 50 weeks of returns using the CAPM beta, non-systematic standard deviation, standard deviation of the market, risk premium, and the risk-free rate?

    The whole task is “You are asked to simulate a time series of 50 weeks of returns on each of the above stocks. Start by simulating six 50-week return series which are orthogonal (i.e. ignore correlation among the stocks) and have zero mean, then use those six orthogonal series to generate six series which have the right covariance matrix and the right expected return.”

    With kind regards,
    John

  58. Bob Hirsch says:

    Is there a way to obtain the logistic regression coefficient covariances from your package?

  59. Alex Kesaris says:

    Charles,

    Thank you very much for providing the Real Statistics add-in capability for Excel. I now have what I need to independently do my work with Excel. Tremendous!

    All the best to you, Sir.

    Sincerely,

    Alex Kesaris

    PS: I’ve tweeted about Real Statistics via @Sustenant and I’m now following you on Twitter. AK

  60. John Schmidt says:

    Charles, I wanted to take a moment and say thank you for this incredible trove of information. I can tell you that, indirectly, it is used to the great benefit of many people. SAS / SPSS have their places, but when you're building a dynamic model, Excel is the medium for a lot of statistics. Having a website like this makes the difference. Kudos.

  61. Aaron says:

    The site and Excel add-in are both excellent. Everything is very straightforward, and easy to follow and implement. It makes things I used to do in R and Matlab easily accessible in Excel. Thanks so much for this creation.

    Thanks,

    Aaron

  62. Matt says:

    Good morning Charles,

    Yes, you are absolutely right (like always). I restarted Excel and your package was not running. Yes, I do have ver 2.15… I guess this is the latest one.
    Since I am not a "statistician", I would like to know if the significance [Sig. (2-tailed)] of .000 (SPSS) is correct and has a positive meaning or not.

    Thank you for your time answering my questions. I really appreciate it.

    Cheers
    Matt

  63. Matt says:

    Hello Charles… I am really speechless at your website and your hard work. Congrats.
    I've been searching online to try to find a good explanation of my SPSS results, but I couldn't find something that would answer my needs.
    However, my wife is doing a Masters in Psycho-Sociology on the subject of child-parent attachment. She had a questionnaire with 20 questions for parents, and there were 40 children ages 2-5. I've entered all the information correctly in SPSS, but for some reason I am not getting the same results that someone else got previously.
    I would appreciate it if you could give me a brief guide on how to get the same results as the previous calculations.

    Thank you and God Bless.

    • Charles says:

      Hello Matt,

      Thank you for your kind words about my website. I have tried to create an environment where people can use Excel to perform real statistical analysis and also learn a lot about statistics.

      I can’t tell from your comment which type of analysis your wife is trying to do in SPSS. I am not an expert in SPSS, but perhaps I can tell you how to do the analysis using the Real Statistics Resource Pack.

      Charles

      • Matt says:

        Thank you for your prompt reply.

        Basically she is trying to relate child attachment and socialization; she needs to get results for Cronbach's alpha, a t-test, and results for attachment and socialization. I don't know if I can send you my data; if you see it, I am sure you will understand. I am copying everything to Excel and I am trying to run your calculator to see how it will come out.

        If you agree on sending you my data, please email me and I will send you asap.

        Thank you in advance.
        Matt

        • Charles says:

          Matt,
          You can run Cronbach’s Alpha (via the CRONALPHA function) and T-tests (via the T Tests and Non-parametric Equivalents data analysis tool) using the Real Statistics Resource Pack. If you run into problems let me know.
          Charles

          • Matt says:

            Good morning Charles,

            I've downloaded RealStats.xlam from your website, and I am not seeing the CRONALPHA function when I open your pack. Have I downloaded the wrong pack?

            Is there any way to upload a screenshot that I wanted to show you?

            Matt

          • Charles says:

            Matt,
            You won't find CRONALPHA in the list of data analysis tools when you press Ctrl-m, but you should be able to enter it as a function. If R1 is a range which contains your data as formatted in Example 1 or Example 4 of http://www.real-statistics.com/reliability/cronbachs-alpha/ then =CRONALPHA(R1) will return the value of Cronbach's alpha. If this doesn't work then you are probably using a very old version of the software. To find out which version of the software you are using, enter =VER() in any cell.
            Charles

  64. Kathryn says:

    Starting my capstone for the MSN: descriptive correlational survey research on moral distress in nurses. The survey tool uses Wocial and Weaver's (2012) moral distress thermometer, which rates the level of moral distress the nurse is experiencing. I developed a tool which incorporates the MDT and asks a series of questions regarding gender, years of experience, nursing area of practice, highest nursing degree attained, the region a nurse works in, a question on whether the nurse has ever left or considered leaving a job due to moral distress, and a question related to whether the nurse feels that improved professional autonomy would influence levels of moral distress. I am using SurveyMonkey to obtain an audience and post the survey. My professor told me to do a multiple regression. That was all the guidance given. I am now lost. Would you please point me in the right direction? I have limited time to do this project and scant experience with statistics. Thank you. Nurse Kathy

  65. kevin Pears says:

    My boss said go see if you can learn PCA. I nearly quit my job, as I have not really used my brain in so many years.... Your coverage of PCA more than gives me hope; thanks so much for all the free information on PCA and so much more. I hope to be able to look at differences in equipment matching behavior using PCA. I'll tell you later if it's a success.

    • Charles says:

      Kevin,
      I am very happy to hear that my coverage of PCA has helped you. I look forward to hearing how everything works out.
      Charles

  66. Daniel says:

    Great work – I loved your power calculations in Excel example!

  67. nisa says:

    thanks a lot for this! very much appreciated

  68. Bob Schwanke says:

    So glad to see this! Is binomial regression on your TO-DO list? I’m working on bug-proneness prediction, and all the independent variables are count data …

    Bob

  69. joan says:

    Thank you for the great excel add on!

  70. Manfred says:

    Great tool!
    After some minor adjustments it runs under a German Version of Excel 2013.
    Thanks a lot

    Manfred

    • Charles says:

      Manfred,
      This is great news. Until now I thought that the tools would only run in the English version of Excel. What did you need to do to make it work in German? This could be useful for those who want to use the software in French, Italian, Spanish, etc.
      Charles

  71. R. Ahmmad says:

    Sir,
    I am a faculty member of the Statistics Dept., Jagannath University, Dhaka. I am really happy to find the new program for solving logistic regression models as well as survival analysis. But sir, I have tried more than 3 times to install the resource pack and failed because it requires a "password".

    Please, sir, give me instructions on how I can install it in my Excel.

    Regards,
    R. Ahmmad
    Lecturer, Dept of Statistics, Jagannath University, Dhaka, BD

    • Charles says:

      Dear R. Ahmmad,

      You should not need to supply a password in order to use the resource pack. The key is to have Excel recognize the software as an “add-in”. To do this you must perform the following steps:

      1. Select Office Button > Excel Options > Add-Ins in Excel 2007 or File > Options > Add-Ins in Excel 2010/2013, and click on the Go button at the bottom of the window.
      2. Check the Realstats option on the dialog box that appears and click the OK button.
      3. If this option doesn’t appear, click on Browse to find and choose the realstats.xlam file. Then complete step 2 as described above.

      See http://www.real-statistics.com/free-download/real-statistics-resource-pack/ for more details.

      Charles

      PS I am about to release an upgrade which provides multinomial logistic regression.

  72. Don says:

    Congratulations Charles. You have developed a great website with lucid explanations for people like me to learn statistics using Excel.

    Don

  73. Marc says:

    An excellent tool that Dr. Zaiontz has made available. Dr. Zaiontz emailed me a description and examples of how to set the program up in Excel, which helped me enormously because I'm not a statistician. I highly recommend the program.

    Marc
    Certified Wildlife Biologist

  74. chiara says:

    Thank you very much for your Excel add-in!!! It is very useful for me as I would like to do some statistical analysis in a very simple way…as it is with your excellent tool! Grazie!

  75. alberto rivas says:

    Thank you for this excellent web page. I am Peruvian, so if you come to my country, Peru, I'll be glad to show you the best places and act as your guide.

  76. Chunhao Yu says:

    Hi Charles,

    Thank you very much for developing a great Excel add-in for statistics.
    Much appreciated!
    Best wishes to you, your wife and your family.

    Sincerely,

    Chunhao Yu Ph.D

  77. massimo says:

    Hi Charles,

    It's great that you built an Excel-based stats tool. I'm not a statistician but I'm trying to apply a probit model to some economic data. I see you have the logit in your package, but I don't see the probit. How can I estimate the coefficients of a probit model with multiple independent variables in Excel?
    Grazie mille
    Massimo

    • Charles says:

      Hi Massimo,
      I plan to implement probit shortly. I am now in the process of adding multivariate statistics capabilities (esp. MANOVA and Factor Analysis). Once this is released later this month, I will add probit.
      Charles

  78. sana arian says:

    Hi all. I am an MA student. How can I score this test?
    Q3/ Have you ever heard of, come across, or used the following English idiomatic expressions? Write Yes or No for each item; if yes, translate them into Kurdish. (I divide the scores as follows: 0 for those who choose No, 0.5 for those who say yes but cannot translate it correctly, and 1 for those who write yes and translate it correctly.)
    1- Raining cats and dogs
    2- Zip your lip
    3- Throw in a towel
    4- Talk nineteen to the dozen
    5- Green light
    6- Penny pincher
    7- Yes man
    8- Month of Sundays
    9- All mouth and trousers
    10- Pitch dark (black)
    11- Hard-and-fast
    12- Poor as a church mouse
    13- From time to time
    14- Under the sun
    15- On the cards
    16- In a circle
    17- Face-to-face
    18- Fifty-fifty
    19- Parrot-fashion
    20- Many moons ago.
    How can I find the item facility, difficulty, and discrimination for this test? Please help.

    • Charles says:

      Sana,
      I have now added a webpage on Item Analysis which describes how to calculate item difficulty and discrimination.
      Charles

  79. Dhaval Patel says:

    Hi Dr. Zaiontz,

    Thank you for developing a great Excel add-in for statistics. Much appreciated!

    Sincerely,

    Dhaval Patel MD

    • Charles says:

      Dr Patel,
      Thank you very much for your kind remarks. Stay tuned. Next month I am planning to release a new version of the add-in which will include new data analysis tools and multivariate analyses.
      Charles

    • Dear Dr. Zaiontz:
      I just wanted to drop you a note to say how much I appreciate your outstanding website and the clarity of your examples. I certainly will recommend your website to my students. I only wish that you had taken the trouble to put together a book with a compendium of your examples.

      Thanks very much for so unselfishly sharing your knowledge.

      Dr.Raju

      • Charles says:

        Dear Dr. Raju,
        I am very pleased to read your comment.
        I have written a couple of books based on the website and plan to publish the first of these in January.
        Charles
