Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).

Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at BBN, one of the most prestigious research institutes in the US, and widely credited with implementing the Arpanet and playing a leading role in creating the Internet.

Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.

He grew up in New York City and lived in Indiana, Florida, Oregon and finally Boston, before moving to Europe 29 years ago where he has lived in London, England and northern Italy.

He is married to Prof. Caterina Zaiontz, a clinical psychologist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, required that he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.

Hello, I’d like to ask a beginner’s question about multiple regression – I’d be incredibly grateful for your time. I’ve only recently learned the basics of linear regression and I still have the following nagging doubt.

I’d like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is ‘profit/loss’, which simply represents a sales figure for individual retail items. This is the variable I would like to forecast; (there are certain quantifiable conditions for each attempted sale of an item and these are my independent variables). My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a FIXED negative value of -100; (an item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items). A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):

23

55

201

-100

13

-100

321

124

57

-100

33

It’s my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I’d like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I’m wondering whether the fixed negative values might somehow pose a problem. I’m not trying to replicate the fixed -100 value for the losses, only trying to get to true averages so that I may accurately predict the profitability of an item’s listing for sale (and avoid unprofitable listings). Hope this all makes sense. Thank you very much.

I don’t see any problems with this fixed negative amount as long as it truly represents the Y value.

The real problem you have is that the data may not fit a linear regression model. You should graph the data and see if it is truly linear. You should also graph the residuals and make sure that they are randomly distributed.

Charles

Charles, thank you very much for your reply. I’ll be sure to check that the data meets the various requirements of a linear model.

Regarding your last point, the logic is that the residuals would be randomly distributed because the relationships between the variables remain constant, regardless of the profitability of a given listing. I will of course check the graphs, but it would help to know that I have the theory straight.

May I ask then, can I take it that if that an AVERAGE for Y in my case can be viewed in the same way as that for any appropriate independent variable, that duplicate/fixed values (in themselves) do not pose a problem in a linear regression analysis?

To clarify, let’s say there were no fixed loss amounts for Y in another case, that they were free to fall anywhere on the same continuous scale (as you usually find in any textbook example). Let’s also say that the average for Y in both cases is equal. Is it then safe to say that there is no apparent cause for concern with the data I have (assuming that it is appropriate for a linear regression analysis in every other way)?

Sorry if my limitations here are making things unclear or unnecessarily complicated! Thank you. Ben

I don’t see any particular problems with duplicate data provided the assumptions for regression are met and the negative value can be compared with the other values and is not a conventional value (like coding all missing data as -99).

Charles

Got it, thank you so much Charles. Congrats on the excellent resource pack and website here – well done! I hope you are well repaid for sharing your expertise. Ben

Great. Glad I could help.

Charles

Dear Dr.Charles

sir i want to you tell may some gaidence about how to writte research paper . pleas sir i have no any good teacher in provience blochitan in country pakistan.

There are many online guides to how to write a research paper, including the following:

www3.nd.edu/~pkamat/pdf/researchpaper.pdf

https://www.liebertpub.com/media/pdf/English-Research-Article-Writing-Guide.pdf

You can find more by googling “how to write a research paper”

Charles

Dear Dr.Zaintoz,

Please guide to interprete Tau observed and critical values of Augumented Dickey fuller test for stock market prediction

The variable considered is open

Thanks

Sneh Saini

See the following webpages

Dickey-Fuller

Augmented Dickey-Fuller

Charles

Hi, Dr. Charles Zaiontz,

Thanks for your excellent site and relentless effort.

I am working on a research using 4-point likert. I decided to use Chi Square to test the null hypothesis. What kind of test can i use for reliability? is Chi square enough?

In addition I have 100 respondents.

Thanks in anticipation

Hello Iro,

Chi-square is not usually considered to be a test for reliability.

Which test to use depends on what you mean by reliability. Please see the following webpage>

Reliability.

Charles

Thanks so much Dr Charles.

Dear Dr.Zaintoz,

I want to conduct experiments which will be done by human subjects. The outcome is explained by 10 user predictors and 6 task predictors.I want to calculate the sample size of users and how many tasks that each user should do to achieve a specific power .

Thanks in advance

Mushtaq

Mushtaq,

If you are planning to use multiple linear regression to do this, then I suggest that you look at the following webpage

Sample Size for Regression

Charles

Thank you Dr. Zaintoz for replying.

Does this give sample size of users and tasks separately? because the output variable depends on two sets of predictors: one from users and other from tasks.

Thanks,

Mushtaq

Sorry, but I don’t understand your question.

Charles

Dear Dr. Charles,

I mean, the sample size represents the overall observations that I have to get to achieve the requirements . But this number is a combination of number of tasks and users. For example, if sample size is 200, then I have 20 users by 10 tasks, 10 tasks by 20 users , or 40 users by 5 tasks and so on. This is the case if I want each user does the same tasks that done by other users .But, if each user does a different task, I think the sample size =number of users =number of tasks.

Thanks,

Mushtaq

Hello Charles!

I am a student at Uppsala University in Sweden, and I have unfortunately not done any statistics during my four years there, which I regret now when I’m doing my master thesis (earth science).

I have been trying to understand Time series analysis and PCA but it seems extremely complicated. I was just wondering if you could help me answer a basic question about it?

I have data of pore pressures at three different depths (in the ground), measured two times per day for about 5 years. The problem is that sometimes the measuring device stopped working so there are a lot of missing data, sometimes days, sometimes months. So my question is if it is even possible to make a time series analysis or a PCA with this kind of data?

Kind regards, Hanna Fritzson

Hanna,

It is often possible to handle missing data, but care needs to be taken. See the following webpage:

Handling Missing Data

Charles

Hi Dr. Raju,

Thank you for making this helpful Excel-Stats tool available for the public free of charge!

The Max-Diff analysis is becoming a popular analytical method for consumers’ preference (see “https://datagame.io/maxdiff-excel-modeling-template/”). Would you consider adding the Max-Diff analysis module to the current Real Statistics Resource Pack (release 4.9, as of 9/17/2016)? Thanks in advance for your consideration (or advice if such an analytical tool is available/accessible free of charge elsewhere).

Thanks,

Max

Hi Max,

I am about the issue the next release of the Real Statistics Resource Pack, but I will consider Max-Diff analysis for a future release.

Charles

Hi Charles,

Thank you so much for sharing the use of excel instead of SPSS! This is exactly what I was looking for!!!! I am very glad to find your website.

I am actually in the middle of completing a dissertation and struggling with analyzing the results I got from an online survey software which is very good and easy. However, I wanted to confirm if I am OK to use only excel for analyzing to complete the dissertation. Also if I should prove the reliability on each results I got from the survey.

Please let me know if you need more information to answer my questions.

I am very happy to have a conversation with you via email if you are free.

Thank you so much in advance!

Of course, my objective is to provide analysis capabilities in Excel that are accurate and just as good as those provided by tools such as SPSS. You should be aware that there is a bias against using Excel for statistical analysis. Some of this is based on some errors in earlier version of Excel and the lack of many of the mostly commonly used tools. Microsoft has corrected these errors in the last few releases of Excel and I have worked hard to add capabilities that are assessible from Excel but are missing from standard Excel.

Charles

Hi Sir Charles!

Thank you so much for the information about Kappa stat. It helped me a lot in my Research.

I’m actually conducting a research about the Effectiveness of a Diagnostic kit (local) compared to the commercial one using 60 samples. The commercial one is my Golden standard. And my adviser told me to use Kappa. However, I’m still in the process of absorbing the information/formula. Do you have any simpler formula for my problem?

I’m sorry for my demand but thank you in advance Charles! God bless!

Jehvee,

If your adviser wants you to use kappa, then you need to use the formula for kappa. This formula is not very complicated. See Cohen’s Kappa for details.

Charles

Hey Charles, I’ve come across another problem and was hopping you may be able to help me. I have a group of patients with an expected bleeding risk of 5, 48% in 3 months, and a thrombotic risk of around 10% over the same amount of time, those calculations came from big studies with over 3000 patients. Anyway, how can I compare those so see if this difference justify or not the use of an anticoagulant agent? Would Odds or Hazard ratio be useful in the scenario? To compare chances of 2 different events over the same sample? Thx in advance

Vitor,

It sounds like survival analysis (hazard ratio) might be the way to go. See the following webpage

Survival Analysis

Of course, you need to determine what sort of ratio is acceptable.

Charles

Dear Charles,

First, I would like to congratulate you for your website.

I wonder if you have examples of the sequential version of the statistical test of Mann-Kendall which detects change points in time series.

Dear Leonardo,

Thanks for your kind remarks about the website.

Unfortunately, I don’t yet support the Mann-Kendall test.

Charles

On another matter, I have an excel spreadsheet of 2600 university donor prospects with 20 potential predictor variables and am trying to predict those that have a higher likelihood of giving a gift of $10K+. I have experimented with the logistic regression tool and found it very difficult to use and interpret. I’m wondering if a simpler and equally effective solution would be to simply group the prospect list into groups representing all possible configurations of the predictor variables ( permutations?), compute the average number of $10K gifts given by each group historically, then rank the groups from highest to lowest. Those groups with the highest number of $10K+ gifts would receive priority in future fund raising. Does this make sense?

Tom,

It would seem that there would be a very high number of permutations of the 20 predictor variables. Even if each predictor variable was binary you would have 2^20 possibilities, which is a number larger than one million. With only 2600 donors most of the combinations would not have any representation. I can’t say that a logistic regression model would work any better.

Charles

Charles – is there literature available on your website explaining/interpreting the output of the logistic regression tool?

Tom,

Yes. See the Logistic Regression webpages on the website.

Charles

Thanks Charles.

Dear Charles,

I downloaded your tool today. I came across your website by accident, but I’m really glad I did. This is already proving to be of great help. I’m currently working on a master thesis about the relationship between prosperity and residential solid waste generation. Your website will be duly cited, of course. Thank you very much indeed for sharing your know-how. This is what I remember the internet used to be about!

Best wishes,

Marcel

Hello Dr. Zaintoz,

I have question performing F test in excel. It is suggested that one needs to sure the variance of variable 1 is higher so the numerator value is higher when calculating F. But even if you don’t make sure the variance of variable one is higher the p value remains the same on excel. So why worry about that part? with a confidence of 95% alpha =0.05, if p is greater than 0.05 you accept the null hypothesis. We care about the p value which doesn’t change with variable 1 higher or lower. right?

Thanks for your time

Hello Uday,

=FDIST(x/y,df1,df2) yields the same answer as =1-FDIST(y/x,df2,df1)

Thus, you don’t have to make sure that the variance of variable 1 is higher, but in that case you will need to make the adjustment to the test as described by the above equality.

Charles

Hi, Dr. Charles

Actually i need development a method for detectec periodicity in a data series, ¿is there any way to do this method completly in excel?

Regards

Carlos,

See the following article:

http://www.l3s.de/~anand/tir14/lectures/ws14-tir-foundations-2.pdf

You can use the Excel capabilities described on the following webpages to implement the ideas in the above article.

Time Series Analysis

Charles

dear sir, can I have your email ID? I want to email you my research proposal, can u please suggest an econometric technique that I can apply to quantify results.

Hina,

See Contact Us

Charles

Charles,

Thank you very much for this great learning resource. I am very, very grateful.

Zoran

Pingback: Amazing Excel Stats resource | vunderbolt

Dear Charles,

Thank you for some tips on statistics as I want to prepare a paper for my exam using Excel. However I am buffered with spelling of your surname spelling. I assume on the base how sounds your surname, you are of polish origin. Perhaps you have some relatives called Zając. Once again thank you for your help.

Hi Kevin,

You are correct. The surname is of Polish origin.

Charles

Hey Charles,

I was trying to automate a binary logistics regression analysis on the RealStat add-in using a VBA, is there a way that I could do that? So I can just press a macro button to do the binary logistics regression.

Btw I’m a student at Georgia Tech trying to incorporate the code for my senior design project, so this is fully for academic purposes and won’t be used commercially.

Juan,

You can do this by writing a macro that calls the Real Stats add-in in VBA. This is explained on the webpage

Calling Real Statistics Function in VBA

Charles

Can you walk me through on how to do logistics regression using vba? I’m trying to utilize the LogitSummary function from your add-in.

Really good! I have one question, you people may find simple to answer. What should be the maximum value/level/cutoff of within group variance to conduct an ANOVA or depends on ANOVA result? I dont mind whether mean are differing significantly or not. I know there are many tests to know suitability of data to perform an ANOVA, but its does not satisfy my need. I want know, what should be maximum within group variance to understand that sampling was correct. since many a time high withing group varince reduce the F-ratio and make group mean difference insiginficant, despite of high variation between group.

Hi Charles,

Great website! A wonderful resource for those looking to deepen and broaden their statistical knowledge.

One question: I downloaded the Excel add-in and when I tried to run a logistic regression model I got the following message:

“Compile error in hidden module: Logistic Regression. This error commonly occurs when code is incompatible with the version, platform or architecture of this application. Click help for information on how to correct this error.”

How to fix this error?

Thanks….

Jack,

What version of Excel are you using?

Charles

Hey Charles, am also getting the same error as Jack. Am using Excel 2013

Mark,

The usual reason is that the Real Statistics addin has not been installed properly. A common problem is that Solver has not been installed prior to the installation of the Real Statistics addin. You can determine whether Solver was installed by pressing Alt-TI and seeing whether Solver is on the list of addins with a checkmark next to it.

Charles

Please help me! I have to make a homework, multiple analysis regression but i don”t know what to take like depended and indepded variable( i know whAt they are but i want example, conctret example) than i will try to find statistics for time series for minimum 30 years,,, please help meee

Anna,

I have a lot of information on the website about multiple linear regression. Please look at the following webpage for the list of topics about this subject. Click on any of the topic that is relevant to you. There are examples that you can use.

Multiple Linear Regression

If your data is time series data, then you also need to worry about autocorrelation. This topic is covered on the website, but other topics about time series data will be added shortly.

Charles

Thank you! But i want for example i take like depended variable loans with problems in united kingdom= gdp rate , inflation rate, unemployment, interest of loans rate, exhange paund- us $ ,,, i found some statistics but for the loans with problems i found only for some years not for 20 or 30 years ,,, i dont know if you u derstend me ,,, can you help me?

Anna,

I understand you and am happy to answer specific questions, but I don’t have a specific example of the type you describe. You can find these types of examples in references on time series or statistics for economics.

Charles

hello l’m carlo mallari, how can i know the computation for the least significant studentized range distribution table of Duncan’s multiple range test? its formula?

Carlo,

The Real Statistics Resource Pack doesn’t support Duncan’s multiple range test, since I use Tukey’s HSD test instead. The resource pack does calculate the studentized range distribution as well as the critical values from that distribution. With this information, you should be able to carry out Duncan’s multiple range test,

Charles

Dr. Zaiontz,

The Paired Sample Hotelling T2 example is very similar to something a friend is working on. When she showed me her data (“scores” like data), I was hesitant to go the Hotelling / paired-T route as I was uncomfortable about the normal assumption for the data since it was more ordinal in nature. If you were to do that example in a non-parametric analysis, would you do Wilcoxon signed rank with Bonferroni like control for Type I or do you have a better suggestion?

Thank you!

The Wilcoxon signed-ranks test is a commonly used non-parametric test in case the assumptions for the t test don’t hold. Hotelling’s T-square test is the extension of the t test when there are multiple dependent variables. I don’t know of any non-parametric versions of this test, although I came across the following article which may be suitable:

www3.stat.sinica.edu.tw/statistica/oldpdf/A8n310.pdf

Charles

Dear Charles:

I have been using the realstats to calculate Principal Component Analysis, but there is an issue. I use 4 variables to do the covariance matrix and then I obtained the coefficients for these variables with the evectors function. However if I changed the order of the variables in the covariance matrix and put one of them in the first place, I obtained the same coefficients as before but with different sign. why happens that? Am I doing something wrong? or there is a problem with the variables? thank you for your answer

Miguel,

If I understand what you are saying correctly, this is completely normal. You aren’t doing anything wrong and there isn’t any problem with the variables.

Charles

Thank for your answer Charles, but I´d better explain with numbers my doubt:

I used variables A, B, C and D to do the PCA. And I put them in the order: “ABCD” in the covariance matrix function and then in the evectors function.The coefficients for the first component were A: 0.5, B: 0.4 , C: 0.3, D: -0.2.

Then I changed the order of the variables: “DABC” in the covariance matrix and evectors functions and the coefficients for the first component were A: -0.5, B:-0.4, C:-0.3, D: 0.2. Therefore, the values for the first component are not the same for the two arrangements even tough are the same variables A, B,C and D and now I want to graph the first component but I don’t know which are the correct ones.

Hope I clarified my doubt and I´d appreciate your comments about it, thank you

Miguel,

If A is an eigenvector corresponding to eigenvalue c, then so is -A. Thus there is no problem when the signs get reversed. This is normal and won’t effect your final result. If, for example you have a column with say 7 negative values and 2 positive values, you can simply flip the signs so that you have more positive value if this makes the real-world interpretation better. I believe that I did this in one of the examples I gave (although perhaps I did this in an earlier version of the website).

Charles

Thank you again Charles for you kind reply. I wil try your suggestion. It is nice to count with your help. Best Regards

Dear Charles,

May you can help me.I am doing a reserch in UVA\UVB treatments.

I have done an taste test:

18 peoples tasted salad from diferent treatments and asnwred the questions.

Which salada is best taste?

Which salade is most bitter taste?

They dont give me score only replay

How can I organise the data for a anova analysis.

Salad 1 = control salad 2 = UVA treatment salade3 = UVA\UVB treatment

Thanks in advance

Dinalva

I am from Brasil but my studies is at NMBU university in Norway

This depends on a couple of things:

1. Are the same 18 people answering both questions? (and are these the only questions?)

2. What exactly are you testing? (i.e. what is your null hypothesis?)

Charles

Dear Dr. Charles;

Hi, Hope you to begin a happy new year full of health and Joy. I’m Ramin and I’m from Iran. May I draw your attention to building coefficient of orthogonal contrast. If you Have only 3 means to compare you can easily have:

contrast C1 C2 C3

——————————-

1 vs 2 1 -1 0

1 vs 3 1 0 -1

2 vs 3 0 1 -1

——————————-

now consider that you have (for instance) 50 means to compare. writing of a dimension of 50*50 matrix of orthogonal coefficient(as we see in example below for 3 means) is very time consuming and boring of course. is there any macro available in excel to build coefficient of orthogonal contrast for pairwise comparison of n means (50 for instance)?

I do appreciate for your comment beforhand

Ramin,

I am not aware of any such macro, but in any case, any such set of orthogonal contrast coefficients may or may not be the set that is useful to someone in a particular situation.

Charles

Dear Dr. chales

Hi again and Many thanks for your useful hint. I agree with you!

best regards

Ramin

dear charles,

i am from pakistan. may be my english is very poor because it is not our mother language. anyhow, i want to learn logistic regression theoritically and its usings practically. can you help me in solving my problem?

The theoretical background of logistic regression is provided on the Real Statistics website, especially the Logistic Regression webpages. If you need additional detail, please look at the Bibliography.

Charles

Dear Sir

I have two data sets. One (Sr concentration) coming from ICP-MS analysis along an animal shell. This has around 12,000 data points covering 9 years of shell growth. And the second data set is monthly temperature for that 9 years which has 108 data points. Can I correlate these data to see weather Sr concentration increase with temp. ? If so what is the method I can use? If not what can I do to see the relationship between these two data sets?

Thank You

Dear Dilmi,

The two sets have to have the same number of data elements in order to perform a correlation. If you know they temperature for each of the 12,000 data points then you can perform a correlation. You can also use linear regression to better see what this relationship is and to make predictions.

Charles

Do you have any paper about this statistics add on for citing?

Victor,

See the webpage Citation.

Charles

Can you help me?

how can do moderatin effect with regression in excel??

Can I do Enteraction Plots??

If I understand your question, then you are looking for interactions in regression. If you have two variables x and y, then you model the interaction in regression by xy. See the following webpage for more details:

Interaction

Charles

Hello Dr. Zaiontz,

Can the t-test be used in a correlational study? The nature of my study is quantitative and correlational in design and I have two groups: (1) CEOs with terminal degrees in Business disciplines, and (2) CEOs with terminal degrees in non-Business disciplines.

Is it appropriate to use the t-test in this study? I know correlational studies examine the relationship between two or more variables, whereas comparative studies examine the similarities and differences between two or more groups; however, someone is saying that it is not appropriate to use t-test in correlational studies.

Thank you in anticipation of your response.

A t test can be considered as a form of correlation, as shown on the webpage

Correlation in relationship to t test

Provided we are using the term “correlation study” in the same way, I don’t see any problem with using a t test to determine whether is a difference between the two classes of CEOs.

Charles

Dear Dr. Zaiontz,

Thank you so much for making our lives easy who are not statistician/mathematician. I need to analyze data to see the effect of a process on microbial loads (3 types of microbes checked pre- and post-process). There are around 1600 samples analyzed (not divided equally pre- and post-process) for each type of microbe (A,B,C). All these observations come from 3 different states, covering 4 seasons, and 2 years. I’ve used SAS (GLM, PLM) to analyze them. However, I would like to check with Excel since I am more comfortable in using the later. What analysis I can use to check the effect of the process? How can I check if season, year, state had any effect on the result? Also in one category (microbe C) I get only around 6% considerable value, while rest of the 94% is below detection (0). I am using log10 transformed values of bacterial counts for all these analysis. I am assuming the original number=1 (so, log10 of 1=0) when I don’t get a real value (in 94% cases). How can I justify this assumption?

Thank you in advance.

Best regards,

JD

JD,

Excel doesn’t support this type of test.

Charles

Dear Charles,

Thank you very much.

Best regards,

JD

Dear Dr.Zaintoz,

Thank you for this very useful – and free! – package. The examples and the website are very clear.

I sue statistics almost daily and having a easy to use tool integrated to Excel is a must.

Regards

Alain

Dear Dr. Zaiontz,

This website has been a huge help for me in my summer internship. I’m currently making a model to predict GDP and I really enjoy how you show the formulas to how everything is calculated.

Thank you,

Michael G.

Hello Charles,

Thanks for this material! It’s clearly carefully done and I’ve already found some of the multivariate functions in your package very helpful. I’m quite amazed at what you’ve put together and made freely available here. Bravo!

For many reasons, I do my best to convince university students to use R for an open source (free) way to do statistical computations. But EXCEL is ubiquitous and sometimes considerations of what is most familiar clearly make using it the only real option. Your kind work makes it substantially less painful.

Best Regards,

Steve Vardeman

Hello Steve,

Thank you very much for your very kind words. This means a lot to me, especially coming from you, a person who has had a very illustrious career in statistic research, teaching and consulting.

I am constantly updating the website and software, and I plan to add some interesting capabilities in the next few releases.

Charles

Thank you so much for all of your work on this website. I’ve never had the strongest grasp of statistics but your website made everything easy to understand. Your resource pack is a lifesaver and greatly expanded my ability to perform data analysis in my research.

Thanks very much Greg for your comment. I am very pleased that the website has helped you.

Charles

Dear Sir,

Slight error in the previous message. There are nine coefficients plus one intercept to be found. The R-square value is .94. But the p-values are very high like .74, .96 etc.

Can I still use this equation for my prediction Sir?

RANJIT

Ranjit,

You shouldn’t see a high R-square value when all the coefficients are insignificant.

If you send me an Excel file with your data I will try to figure out what is going on. See Contact Us for my email address.

Charles

Dear Sir,

Thank you very much but also sorry for not responding promptly. I have sent an e-mail with the data values to your e-mail id. Please provide your valuable comments.

RANJIT

Ranjit,

By reading your comments it sounds like your model has multi-collinearity (correlations between your independent variables) which inflate their variances and produce high p-values in the parameter estimates. Or you might be over-fitting the model. These are my suggestions. Check to see if they are causing your problem.

Ryan W.

Dear Mr.Ryan,

Thank you Sir.

There are correlations between the independent variables. But I didn’t understand what you meant by ‘overfitting’. Does this mean I am using more number of observations?

RANJIT

Ranjit,

Sounds like you found your problem. By over-fitting I meant creating a model with high R^2 but with low predictive ability. Now that you know your independent variables are correlated, you can apply transformations to them or remove some from the model if necessary.

Ryan

Dear Sir,

Hope you are doing well.

I am currently doing a project as part of my M-Tech course.

I have few data with me but when I fit a regression equation to the data, I get the p-values as high and r-square values as low for all (or majority) of the independent variables (both with and without interaction).

There are three independent variables for one dependent variable.

Can you help me understand why this is happening.

Currently I am unable to use the equation because of the above problem.

Please advise.

It could be that your data does not fit with the linear regression model.

You say that the “r-square values as low for all (or majority) of the independent variables”. Did you you build the regression model using all three independent variables? Have you tried to create three scatter plots between one of the independent variables and the dependent variable? Do the plots look linear?

Charles

Dear Sir,

Yes, I built the regression model using all the three independent variables.

There is low correlation between all the independent variables and the dependent variables.

Alternately, is there any other method by which I can get a good fit for the data I have?

RANJIT

Ranjit,

It is possible you can find a fit for a polynomial curve or an exponential curve or many others. You can get more information about these on the following webpages:

Polynomial Regression

Exponential Regression

Charles

Dear Sir,

I used quadratic regression with interactions (three independent variables). So totally there were ten coefficients to be found including the intercept.

I got an R-square value of 0.94. But the p-values for all the coefficients are very high, like .74, .96 etc. So can I use this equation for my prediction?

Dear Dr. Zaiontz,

Your job and generosity much appreciated, thank you. I have just downloaded the add inn and discovered its numerous abilities. I will use it for my “Data Analysis using Excel” course during the next academic year.

All the best,

Erkin

Erkin,

I am pleased and honored that you will use the add-in for your course.

Charles

I have an excel sheet with about 300 variables in 700 rows ( sample). I have random missing data and they are empty. I would like to put 99 in the empty boxes ( missing data). Do you have any suggestion, please?

Thank you.

Rose,

If your 700 x 300 sample starts in cell A1 then place the formula =IF(A1=””,99,A1) in cell KP1 (the 302nd column). Now highlight the range KP1:ALM700 and press Ctrl-R and then press Ctrl-D. The range KP1:ALM700 will now contain the data as you desire it.

Charles

Great website Dr.Zaiontz. It helps me a lot when I have to rapidly apply a concept that I have a vague memory of.

Also, as a fellow Purdue grad, Boiler-UP!

hi DR… i am very weak in statistic but i decided to improve my skill …i live in Iran and in Iranian web sites i could not find many and useful topics about effect size…i wish that you help me in this field… i have a question about G power … my question is about founding effect size(Cohen d) in one sample case … i see in very texts that The standard effect size for a one-sample t-test is the difference between the sample mean and the null value in units of the sample standard deviation… we use this effect size to specify our sample size….

now my problem is that i can not distinguish “null value” and “sample mean”….

i am interpreting that when we do not specify our sample size yet ,how can measure its mean?!!!!

please help me and in a simple example clarify this issue for me….thanks a lot…

Reza,

It is an excellent question.

The null hypothesis mean is based on what hypothesis that you are testing. The sample mean is measurable. Together with the estimated standard deviation you can estimate the effect size. Having said all of this, in reality, however, you estimate the effect size either based on experience, a guess or a previous study. There are rules of thumb such as a small effect is .1, a medium effect is .3, etc., but this may not help much either without some experience in running similar tests. With time, you get a feel for what are reasonable values, but even then you may find that your initial estimate for the effect size is wrong.

An important thing to note is that the smaller your estimate of the effect size the larger your sample will need to be so that your statistical test will find it (if indeed your null hypothesis is shown not likely to be true based on on your statistical test).

Thus you may estimate the effect size to be medium and so choose a sample of size consistent with this effect size, only to find that the test shows a smaller effect size, and so you actually needed to choose a larger sample, or that the test shows a larger effect size, in which case the sample you chose was larger than it needed to be.

There are many concrete examples on the website for how to use the effect size to specify the sample size. E.g. Sample Size Normal Distribution and Sample Size of t Test.

Charles

Hi Charles,

I would really appreciate your effort in putting very good statistics resources. Especially excel calculation and immediate reference to linear algebra.

All the very best!!!

Best Regards,

Raja

Thanks Raja,

Charles

I wish you’d also consider publishing a book about Excel VBA and Statistics. I haven’t been able to find any good ones. Thanks for all you do.

Mark,

I am about to publish a book about Statistics, but not VBA and Statistics. What would you like to see in this book?

Charles

Caro dr. Charles,

innanzitutto ringrazio la prof. Caterina per la splendida idea che ha avuto.

Frequento internet da quando lo schermo era nero con scritte bianche. Il WEB ha portato la “visione delle cose” e inevitabilmente, tanta schifezza e ripetitività.

Questo è l’esempio positivo di quello che dovrebbe essere un sito: Semplicità e Contenuti. E’ superfluo cercare di quantificare l’enorme mole di Pregevoli Contenuti.

Se Lei ha conosciuto sua moglie in Italia, il fatto (mi auguro di cuore) è positivo. Altrimenti è difficile avere dei vantaggi (sopratutto morali) venendo in questo Paese di delinquenti profondamente vigliacchi e corrotti.

Auguro alla Sua famiglia tanta serenità, ringraziandoLa dello straordinario lavoro messo a disposizione in questo sito.

sergio colautti

Gentile Sergio, grazie delle sue parole di apprezzamento !! Sono contento che il mio lavoro le sia utile. Continuero’ ad aggiornarlo e ad aggiungere nuovi contenuti.

Auguri anche a lei

Charles e Caterina Zaiontz

Thanks Dr Charles,

thank you for your kind providing very good your Real Statistics Resources for other people.

You are great statistician and very good people.

Hi Dr Charles,

Thanks for your resource and guidance on excel stats works. very much appreciated.

I am MBA student doing research in social entrepreneurship. I have develop 18 scale items for the specific research context and i want to check validity of those items. how can i perform? i checked your factorial analysis guide. but couldn’t, it is appreciable if you guide on excel sheet. thanks

Rama

Rama,

Sorry, but I don’t understand your question. You need to provide more details for me to be able to respond.

Charles

Thanks Dr Charles,

Please guide me on How can I perform factor analysis and how can i interpret the result?

Rama

Rama,

I suggest that you look at the webpage http://www.real-statistics.com/multivariate-statistics/factor-analysis/

Charles

Thank you very much

Q

How we can use spss to test equality of two correlation matrices

Ahmed,

I don’t use SPSS. My website explains how to test the equality of covariance matrices using Box’s test in Excel via the Real Statistics software. See the webpage http://www.real-statistics.com/multivariate-statistics/boxs-test-equality-covariance-matrices/

Charles

Dear Dr. Charles Zaiontz,

How can I do the simulation of a time series of 50 weeks of returns using CAPM beta, non-systematic standard deviation, standard deviation of market, risk premium and risk free.

The whole task is “You are asked to simulate a time series of 50 weeks of returns on each of the above stocks. Start by simulating six 50-week return series which are orthogonal (i.e. ignore correlation among the stocks) and have zero mean, then use those six orthogonal series to generate six series which have the right covariance matrix and the right expected return.”

With kind regards,

John

John,

I haven’t yet implemented these sort of capabilities yet. I plan to add these in a few months.

Charles

Is there a way to obtain the logistic regression coefficient covariances from your package?

Bob,

Yes. Use the Logistic Regression data analysis tool.

Charles

Charles,

Thank you very much for providing the Real Statistics add-in capability for Excel. I now have what I need to independently do my work with Excel. Tremendous!

All the best to you, Sir.

Sincerely,

Alex Kesaris

PS: I’ve tweeted about Real Statistics via @Sustenant and I’m now following you on Twitter. AK

Charles,I wanted to take a moment and say thank you for this incredible trove of information. I can tell you that, indirectly, it is used to the great benefit of many people. SAS / SPSS have their places, but when you’re building a dynamic model, Excel is the medium for a lot of statistics. Having a website like this makes the difference. Kudos.

The site and excel addin are both excellent. Everything is very straight forward, and easy to follow and implement. It makes things I used to do in R and Matlab easily accessible in Excel. Thanks so much for this creation.

Thanks,

Aaron

Good morning Charles,

Yes, you are absolutely right (like always). I’ve restarted excel and your package was not running. Yes I do have ver 2.15… I guess this is the latest one.

Since I am not a “statistician” I would like to know if the significance [Sig. (2-tailed)] .000 (SPSS) is correct and does have a positive meaning or not.

Thank you for your time answering my questions. I really appreciate if.

Cheers

Matt

Hello Charles… I am really speechless with your website and your hard work. Congrats.

I’ve been searching online to try to find a good explanation of SPSS about my results, but I couldn’t find something that will answer my needs.

However, My wife is doing Masters in Psycho-Sociology with the subject Children-parent Attachment and she had a questionnaire with 20 questions for parents and there were 40 children ages 2-5. I’ve entered all the information correctly on SPSS but for some reason I am not getting the same results that someone else did previously.

I would appreciate if you could give me a brief guide how to do get the same results that previous calculations.

Thank you and God Bless.

Hello Matt,

Thank you for your kind words about my website. I have tried to create an environment where people can use Excel to perform real statistical analysis and also learn a lot about statistics.

I can’t tell from your comment which type of analysis your wife is trying to do in SPSS. I am not an expert in SPSS, but perhaps I can tell you how to do the analysis using the Real Statistics Resource Pack.

Charles

Thank you for your prompt reply.

Basically she is trying to find the child attachment and his/her socialization, she need to get results of Cronbach Alpha, T-test, and results from attachment and socialization. I don’t know if I can send you my data you will see and I am sure you will understand. I am copying everything to excel and I am trying to run your calculator and see how it will come out.

If you agree on sending you my data, please email me and I will send you asap.

Thank you in advance.

Matt

Matt,

You can run Cronbach’s Alpha (via the

CRONALPHAfunction) and T-tests (via theT Tests and Non-parametric Equivalentsdata analysis tool) using the Real Statistics Resource Pack. If you run into problems let me know.Charles

Good morning Charles,

I’ve downloaded from your website RealStats.xlam, and I am not seeing CRONALPHA function when I open your pack. Have I downloaded the wrong pack?

Is there anyway to upload a screenshot that I wanted to show you?

Matt

Matt,

You won’t find CRONALPHA in the list of data analysis tools when you press Ctrl-m, but you should be able to enter it as a function. If R1 is a range which contains your data as formatted in Example 1 or Example 4 of http://www.real-statistics.com/reliability/cronbachs-alpha/ then =CRONALPHA(R1) will return the value of Cronbach’s alpha. If this doesn’t work then you are probably using a very old version of the software. To find this out which version of the software you are using, enter =VER() in any cell.

Charles

Starting capstone for MSN: Descriptive correlational survey research on moral distress in nurses. Survey tool uses Wocial and Weaver’s (2012) moral distress thermometer which rates the level of moral distress the nurse is experiencing. I developed a tool which incorporates the MDT and asks a series of questions regarding gender, years of experience, nursing area of practice, highest nursing degree attained, the region a nurse works in, question if the nurse has ever left or considered leaving a job due to moral distress, and a question related to feelings if improved professional autonomy influences levels of moral distress. Using SurveyMonkey to obtain audience and post survey. My professor told me to do a multiple regression. That was all the guidance given. I am now lost. Would you please send me the right way? I have limited time to do this project and scant experience with statistics. Thank you. Nurse Kathy

Kathryn,

Excel provides a multiple regression tool which carries out the calculations for you. You can access this tool from the

Dataribbon. Click on theData Analysisbutton on the right side of the ribbon and you will be presented with a dialog box. Choose theRegressionoption. Details are found on the webpage http://www.real-statistics.com/multiple-regression/multiple-regression-analysis/. Details about how regression works are found on the webpage http://www.real-statistics.com/regression-models/.Charles

my boss said go see if you can learn PCA。 i nearly quit my job， not really used my brain in so many years。。。。 your coverage of PCA more than gives me hope， thx so much for all the free information on PCA and so much more。 i hope to be able to look at differences in equipment matching behavior using PCA。。 I’ll tell you later if its a success。

Kevin,

I am very happy to hear that my coverage of PCA has helped you. I look forward to hearing how everything works out.

Charles

Great work – I loved your power calculations in Excel example!

Thank you Daniel,

I am also planning to add more capabilities in the future.

Charles

thanks a lot for this! very much appreciated

So glad to see this! Is binomial regression on your TO-DO list? I’m working on bug-proneness prediction, and all the independent variables are count data …

Bob

Bob,

Some aspects of binary regression are already implemented. In particular binary logistic regression. See http://www.real-statistics.com/logistic-regression/. I plan to add additional capabilities in the future.

Charles

Thank you for the great excel add on!

Great tool!

After some minor adjustments it runs under a German Version of Excel 2013.

Thanks a lot

Manfred

Manfred,

This is great news. Until now I thought that the tools would only run in the English version of Excel. What did you need to do to make it work in German? This could be useful for those who want to use the software in French, Italian, Spanish, etc.

Charles

Sir,

i am a faculty member of Statistics Dept., Jagannath University, Dhaka. i am really happy to find the new program for solving logistic regression model as well as Survival analysis. But sir i try more than 3 times to install the resource pack but fail because require “pass word”

Please sir give me the instruction how can i install it in my excel page

regarding

R. Ahmmad

Lecturer, Dept of Statistics, Jagannath University, Dhaka, BD

Dear S. Ahmmad,

You should not need to supply a password in order to use the resource pack. The key is to have Excel recognize the software as an “add-in”. To do this you must perform the following steps:

1. Select

Office Button > Excel Options > Add-Insin Excel 2007 orFile > Help|Options > Add-Insin Excel 2010/2013, and click on theGobutton at the bottom of the window.2. Check the Realstats option on the dialog box that appears and click the OK button.

3. If this option doesn’t appear, click on

Browseto find and choose the realstats.xlam file. Then complete step 2 as described above.See http://www.real-statistics.com/free-download/real-statistics-resource-pack/ for more details.

Charles

PS I am about to release an upgrade which provides multinomial logistic regression.

Congratulations Charles. You have developed a great website with lucid explanations for people like me to learn statistics using Excel.

Don

An excellent tool that Dr. Zaiontz had made available. Dr. Zaiontz emailed me a description and examples of how to set the program up in Excel, which helped me enormously because Im not a statistician. I highly recommend the program.

Marc

Certified Wildlife Biologist

Thank you very much for your Excel add-in!!! It is very useful for me as I would like to do some statistical analysis in a very simple way…as it is with your excellent tool! Grazie!

Thank you Chiara. I am very pleased that you found it useful.

Charles

thank you for this excellent web page. I am peruvian, so if you come to mi country, Peru, I´ll be gld to show you the best places and act as your guide

Hi Charles,

Thank you very much for developing a great Excel add-in for statistics.

Much appreciated!

Best wishes to you, your wife and your family.

Sincerely,

Chunhao Yu Ph.D

Hi Charles,

Great that you built an excel-based stat tool. I’m not a statistician but I’m trying to apply probit model to some economic data. I see you have the logit in your package. But I don’t see the probit. How can I estimate the coefficients of a probit model with multiple independent variables in excel?

Grazie mille

Massimo

Hi Massimo,

I plan to implement probit shortly. I am now in the process of adding multivariate statistics capabilities (esp. MANOVA and Factor Analysis). Once this is released later this month, I will add probit.

Charles

hi all. iam MA student. how can i score this test?

Q3/ Have you ever heard or come across or used the following English idiomatic expressions? write Yes or No for each item , if yes translate them into Kurdish. (i divide the scores as follows. 0 for those who choose NO, 0.5 for those who say yes but cannot translate it correctly, and 1 for those who write yes and translate it correctly)

1- Raining cats and dogs

2- Zip your lip

3- Throw in a towel

4- Talk nineteen to the dozen

5- Green light

6- Penny pincher

7- Yes man

8- Month of Sundays

9- All mouth and trousers

10- Pitch dark (black)

11- Hard-and-fast

12- Poor as a church mouse

13- From time to time

14- Under the sun

15- On the cards

16- In a circle

17- Face-to-face

18- Fifty-fifty

19- Parrot-fashion

20- Many moons ago.

how can i find item facility, difficulty, and discrimination for this test. plz help

Sana,

I have now added a webpage on Item Analysis which describes how to calculate item difficulty and discrimination.

Charles

Hi Dr. Zaiontz,

Thank you for developing a great Excel add-in for statistics. Much appreciated!

Sincerely,

Dhaval Patel MD

Dr Patel,

Thank you very much for your kind remarks. Stay tuned. Next month I am planning to release a new version of the add-in which will include new data analysis tools and multivariate analyses.

Charles

Dear Dr.Zaintoz:

I just wanted to drop you a note to say how much I appreciate your outstanding website and the clarity of your examples. I certainly will recommend your website to my students. I only wish that you would have taken the trouble of putting together a book with a compendium of your examples.

Thanks very much for so unselfishly sharing your knowledge.

Dr.Raju

Dear Dr. Raju,

I am very pleased to read your comment.

I have written a couple of books based on the website and plan to publish the first of these in January.

Charles