**Real Statistics Data Analysis Tool**: The Real Statistics Resource Pack provides the **Correlation** data analysis tool. This tool calculates the Pearson’s, Spearman’s (rho) and Kendall’s (tau) correlation coefficients, as well as conducts various versions of a one-sample correlation test.

**Example 1**: Repeat Example 1 of Correlation Testing via the t Test (regarding Pearson’s correlation) using the **Correlation** data analysis tool.

To use this tool, press **Ctrl-m** and select **Correlation** from the menu of choices that appears. Fill in the dialog box that appears as shown in Figure 1 and press the **OK** button.

**Figure 1 – Correlation data analysis tool dialog box**

The output for the two-tailed test is shown on the right side of Figure 2 (starting at cell J3). The output contains the Pearson’s, Spearman’s and Kendall’s correlation coefficients (cells K5, K6 and K7). In addition, both the two-tailed t-test and normal test using the Fisher transformation are displayed where the hypothetical population Pearson’s correlation ρ = 0.

If you want to conduct a one-sample correlation test where ρ is equal to some value that is not zero, you would simply perform that same steps as described above and then change the value in cell N11 to this value. In this case you would need to ignore the results of the t test since they would not be accurate.

You can also change the value for alpha and/or the number of tails, and the output values will change automatically.

**Figure 2 – Output from Correlation data analysis tool (Pearson’s)**

**Example 2**: Repeat Example 3 of Spearman’s Correlation using the **Correlation** data analysis tool.

Press **Ctrl-m** and select **Correlation** as before, but when the dialog as in Figure 1 appears, select the **Spearman’s** option. The result is shown in Figure 3.

**Figure 3 – Output from Correlation data analysis tool (Spearman’s)**

We see from the figure that Spearman’s rho is -.67442 (cell N14). Since p-value = .005821 < .05 = α, we are pretty confident that rho is not zero. If you change the values in cells N11 and/or N12 the other values of rho, t-stat and p-value will change automatically.

**Example 3**: Repeat Example 1 of Kendall’s Tau using the **Correlation** data analysis tool.

Press **Ctrl-m** and select **Correlation** as before, but when the dialog as in Figure 1 appears, select the **Kendall’s** option. The result is shown in Figure 4.

**Figure 4 – Output from Correlation data analysis tool (Kendall’s)**

We see from the figure that Kendall’s tau is -.52381 (cell Y13) and that we are 95% confident that the actual value is in the interval (-.901, -.147). Since p-value = .006493, we are pretty confident that tau is not zero.

If you change the values in cells Y10 and/or Y11 the values of tau, s.e., etc. will change automatically.

**Observation**: The ties correction factor is used for the **Kendall’s Tau** option of the **Correlation** data analysis tool.

Great article Charles. I am working with a series of non-random numbers (stock price changes on a given day). On 30 random days, I have the returns on 10 different stocks. I would like to find out of A, these companies are correlated with each other, and B, which of these companies are the most correlated with each other.

There is logical data behind why they should be correlated, but I’m not sure that part is relevant to the explanation here. Thank you

Ross,

Assuming that “companies” and “stocks” refer to the same thing, I understand that you have data that can be organized into the range A1:J30 as follows: Each column contains the data for one of the 10 stocks and each row contains the data for one of the 30 days.

You can now create a correlation matrix using the Real Statistics formula =CORR(A1:J30). This will show all the pairwise correlations. The values that are the highest are the most correlated.

Charles

Perfect, thank you

Hi,

I need to compare correlations of testscores from different test-results of the same Test/differents Tests (same construct) from same/different populations.

I think I need to to standardize the results via Fishers’ Z? Is there any tool her?

Manfred,

It is not clear to me what specifically you are looking for, but perhaps the following webpage can help

http://www.real-statistics.com/correlation/two-sample-hypothesis-testing-correlation/

Also there is the FISHER function which may be useful.

Charles

Hi Charles,

thanks for the answer: my problem: I have to tests (intelligence, dyslexia,… ) with correlations to school-grades. The correlations are somewhat different. I want to know whether the differences are significant, so that I can prefer one test. Samples, time of norming etc are different (ages, gender, schoolform are mostly similar)

thanks for your help.

Manfred

Manfred,

You say that you “want to know whether the differences are significant.” In order for me to be able to help you, please give me an example of what sort of differences you are referring to.

Charles

Hi Charles,

the differences of Correlations between different tests and school grades

Thanks,

kind regard manfred

Sorry Manfred, but I don’t understand your comment.

Charles

Hey Charles wonderful job. But here is one problem that correlation is of 2 types; phenotypic and genotypic. What about calculation of genotypic correlation?

From a statistical point of view there is only “correlation”. I am not a biologist or geneticist and so don’t have specific knowledge about how this term might be used in those fields.

Charles

Hi Charles,

I am into random numbers testing for online gambling. I need to perform below tests:-

1. Serial Correlation Test : The Serial Correlation Test determines whether or not the numbers picked for the current draw in any way determines subsequent plays. We calculate a correlation coefficient r based on pairs of data taken from the same number position on consecutive draws.

2. Interplay Correlation Test : This test determines whether or not there is any relation between two positions of numbers chosen within the same draw. It is essentially the same analysis that is performed in the Serial Correlation Test, but it is performed on pairs of data taken from different number positions within the same draw rather than between draws.

I had figured out approach for serial correlation test, but not sure about calculating correlation coefficient for second test. Can you please help ? Below the example of number drawn:-

A B C D E

3 1 3 0 2

0 2 4 1 1

2 2 4 3 1

2 2 1 1 0

2 1 1 0 1

2 0 3 1 2

3 2 1 1 2

1 1 5 0 2

Sheetal,

Sorry, but I am not familiar with the interplay correlation.

Charles

No problem, Charles!

Hi, Sir.

Can I use correlation test (Pearson’s) to check the correlation between two dependent/response variables? Here’s a background of the study: I have 3 factors with 2 levels each then there are about 5 response variables measured after the study but I am interested to know the correlation between the 2 only. I used anova (three fixed factors using realstats addin) to check the significance of the 3 factors. How would I know if this significance means positive or negative? Also, regarding the 2 response variables obtained from the study, can I use correlation test to check correlation between them?

Ming

Ming,

You can use a correlation test to test the correlation between two variables.

Anova tests whether there is a significant difference between two or more levels in a factor or various interactions. I don’t know what you mean by positive or negative significance. Significance is not directional; there is only significance.

Charles

So I can use correlation test between the 2 response variables measured, right? Thanks. “Anova tests whether there is a significant difference between two or more levels in a factor or various interactions.” – Only for the levels in a factor? Can I also conclude from Anova that a factor has a significant effect to the response measured if for example, I get a “yes” in the ‘sig column’ referring to ‘A’ factor in the Anova table. Can I also conclude that there is an interacting effect between two factors if I get a “yes” for ‘AxB’ for the response measured? By positive, I mean if I increase the magnitude of the factor (in case of my study it is metal concentration), the response measured (metal uptake in the study) also increases. In negative, if I decrease the factor, the response also decreases. Does any input in the anova table say something about it? Thanks for the help!

Ming,

The answer to your first to questions is yes.

Regarding your third question, about positive/negative, what do you mean by increasing/decreasing the magnitude of the factor? Does this mean, for example, doubling (or halving) all the sample data values?

Charles

For example, the ‘A’ factor is metal concentration with two levels of concentrations 0 and 50, if I increase the concentration from 0 to 50, the response which is metal uptake by the plants also increases. That is if I increase the amount of metal in the water, the amount of metal taken by the plant also increases. Is that what it means if in the anova table it says ‘yes’ in the ‘sig column’ for ‘A’?

Ming,

I don’t completely understand your description, but it sounds like what you want is follow-up testing after ANOVA (contrasts, Tukey HSD, etc.).

Charles

Alright. Thanks, Sir Charles.

how did you find the values of cigg, life expectancy from the questionnaire

These are purely made-up numbers, used to illustrate the statistical test.

Charles

Apreciado Dr. Zaionits, reciba un cordial saludo, Dr excuseme una pregunta, si yo hago una tranformación de Box Cox Normal para mis datos, como calculo la inversa de la tranformación para interpretarlos?

Dear Dr. Zaionits, Yours sincerely, Dr Excuse me a question, if I make a transformation of Box Cox Normal for my data, as I do calculating the inverse of the transformation in order to interpret?

Hello,

I was wondering, if I have a survey and want to see the correlation between gender and political affiliation, can that be done with your instructions? I was thinking of coding the responses, (0=female, 1=male) and doing the same with question two. If I have this coded data in excel, can I run a correlation? Thanks.

Diana,

Yes, you can do that. keep in mind that since one of the variables is dichotomous (Male/Female), essentially your test is equivalent to testing whether the mean of question 2 for males is significantly different from that of females (i.f. a t test or Mann-Whitney).

Charles

hi sir

how to use correlation coffetent for discriptive data in order to assesing energy expenditur& nutiritional intake of college athletes?

pls help me i have n’t understand about thise

thank u!!

You can use a formula of the form CORREL(R1,R2) where R1 is a data range which contains the energy expenditure of the college athletes being studies and R2 contains the corresponding nutritional intake for these athletes.

Charles

Hi. I am doing the correlation between no of dengue cases and maximum temperature. Correlation is done by using Spearman, Kendall, and Pearson. My problem is, how to compare which method is best to used? I don’t know what performance evaluation tools should i use. Thank you.

Generally you should use the Pearson correlation coefficient. Only when you want to see whether there is a significant difference between this coefficient and some hypothetical value would you consider using one of these other coefficients. This would be the case if the test assumptions are not met (e.g. normality). These issues are discussed on the website.

Charles

sir

is there difference in the correlation value calculated from excel and the values calculated values from statsitistical software , in case in those data which have been applied formula in excel sheet

No differences that I know of. The correlation coefficient is pretty standard.

Charles

Hi,

I want to represent results of two workshops using correlation

i.e. correlation between time spent on writing user stories and accuracy/ decrease in errors related to that would be time spent in workshop 1 and 2,

Thanks

Vandana

Use the correlation coefficient, as described on this webpage.

Charles

Pingback: Factor Analysis Marketing Example - Marketing on Data

Pingback: Factor Analysis in Marketing - Marketing on Data

Hi! I want to ask you on what correlation data analysis tool I should use for my study on self-esteem’s relationship with academic performance. I used the Sorensen Self-Esteem Test to evaluate the respondents’ self-esteem and took their GPA’s from last year for the academic performanc. In the Sorensen Self-Esteem Test, you can get four results: Fairly Good, Mild Low, Moderately Low, and Severely Low. What statistical tool should I use to get the correlation coefficient? Thanks.

Jimmy,

You can simply use Excel’s CORREL function to calculate the correlation coefficient. The specific correlation data analysis tool to use really depends on what hypothesis you are trying to test.

Charles

Hello Charles,

If I am analyzing survey results which contain Likert scale answers, is it better to input them as:

strongly disagree = -2

disagree = -1

agree = 1

strongly agree = 2

OR

strongly disagree = 1

disagree = 2

agree = 3

strongly agree = 4

Also wondering something the same thing for a question that asks how long a call took, with answer choices being Less Time than Expected, About as Much Time as Expected, and More Time than Expected. Less=1, About=2, More=3 OR Less=-1, About=0, More=1?

Thank you!

The first scale introduces an implied zero rating, presumably meaning “neutral”. This sort of 5 point Likert scale of -2, -1, 0, 1, 2 should be equivalent to a 1,2,3,4,5 scale, but won’t be equivalent (at least mathematically) to the 1,2,3,4 scale that you have described.

Charles

So if the answer choices do not contain a neutral rating, I should use 1,2,3,4?

Yes

Pls I have data to analyze using spearman rank correlation but don’t know how to go about.

Pls help me as how to go about it. Thanks

Please look at the following webpage:

Spearman Rho

Spearman’s Correlation Hypothesis Testing

Charles

Dear Mr. Zaiontz

I have a question regarding the Kendall Tau function in the Correlation Data Analysis Tool.

Above you state the following: “Observation: The ties correction factor is used for the Kendall’s Tau option of the Correlation data analysis tool.”

Does this “ties correction factor” correspond to Kendalls Tau-b, Kendalls-Tau-c or is it something completely different?

Thanks in advance.

N.

Tau-b is tau-a adjusted for ties.

Charles

Hi. If i have two variables and data points on only one of them is available. However the correlation cofficient between the two variables is given. How do i calculate the data points of the second variable?

You can’t. There are an infinite number of possible values for the second variable that yield the same correlation coefficient.

Charles

i have a problem with converting my original survey results in to statistical analysis. i designed a questionnaire to determine relationship between service quality & promotional effort impact on brand equity.the like-rt scale define results between strongly agree( +2) to strongly disagree(-2).( eg; leasing company should maintain all necessary modern equipment – (agree 3 )which in like-rt scale is 0).how do i convert this results to correlation analysis data.

Amila,

It is not clear from your description what you are correlating with what. In any case, to use the tools described on the referenced webpage all you need to do is create two columns of data one for each of the two data sets that you are correlating.

Charles