One Sample Hypothesis Testing for Correlation

We now show how to perform hypothesis testing to determine whether the population correlation coefficient is statistically different from zero or from some other value. Note that for a bivariate normal population, a correlation coefficient of zero is equivalent to the two variables being independent.

29 Responses to One Sample Hypothesis Testing for Correlation

  1. Tom says:

    Thanks Charles.

  2. Tom says:

    Hi Charles,

    Thanks for the great site. Suppose I have a small sample of data (n values of time series data, and I don’t know whether the data are normally distributed) and the sample has a serial correlation r > 0. Since r > 0, the effective sample size n’ will be less than n (n’ < n). Is there a 'rough & fast' way to estimate what n’ is? Thanks for any info you can give.

    • Charles says:

      Tom,
      If you enter the phrase “effective sample size autocorrelation” in Google, the first 5 entries will all have some info about how to estimate the effective sample size. I plan to address this issue on the Real Statistics website shortly.
      Charles
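
A rough sketch of one commonly cited approximation, for readers following this thread: if the series behaves roughly like a first-order autoregressive (AR(1)) process, the effective sample size is often estimated as n’ ≈ n(1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation. This is an assumption about the data, not necessarily the approach used on the Real Statistics website, and the function names below are made up for illustration.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom


def effective_sample_size(x):
    """Rough estimate n' = n * (1 - r1) / (1 + r1), assuming an AR(1)-like series."""
    n = len(x)
    r1 = lag1_autocorrelation(x)
    return n * (1 - r1) / (1 + r1)
```

With positive serial correlation the estimate falls below n, as Tom expects; with negative serial correlation the same formula gives a value above n.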

  3. hannah says:

    Hello, I'm having a problem calculating sample size. I don't know which formula to use. I'm doing a cross-sectional study. My hypothesis is that there is an association between knowledge and mammography screening. Based on my hypothesis, what formula should I use to calculate the sample size needed? Thank you.

    • Charles says:

      Example 6 of the referenced webpage shows how to calculate the sample size. The specific calculations required are shown in Figure 7. You can also use the CORREL1_SIZE function or the Statistical Power and Sample Size data analysis tool.
      Charles
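
As a rough illustration only: a widely used approximation for the sample size needed to detect a correlation r (two-tailed test of rho = 0) is based on the Fisher transformation, n ≈ ((z_{1−α/2} + z_{power}) / atanh(r))² + 3. The sketch below assumes that approximation; it is not claimed to reproduce what CORREL1_SIZE computes, and `approx_sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist


def approx_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r with a two-tailed test of rho = 0.

    Uses the Fisher z approximation; results may differ slightly from exact methods.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)
```

Smaller expected correlations require larger samples, so it helps to state the smallest correlation that would be of practical interest before running the calculation.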

  4. António Teixeira says:

    Hi Charles

    I am trying to develop a model that calculates what GPower statistical software calls exact power correlations tests.

    However it came to my mind that we could use the relation b=r*sigmay/sigmax to transform the correlation values in the test. If we think in terms of standardized variables we even have b=r.

    I have made several comparisons between using the t test for the slope and the Fisher approximation, and I wonder whether the differences found are due to the fact that the Fisher transformation provides only an approximation.

    Is there any theoretical flaw in this line of thought?

    • Charles says:

      António,
      It is not surprising that using the t test to test the hypothesis that the correlation coefficient is zero is related to testing that the slope of the regression line is zero using the t test. I am not sure how Fisher’s approximation enters the picture, though, since it is useful when testing that the correlation coefficient is equal to some specific value, which is usually not zero. Perhaps I missed something.
      Charles

      • António Teixeira says:

        Hello
        My point was to see whether the confidence interval for the standardized slope (Beta) could be used as a confidence interval for the correlation, which is not the case. It even returns limits outside the interval [-1 ; 1].

        • Charles says:

          António,
          Thanks for sharing this observation. Even though it was a negative result, it was worth the attempt.
          Charles
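
The relationship discussed in this thread can be checked numerically. The sketch below (illustrative function names, plain Python) computes the t statistic for testing that the correlation is zero, t = r·sqrt((n − 2)/(1 − r²)), and the t statistic for testing that the regression slope is zero; for the same data the two should agree up to rounding.

```python
import math


def _sums(x, y):
    """Return n, the means, and the centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return n, mx, my, sxx, syy, sxy


def correlation_t(x, y):
    """t statistic for H0: rho = 0, computed from the sample correlation r."""
    n, _, _, sxx, syy, sxy = _sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def slope_t(x, y):
    """t statistic for H0: slope = 0 in the simple regression of y on x."""
    n, mx, my, sxx, _, sxy = _sums(x, y)
    b = sxy / sxx                      # least-squares slope
    a = my - b * mx                    # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b / se_b
```

The test statistics coincide, but a symmetric t-based interval around the standardized slope is not constrained to [-1, 1], which is consistent with António's observation.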

  5. António Teixeira says:

    Hello

    In the text presenting Example 3, r = .6 is shown. Should it be r = .7?

    Regards

    António Teixeira

    • Charles says:

      António,
      Yes, you are correct. I have changed the webpage to correct this typo. Thanks very much for catching this error.
      Charles

  6. Yan Zhang says:

    Hi Charles,
    I am interested in why, for a bivariate normal distribution, the sample correlation coefficient r has a standard error of [(1-r^2)/(n-2)]^(1/2). Is there any derivation or intuitive explanation for that?
    Thanks,
    Yan

  7. Matteo says:

    Hi Charles

    On another statistical website I’ve seen that, in addition to the requirement on the data (bivariate normal distribution), there is also a requirement for the residuals to be normally distributed in order to make inferences. Do you agree and, time permitting, could you elaborate on the subject?

    Thanks
    Matteo

  8. Matteo says:

    Charles,

    A different question but still related to this post.
    Suppose someone published an informal paper or a summary showing some results from a study similar to your Example 1. However they do not include the actual data as you did with the table in Figure 1. All they show is the scatterplot, and they provide the correlation coefficient and the number of points used.
    If I can still assume the data are normally distributed (say I take their word for it) but have no way to calculate the descriptive statistics, can I still make inferences to probe their results, say estimate the confidence interval for their correlation coefficient at the 95% confidence level?
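
For what it is worth, a confidence interval based on the Fisher transformation needs only the reported r and n, not the raw data. The sketch below assumes bivariate normality and n > 3; `fisher_ci` is a hypothetical helper name, not a Real Statistics function.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation from r and n alone.

    Uses the Fisher z transformation: atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper
```

Because the limits are transformed back with tanh, the interval always stays inside (-1, 1).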

  9. Matteo says:

    Hi Charles,

    Excellent blog, very useful!

    Did you write (or are you planning to write) about deriving confidence interval for correlation coefficient in the case of multiple correlation?

    I’ll give you two real examples from my discipline (geosciences) to explain why I am interested in it.
    Very often the porosity of a rock can be related to its acoustic impedance (the product of rock velocity and rock density measured in wells) and a correlation coefficient can be calculated. This is an example of linear correlation and the calculation of the confidence interval for the 95% confidence level is fairly straightforward (and now with your resource pack even easier).

    The second example is that of quality of crude oil, which can depend on age, depth, temperature at which it was formed (and possibly other variables). If a multi-linear correlation coefficient is calculated, how can the confidence interval for the 95% confidence level be estimated?

    Thank you,
    Matteo

    • Charles says:

      Matteo,
      Good point. The approach for creating a confidence interval for multiple correlation is the same as that used on the referenced page. In any case, I will add an example on the Multiple Correlation webpage.
      Charles

  10. enoka gunasekera says:

    Sir, is the Pearson correlation suitable for assessing the impact of an employee incentive scheme on job satisfaction? How do I calculate it from the standard deviations and mean values output by SPSS? Please explain as soon as possible, sir. Enoka

    • Charles says:

      Enoka,
      I don’t have access to SPSS. If you want to perform the calculation in Excel then please look at the webpage http://www.real-statistics.com/. In any case, having the means and standard deviations of the two samples is not sufficient to calculate Pearson’s correlation. You also need the sum of the pairwise products of the data elements in the two samples.
      Charles

  11. Tord says:

    We need to test the null hypothesis that there is no correlation (H0: rho=0) between two variables x and y. In our case, however, neither of these variables is normally distributed; they are distributed more like 1/exp(x). We have two x-y data samples: one in which x and y appear to be linearly correlated according to the function y=x (a straight line with a 45 degree slope; the calculated linear correlation is 0.99), and another in which the relationship appears to be very approximately y=sqrt(x). In both cases we would like to test the null hypothesis of no correlation at all, i.e. derive the p-value for the hypothesis that there is no correlation between x and y. Can you please refer us to computer code that would do this for the case when the variables are not even approximately normally distributed?
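
One distribution-free option for this kind of question is a permutation test: shuffle y relative to x many times and see how often the shuffled correlation is at least as extreme as the observed one. The sketch below is a minimal illustration under that approach, not code from the Real Statistics site; the function names, the default of 10,000 permutations, and the fixed seed are all choices made for this example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break any pairing between x and y
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, y_shuffled)) >= observed - 1e-12:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

For a monotone but nonlinear relationship such as y=sqrt(x), the same procedure can be applied to the ranks of x and y, which amounts to a permutation test of Spearman's rank correlation.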

  12. Colin says:

    Sir
    In Example 1, why do you use “r = CORREL(R1, R2) = -.713” instead of “CORREL(R1, R2) = n * COVAR(R1, R2) / (STDEV(R1) * STDEV(R2) * (n - 1))”?

    • Charles says:

      Colin,
      The correlation coefficient has the same value whether it is calculated from the sample or the population versions of the covariance and standard deviations, since the factors of n and n - 1 cancel. In fact, CORREL(R1, R2) = n * COVAR(R1, R2) / (STDEV(R1) * STDEV(R2) * (n - 1)), but it is easier to use the simple formula CORREL(R1, R2).
      Charles

      • Colin says:

        Sir
        Thank you, sir. I thought the sample correlation coefficient and the population correlation coefficient were calculated differently.
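
The identity in this exchange is easy to verify numerically. In the sketch below (plain Python, illustrative names), `ddof=0` gives the population-style divisor n and `ddof=1` the sample-style divisor n - 1; `excel_style_r` mirrors the formula quoted above, on the assumption that COVAR divides by n and STDEV divides by n - 1.

```python
import math


def _centered(x):
    """Deviations of each value from the mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def correlation(x, y, ddof):
    """Correlation from covariance and standard deviations using divisor n - ddof."""
    n = len(x)
    dx, dy = _centered(x), _centered(y)
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - ddof)
    sx = math.sqrt(sum(a * a for a in dx) / (n - ddof))
    sy = math.sqrt(sum(b * b for b in dy) / (n - ddof))
    return cov / (sx * sy)


def excel_style_r(x, y):
    """n * COVAR / (STDEV * STDEV * (n - 1)), with COVAR using n and STDEV using n - 1."""
    n = len(x)
    dx, dy = _centered(x), _centered(y)
    covar = sum(a * b for a, b in zip(dx, dy)) / n       # population covariance
    stdev_x = math.sqrt(sum(a * a for a in dx) / (n - 1))  # sample standard deviation
    stdev_y = math.sqrt(sum(b * b for b in dy) / (n - 1))
    return n * covar / (stdev_x * stdev_y * (n - 1))
```

All three calculations return the same value for any data set with nonzero variance in both variables.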
