**Theorem 1**: Let *x̄* and *ȳ* be the means of two samples of size *n*_{x} and *n*_{y} respectively. If *x* and *y* are normal or *n*_{x} and *n*_{y} are sufficiently large for the Central Limit Theorem to hold, then *x̄* – *ȳ* has a normal distribution with mean *μ*_{x} – *μ*_{y} and standard deviation

$$\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$

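Proof: Since the samples are random, *x̄* and *ȳ* are normally and independently distributed. By the Central Limit Theorem and Properties 1 and 2 of Basic Characteristics of the Normal Distribution, we know that *x̄* – *ȳ* is normally distributed with mean

$$\mu_{\bar{x}-\bar{y}} = \mu_x - \mu_y$$

and standard deviation

$$\sigma_{\bar{x}-\bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$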

**Hypothesis Testing**: When the populations are normal or the sample sizes are sufficiently large, we can use the above theorem to compare two population means. The theorem requires that the population standard deviations be known, which is usually not the case. Often, especially with large samples, the sample standard deviations can be used as approximations for the population standard deviations. We can also employ the t-test (see Two Sample t-Test with Equal Variances and Two Sample t-Test with Unequal Variances), which doesn’t require that the variances be known and is especially useful when the sample sizes are small.
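The test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Excel procedure used on this page: the function name `two_sample_z_test` and its parameter names are assumptions made for the example, and it tests the two-tailed null hypothesis that the population means are equal, with both population standard deviations taken as known.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_x, mean_y, sigma_x, sigma_y, n_x, n_y):
    """Two-tailed z-test of H0: mu_x - mu_y = 0 with known population sds.

    Hypothetical helper for illustration; returns (z, p_value).
    """
    # Standard deviation of (x-bar - y-bar), as given by Theorem 1
    std_err = sqrt(sigma_x**2 / n_x + sigma_y**2 / n_y)
    # z-score of the observed difference between the sample means
    z = (mean_x - mean_y) / std_err
    # Two-tailed p-value: chance of a |z| at least this large under H0
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value
```

The sample sizes do not need to be equal; each sample simply contributes its own term under the square root.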

**Excel Tools**: Excel provides a data analysis tool called **z-Test: Two Sample for Means** to automate the hypothesis testing process (as shown in Example 1).

**Example 1**: The height of 5-year-old boys in a certain country is known to be normally distributed with mean 95 cm and standard deviation 16 cm. A firm is selling a nutrient which it claims will significantly increase the height of children. In order to demonstrate its claim, it selects a random sample of 60 four-year-old boys, half of whom are given the nutrient for one year and half of whom are not. Given that the heights of the boys at 5 years of age are as in Figure 1, determine whether the nutrient is effective in increasing height.

**Figure 1 – Two sample test using z-scores**

In addition to the raw data, Figure 1 shows how to calculate the z-score for the difference between the sample means based on a normal population with a known standard deviation of 16 (i.e. a known variance of 16^{2} = 256). Here the null hypothesis is H_{0}: *μ*_{x} – *μ*_{y} = 0, i.e. the mean height is the same with or without the nutrient.

This is a two-tail test, which is why the p-value (in cell I12) is doubled. Since p-value = .008 < .05 = *α*, we reject the null hypothesis and conclude that there is a significant difference between the boys who take the nutrient and those who don’t.

We can also use Excel’s data analysis tool to automatically calculate the z-score from the sample data (although we must first reorganize the data in the form of either a single row or single column). Figure 2 shows the output of the data analysis tool for Example 1.

**Figure 2 – Output of z-Test: Two Sample for Means data analysis tool**

Looking at the two-tail results, we see once again that .008 < .05 (or alternatively |*z*| = 2.65 > 1.96 = *z*-crit), and so we reject the null hypothesis.
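As a rough cross-check of these results outside Excel, the calculation can be reproduced in Python. The sketch below assumes the two sample means are 95.74 and 106.69, the values quoted in the reader discussion of this example; which mean belongs to which group is an assumption here, and it does not affect the two-tailed result.

```python
from math import sqrt
from statistics import NormalDist

sigma = 16                       # known population standard deviation (cm)
n_x = n_y = 30                   # 60 boys split evenly between the two groups
mean_x, mean_y = 95.74, 106.69   # sample means from the discussion; group labels assumed
alpha = 0.05

# Standard deviation of the difference between the sample means (Theorem 1)
std_err = sqrt(sigma**2 / n_x + sigma**2 / n_y)
z = (mean_x - mean_y) / std_err
# Two-tailed p-value and two-tailed critical value
p_value = 2 * NormalDist().cdf(-abs(z))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"std_err={std_err:.4f}  z={z:.2f}  p={p_value:.3f}  z_crit={z_crit:.2f}")
```

Under these assumptions the computed |*z*| rounds to 2.65 and the p-value to .008, in line with the figures reported above.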

Hello,

Can I follow example 1 even if the sample sizes are different?

Basically I want to compare the mean of two samples with different sample sizes (in Excel). I have the mean, the variance and the sample size for both.

Thanks

Celina,

Yes, the sample sizes can be different.

Charles

Hi, I want to know how you calculated the population variance.

I used Excel’s VARP (or VAR.P) function. Please see the webpage Measures of Variability for details.

Charles

Can you please help me solve the following question: In Norway, the distribution of birth weights for full-term infants whose gestational age is 40 weeks and whose mothers did not smoke during pregnancy is approximately normal with mean 3500 grams and standard deviation 430 grams (Bellinger et al., 1995; New England Journal of Medicine 332:549-555). An investigator plans to conduct a study to determine whether or not the mean birth weight of full-term babies whose mothers smoked throughout pregnancy is different from that of babies of non-smoking mothers.

Suppose the investigator believes that the true mean birthweight for the infants from smoking mothers could be as low as 3200 grams or as high as 3800 grams (i.e. he anticipates conducting a two-tailed test) with the true variability being the same within each of the two groups. He intends to design a balanced CRD (i.e. equal sample sizes) in weighing babies from randomly selected mothers from each of the two groups.

a) Now, the investigator wants to risk a 10% or less chance of failing to detect a mean difference between the two groups of mothers. Suppose the investigator intends to eventually analyze the data assuming that the variance(s) are known. What sample sizes for each of the two groups would be needed for this study?

b) Obviously, the investigator will not be able to assume the variance(s) as known when he analyzes the data and intends to publish the results. Readdress the question in (a) given this more normal circumstance.

c) What power would be afforded from sample sizes of 10 babies per each of the two groups if a conventional t-test was going to be used to analyze the data?

d) What should be the sample sizes for the two groups if the investigator desires the 95% t-based CI on the mean difference to be no greater than 50 grams?

Isaac,

If I understand your question properly, the four questions you are asking can be resolved as follows:

a) What is the sample size required for a test using the normal distribution? See the following webpages:

Statistical Power and Sample Size

Power and Sample Size using Real Statistics

b) and d) What is the sample size required for a t test?

Sample Size Requirements for t test

c) What is the power of a t test?

Power for t test

You can also use the Real Statistics Statistical Power and Sample Size data analysis tool to answer these sorts of questions, as described on the following webpage:

Real Statistics Statistical Power Analysis Tool

Charles

Thank you very much Charles!

Hi Charles,

2 Questions:

Since the question is “Is the nutrient significantly better at INCREASING height?”, wouldn’t that imply a 1-tailed test, and hence wouldn’t it be more appropriate to use NORMSINV(.05) to arrive at the z-crit value?

You used NORMSDIST in this example, but omitted using NORMDIST. I re-ran the results using NORMDIST(95, 106.69, 4.131182, TRUE) and got a different answer. My intuition is that NORMDIST is best for 1 sample testing and not two sample testing, which is why you only used NORMSDIST instead. Am I correct here or is there something I’m missing?

Jonathan,

Just because you are interested in increasing height does not mean that you should use a one-tailed test. You could use a one-tailed test if you are certain that the nutrient won’t decrease height. Usually it is safer to use the two-tailed test.

NORMDIST(x, m, s, TRUE) is equivalent to NORMSDIST((x-m)/s) and has nothing to do with 1 or 2 sample testing. For the problem on the webpage, the equivalent version of the p-value using NORMDIST instead of NORMSDIST is =2*NORMDIST(95.74-106.69,0,4.13,TRUE).

Charles
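The relationship between the two Excel functions in the reply above can be checked numerically. The following Python sketch uses the standard library's `statistics.NormalDist` as a stand-in for NORMDIST and NORMSDIST (an assumption for illustration, not how Excel computes them), with the values taken from the question: standardizing by the standard deviation *s* gives the same cumulative probability.

```python
from statistics import NormalDist

x, m, s = 95.0, 106.69, 4.131182   # values taken from the question above

# Counterpart of NORMDIST(x, m, s, TRUE): cdf of a normal with mean m and sd s
direct = NormalDist(mu=m, sigma=s).cdf(x)
# Counterpart of NORMSDIST((x - m) / s): standard normal cdf of the z-score
standardized = NormalDist().cdf((x - m) / s)

print(direct, standardized)   # the two values agree up to rounding error
```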

Charles, thank you very much for the article. I am new to the application of statistics. I am writing a document on methods of measuring the uniformity of the distribution of sequences created by random number generators and irrational numbers. The theorem is just what I have been looking for. Your website will be referenced. Does the theorem have a name? Many thanks

z-Test: Two Sample for Means.

See Citation for how to reference the website.

Charles

Charles,

Thank you for the reply. Suppose I wish to test whether two dice have the same mean and SD by taking large samples. The SD formula in the theorem is larger by a factor of sqrt(2) than the SD in the single-sample z-test. The consequence is that the theorem leads to a z-score lower by a factor of sqrt(2). Could you please comment on this? Many thanks in advance.

This is the way the mathematics works out, at least when the two samples have the same size and standard deviation. I guess one way to look at this is that with two samples you have added standard deviation from the mean (which in this case is the difference between two means).

Charles
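For reference, the factor of sqrt(2) discussed above follows directly from the standard deviation in Theorem 1 when both samples have the same size *n* and the same population standard deviation *σ* (the equal-size, equal-variance case is the assumption here):

$$\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sqrt{\frac{2\sigma^2}{n}} = \sqrt{2}\cdot\frac{\sigma}{\sqrt{n}}$$

Since σ/√n is the standard deviation of a single sample mean, the same observed difference yields a two-sample z-score that is smaller by a factor of √2 than in the one-sample test against a fixed value.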

I want to compare 2 means with standard errors. I do not know n or the standard deviations.

Rich,

Sorry, but I don’t understand the context of your question. Do you have two samples and want to compare the means of the samples? If not, please specify what information you have.

Charles

Hello Charles

I am collecting digital data over 256 channels. I access three separate types of data in each channel so I can produce a histogram of the number of counts for each type of data across the channels. Each type can have thousands of counts per channel. I can calculate the mean and standard deviation for each histogram. I am not interested in comparing the means of the three types in terms of their counts because the nature of the source ensures they will always be almost equal. But I am very interested in comparing the channels where the different means occur. Can a valid test be applied to compare channels?

Thank you,

Anthony

Anthony,

Sorry, but I don’t understand your description sufficiently well to respond to your question.

Charles

Sorry Charles. An image pixel is made up of red, green and blue. Consider red. Each pixel will contain some red. Scanning the image will count the total number of pixels and how many of them contain red. There are 256 colours in the image and the histogram will show how the red counts were distributed across each of the 256 colours (like throwing red balls into 256 bins). Same for green and blue. R, G and B distributions result and I wish to compare these distributions between samples. It is not practical to compare the means of the R, G and B data. But it is very useful, if possible, to compare the colour (bin) which holds the mean, particularly between samples.

Hope this helps.

Sorry Charles. I don’t blame you.

Here is the essence of it. Someone throws thousands of red balls into 256 numbered bins. A histogram is constructed showing the number of balls in each bin and central tendency is observed.

The exercise is repeated with the same number of balls being thrown into the same number of bins. Again, a histogram. The mean of each histogram can be calculated. Can a significance test be performed not on the means, but on the bins in which the means occurred?

It sounds like you want to compare the bins in some way. What sort of comparison do you want?

Charles

Hi Charles, and thanks. Yes. Think of two approximately normal distributions sitting alongside each other. The nature of the data and the sampling requirements mean that they have equal integrals and equal means; the stdev’s differ. I have good data along the horizontal axis and it would be very useful if I could use it to measure the displacement between the distributions.

Anthony,

If they have equal integrals, then the cumulative distributions are the same, and so the standard deviation would be the same. Since you say that the stdev’s are different, then something is not quite right.

In any case, why can’t you simply subtract the stdev’s?

Charles

Hi Charles, and thanks again.

Yes, after thinking about your post, you are correct on both counts. If the data were normal, the means would certainly cluster very closely around the population mean and the difference between the stdev’s would be a very useful measurement. Which raises your other point. The flaw is that this data just isn’t normal. I originally made that assumption on the basis that large samples should tend to be so. Not the case here. When I re-bin the data over 8, 16 and 32 bins there is no visual semblance of normality, and any test applied confirms it. Please advise if you can see a flaw in this reasoning; otherwise I’ll test nonparametrically. Thanks for your guidance.

Regards

Anthony