Excel provides a Sampling data analysis tool which can be used to create samples. To use the tool, you define the population as a range in an Excel worksheet and then set the following input parameters to determine how the sampling is carried out.
Input Range – Specify the range of data that contains the population of values you want to sample. Excel draws samples from the first column, then the second column, and so on.
Sampling Method – Select one of the following two sampling methods:
- Periodic – In this case, you specify the Period n at which you want sampling to take place. The nth value in the input range and every nth value thereafter is copied to the output column. Sampling stops when the end of the input range is reached.
- Random – In this case, you specify the Random Number of Samples. That many values are drawn from random positions in the input range. A value can be selected more than once (i.e. sampling is with replacement).
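The two sampling methods can be sketched in Python as follows. This is only an illustration of the behavior described above, not Excel's own implementation; the function names and the small two-column population are hypothetical and are not the data from the figures in this article.

```python
import random

def column_major(table):
    """Flatten a list of rows column by column (first column, then the second, ...)."""
    return [row[c] for c in range(len(table[0])) for row in table]

def periodic_sample(values, period):
    """Return the period-th value and every period-th value after it."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return values[period - 1::period]

def random_sample_with_replacement(values, k, rng=random):
    """Draw k values from random positions; the same position may be drawn again."""
    return [values[rng.randrange(len(values))] for _ in range(k)]

# Hypothetical two-column population (assumption, for illustration only).
table = [[1, 11], [2, 12], [3, 13], [4, 14], [5, 15]]
population = column_major(table)               # [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
every_third = periodic_sample(population, 3)   # [3, 11, 14]
draws = random_sample_with_replacement(population, 6)  # may contain repeats
```

Because each random draw picks a position independently of the earlier draws, nothing prevents the same value from appearing more than once in the result.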
Example 1: From a population of 10 women and 10 men as given in the table in Figure 1 on the left below, create a random sample of 6 people for Group 1 and a periodic sample consisting of every 3rd woman for Group 2.
Figure 1 – Creating random and periodic samples
You need to run the Sampling data analysis tool twice, once to create Group 1 and again to create Group 2. For Group 1, you select all 20 population cells as the Input Range and Random as the Sampling Method, with 6 for the Random Number of Samples. For Group 2, you select the 10 cells in the Women column as the Input Range and Periodic as the Sampling Method, with Period 3.
Observation: The Sampling data analysis tool has a number of limitations which unfortunately reduce its usefulness. These include:
- Only numeric data (including blank) can be used.
- If, in the example above, the number of women is not equal to the number of men, any blank cells will simply be treated as data and can be chosen for inclusion in a sample.
- The Label option does not function properly and so should not be used.
- Random sampling is with replacement. As you can see from the example, the number 2 is chosen twice in the Group 1 sample.
As a result, it is often better to use other approaches to create a sample. We now show how to create the Group 1 sample above without duplicates.
Example 2: Recreate Group 1 from Example 1 without allowing any duplicates.
We accomplish this by creating a worksheet as in Figure 2.
Figure 2 – Creating a random sample without replacement
Column A consists of the data elements in the population (as taken from Figure 1). Column B consists of random numbers between 0 and 1. These are generated using the Excel function RAND(). Simply enter =RAND() in cell B4, then highlight the range B4:B23 and press Ctrl-D. This will place the formula =RAND() in every cell in the range B4:B23.
Finally create column C by putting the following formula in cell C4 and then copying it down (using Ctrl-D as described above) for as many rows as you want items in the sample.
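The idea behind this worksheet, attaching a random number to each element and then selecting elements according to the order of those random numbers, can be sketched in Python. This is a minimal illustration under assumptions, not the worksheet formula itself: the function name is hypothetical, and the population 1 to 20 stands in for the values in column A.

```python
import random

def sample_without_replacement(values, k, rng=random):
    """Pick k elements from distinct positions by ordering random keys."""
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and the population size")
    # One random key per element, playing the role of column B.
    keys = [rng.random() for _ in values]
    # Positions ordered by their key; each position appears exactly once.
    order = sorted(range(len(values)), key=lambda i: keys[i], reverse=True)
    # The first k positions in that order form the sample.
    return [values[i] for i in order[:k]]

# Hypothetical population of 20 elements (assumption, for illustration only).
population = list(range(1, 21))
group1 = sample_without_replacement(population, 6)
```

Since every position in the population appears exactly once in the ordering, no element can be selected twice. Ties between random keys are possible in principle but extremely unlikely.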
Observation: If we wanted to generate a sample of size 6 with replacement, we would use the following formula in cell C4 instead (column B would not be necessary):
Real Statistics Excel Functions: The Real Statistics Resource Pack provides the following useful array functions that allow you to avoid the complex syntax described above.
SHUFFLE(R1, s) = array of the same size and shape as R1 which shuffles the elements in range R1 (without replacement). The string s is used as a filler in case the output range has more cells than R1. This second argument is optional and defaults to the error value #N/A.
RANDOMIZE(R1, s) = array of the same size and shape as R1 which contains random elements from R1 (with replacement). The string s is used as a filler in case the output range has more cells than R1. This second argument is optional and defaults to the error value #N/A.
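The difference between shuffling and randomizing, and the role of the filler, can be sketched in Python. This is only an analogy for the behavior described above and not the Real Statistics implementation: the function names, the explicit out_size parameter, and the truncation when the output is smaller than the input are all assumptions made for the illustration.

```python
import random

def shuffle_fill(values, out_size, filler="#N/A", rng=random):
    """Permute values (without replacement), then pad with filler up to out_size."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    padded = shuffled + [filler] * max(0, out_size - len(shuffled))
    return padded[:out_size]  # truncation for small outputs is an assumption

def randomize_fill(values, out_size, filler="#N/A", rng=random):
    """Draw len(values) elements with replacement, then pad with filler up to out_size."""
    drawn = [rng.choice(values) for _ in values]
    padded = drawn + [filler] * max(0, out_size - len(drawn))
    return padded[:out_size]  # truncation for small outputs is an assumption

# Hypothetical input (assumption, for illustration only).
data = [10, 20, 30, 40]
shuffled = shuffle_fill(data, 6, filler="")      # a permutation of data plus two fillers
resampled = randomize_fill(data, 6, filler="")   # four draws with repeats allowed, plus two fillers
```

In this sketch the shuffled output always contains every input element exactly once, whereas the randomized output may repeat some elements and omit others.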