Often data is presented in the form of a frequency table. For example, the data in range A4:A11 of Figure 1 can be expressed by the frequency table in range C4:D7.

**Figure 1 – Frequency Table**

The table in Figure 1 shows that the data element 2 occurs 4 times, the element 4 occurs 2 times and the element 3 and 5 occur 1 time.

When data is provided in the form of a frequency table, the calculation of the mean and standard deviation cannot be performed directly using the usual AVERAGE and STDEV Excel functions. In fact for sample data {*x _{1}, …, x_{m}*} with corresponding frequency counts of

*f*respectively and sample size

_{1}, …, f_{m}*n*=

*f*, then the sample mean is (see Measures of Central Tendency):

_{1}+ f_{2}+ … + f_{m}This can be calculated in Excel as

=SUMPRODUCT(R1, R2) / SUM(R2)

where R1 is a range containing the data elements {*x _{1}, …, x_{m}*} and R2 is a range containing {

*f*}.

_{1}, …, f_{m}In a similar way, the sample variance can be calculated as

which can be calculated in Excel as

=SUMPRODUCT((R1-R3)^2, R2)/(SUM(R2)-1)

where R1 and R2 are as above and R3 contains the sample mean (as described above). Using Property 1 of Measures of Variability, we also have the following alternative approach to calculating the variance:

which can be calculated in Excel by the formula

=(SUMPRODUCT(R1^2,R2)-SUM(R2)*R3^2)/(SUM(R2)-1)

Using these formulas we can calculate the mean and variance of sample data expressed in the form of a frequency table. We demonstrate this in the following example.

**Example 1**: Calculate the mean and variance of the sample data from the frequency table in Figure 1.

**Figure 2 – Calculation of mean and standard deviation from frequency table**

The required calculation is displayed in Figure 2. Here cell F11 contains the formula =D11/E11, which calculates the mean, and cell G14 contains the formula =(D14-E14*F14)/(E14-1), which calculates the variance. The results are the same as calculating the mean and variance by applying Excel’s AVERAGE and VAR functions to the data set {2, 2, 2, 2, 3, 4, 4, 5}.

Note too that a frequency table is closely linked to a frequency function, as defined in Definition 1 of Discrete Distributions. E.g. since there are 8 elements in the data set in Figure 2, we see that the frequency function for the random variable *x* is as in Figure 3 where each frequency value is divided by 8:

**Figure 3 – Frequency function corresponding to frequency table**

Often frequency tables are used with a range of data values, i.e. with intervals for the *x* values. In this case the midpoint of each interval is assigned the value *x _{i}*.

**Example 2**: Calculate the mean and variance for the data in the frequency table in Figure 4.

**Figure 4 – Calculations for a frequency table with intervals**

The first interval in Figure 4 is 0 < *x* ≤ 4, the second 4 < *x* ≤ 10, etc. The calculation of the mean and variance is as in Figure 2, except that now the midpoints are used as the *x* values.

**Observation**: Sometimes the first and/or last interval is unbounded: e.g. if the last interval in Figure 4 is replaced by “over 20”. In this case it isn’t possible to establish a midpoint, and so all you can do is make your best estimate of a suitable representative value for that interval.

**Excel Function**: When you have a lot of data, it is convenient to put the data in bins, usually of equal size, and then create a graph of the number of data elements in each bin. Excel provides the **FREQUENCY**(R1, R2) array function for doing this, where R1 = the input array and R2 = the bin array.

To use the FREQUENCY array function, enter the data into the worksheet and then enter a bin array. The bin array defines the intervals that make up the bins. E.g., if the bin array = 10, 20, 30, then there are 4 bins, namely data with values *x* ≤ 10, data with values *x* where 10 < *x* ≤ 20, data with values *x* where 20 < *x* ≤ 30, and finally data with values *x* > 30. The FREQUENCY function simply returns an array consisting of the number of data elements in each of the bins.

**Example 3**: Create a frequency table for the 22 data elements in the range A4:B14 of Figure 5 based on the bin array D4:D7 (the text “over 20” in cell D8 is not part of the bin array).

**Figure 5 – Example of the FREQUENCY function**

To produce the output, highlight the range E4:E8 (i.e. a column range with one more cell than the number of bins) and enter the formula

=FREQUENCY(A4:B11,D4:D7)

Since this is an array formula, you must press **Ctrl-Shft-Enter**. Excel now inserts frequency values in the highlighted range E4:E8. Here E4 contains the number of data elements in the input range with value in the first bin (i.e. data elements whose value is ≤ 20). Similarly, E5 contains the number of data elements in the input range with value in the second bin (i.e. data elements whose value is > 20 and ≤ 40). The final output cell (E8) contains the number of data elements in the input range with value > the value of the final bin (i.e. > 80 for this example).

**Observation**: As described in Discrete Probability Distributions, the Real Statistics Resource Pack provides the **FREQTABLE** function. This function can also be used to create a frequency table with bins where the bins are equally spaced.

**Real Statistics Function**: The Real Statistics Resource Pack supplies the following supplemental array function to create a frequency table

**FREQTABLE**(R1, *bsize*. *bmax*) = an array function which produces the frequency table for the data in range R1, assuming equally sized bins of size *bsize* where *bmax* is the maximum bin size value

To use the function you must highlight an array with 3 columns and at least *k* rows where *k* = (*bmax* – MIN(R1)) / *bsize* + 1. You can highlight more rows than you need; any extra rows will take value #N/A.

The *bmax* argument is optional. If omitted then it defaults to *bmax* = MAX(R1). When bmax is not omitted then you should make sure that *bmax* ≥ MAX(R1): otherwise some data will be lost.

The *bsize* argument is also optional. If omitted then instead of creating a frequency table as described above, a table with a bin for each value in R1 is used. The *bmax* argument is ignored. This produces a table similar to that described in Example 3 of Discrete Distributions.

**Example 4**: Create a frequency table for the 22 data elements in the range A4:B14 of Figure 5 based on bins of size 15.

The desired frequency table can be produced using the array formula

=FREQTABLE(A4:B14,15)

as shown in range M4:O11 of Figure 6.

**Figure 6 – FREQTABLE function with bin size 15**

The headings are not outputted by the function but have been added manually. Note that two extra rows have been highlighted and so they are filled with #N/A.

**Observation**: You can also use the **Frequency Table** data analysis tool for creating frequency tables. See Histograms for an example of how to use these tools. Also see Frequency Table Conversion for how to calculate the descriptive statistics for the data described by a frequency table.

Thank you Prof. Charles.

Question: How can I use the Descriptive Statistics tool from the Data Analysis pack with frequencies instead of the raw data?

Example:

Age Count(f)

1 5

5 3

8 6

Daniela,

You can use the Real Statistics Frequency data analysis tool to do this. This is explained at

http://www.real-statistics.com/real-statistics-environment/data-conversion/frequency-table-conversion/

Charles

Hello,

When I use the histogram or frequency function, one of my values is being counted outside of the maximum value of my bin. The final bin goes from 6.36 to 6.64 and there are 3 values that should be counted here: 6.46, 6.64, and 6.64. However, in the ‘More’ row of the Data Analysis Histogram function there is a 1 and only a 2 in the 6.64 bin. This is also happening when I used the Frequency function, with 2 values being counted and the other one left out altogether. What am I missing here? How do I get the function to count all 3 of the values? Thanks!

Shawnda,

This probably has something to do with how these decimal numbers are being rounded off, but if you send me an Excel file with your data and results, I will try to figure out what is going on. You can find my email address at Contact Us.

Charles

Hey Charles,

Nice work. I keep visiting your page and updating the add-in for 2 years now. What should I say… luckily the days of Holt-Winters templates are over.

However, one thing I came across that I miss in Excel native functions or your FREQTABLE function is that I cannot (or do not know how to) make BINS that include the lower bound and exclude the upper bound.

This is relevant for me when I make an analysis of elapsed time/age/ difference between two time stamps. Consider the case of people and their birthdate listed. If you make BINS of 12 months like in Excel: ]0,12] (0 years old); ]12,24] (1 year old); ]24,36] (2 years old) etc. you’ll end up in trouble since the day on which one’s birthday is, makes a year complete and starts a new year. Humans actually add a year of life on their birthday, not the day after their birthday.

If a person is aged 24 months (s/)he’s actually in my 1 year old BIN which is wrong because he/she just has his/her birthday. The day after, (s/)he’ll be 24,x months old and (s/)he joins the 2 year old group only then. Since humans count their age from the birthday on, this kind of representation produces a very small amount of errors on population data.

Do you know a way how to solve this problem? Do you consider adding a feature on your next release? For now, I use R or Oracle BI to do this job. This can be much more time consuming than with Excel functions. Especially when I receive data in xls/csv format.

With much appreciation, Marc.

Marc,

The simplest way to resolve this issue based on Excel’s FREQUENCY function and Real Statistics’s FREQTABLE functions is to bins such as 11.9999999999, 23.9999999999, 35.9999999999, etc, instead of 12, 24, 36, etc.

I’ll add your suggested feature to my list of potential enhancements, but I can’t promise that it will be in the next release.

Charles

Hello Prof. Charles;

Thanks for all the tips.

Question: When using the native Frequency function (and when using routines that rely on Frequency function, such as creating Histograms), Excel rounds up every number that exceeds 50% of the range of the bin. For example, if I input bins 10, 20, 30…N, any numbers in my data matrix ranging from [10-15 will fall under bin 10, while numbers ranging from [15-25 will fall under bin 20, and so on. This is really a nuisance when trying to select the bins. Is there any way to stop Excel from rounding up the numbers before assorting them to the bin (i.e. force Excel to assort the number to the specified bin)? Thanks in advance.

Alexandre,

This is not how the FREQUENCY function works. If your bins are 10,20,30 and your data are 8, 12, 17, 20, 22, 28, then the frequency values for 10, 20, 30 are respectively 1, 3, 2. In particular, the frequency value for 20 (from 0,20,30) is the count of all data values larger than 10 and less than or equal to 20.

Charles

hEY,

How do you do the same thing when you have X and f(x)…it is not explained…

Frank,

Sorry, but I don’t understand your question.

Charles

Hi -Thanks very much for this statistics tool. I’m trying to make a frequency table on EXCEL for Mac released in late 2015. I got data from one column, but I can’t get the formula to work again. Do you know of a helpful resource for Excel on the Mac?

Theo,

There is a Mac version of the Real Statistics Resource Pack. See the webpage

Real Statistics for Mac

Charles

Hi Charles,

Before anything, I must congratulate you on your website. One of the best learning sites for statistics.

But with respect to the Frequency Tables page I recommend that you explain the computation of the mean and variance in terms of the first and second moments, e.g. E[X] and [EX^2]. This is what we have learned in college.

Mean = E[X] and var(X) = E[X^2] – (E[X])^2.

You sort of compute the first and second moments in figure 2 but it should be done on the probability table in figure 3. Any thoughts?

Wytek,

I have revised the referenced webpage to make things clearer. I have referenced Property 1 on another webpage which captures the moments idea.

For the probability table in Figure 3, I wanted to emphasize the idea of dividing by the sample size.

Charles

This is terrific! Thank you for sharing.

Question: What if I am calculating frequencies of nominal (categorical) data? Eg. frequencies of face-to-face contacts, phone call contacts, emails, and letters. I am preparing an electronic survey that contains a lot of these types of choices. In addition, there are open-ended questions whose responses will be put into categorical bins and displayed as frequencies. I ultimate question is this, “How do I determine a sample size for a survey with these kinds of questions? The population is 800. If you can give me any guidance or send me a reference, I would be most grateful. Thanks, Doug Clark

What sort of statistical analyses are you planning to perform?

Charles