**Definition 1**: Suppose an experiment has the following characteristics:

- the experiment consists of
*n*independent trials, each with two mutually exclusive outcomes (**success**and**failure**) - for each trial the probability of success is
*p*(and so the probability of failure is 1 –*p*)

Each such trial is called a **Bernoulli trial**. Let *x* be the discrete random variable whose value is the number of successes in *n* trials. Then the probability distribution function for *x* is called the **binomial distribution**, *B*(*n, p*), and is defined as follows:

where *C*(*n, x*) = and *n*! = *n*(*n –*1)(

*n*2)⋯3∙2∙1 as described in Combinatorial Functions.

*–**C*(*n, x*) can be calculated by using the Excel function COMBIN(*n,x*). See Figure 2 of Built-in Excel Functions for more details about this function.

**Observation**: Figure 1 shows a graph of the probability density function for *B*(10, .25).

**Figure 1 Binomial distribution**

That the graph looks a lot like the normal distribution is not a coincidence (see Relationship between Binomial and Normal Distributions)

Click here for a proof of Property 1.

**Excel Function**: Excel provides the following functions regarding the binomial distribution:

**BINOMDIST**(*x, n, p*, cum) where *n* = the number of trials, *p* = the probability of success for each trial and cum takes the value TRUE or FALSE.

BINOMDIST(*x, n, p*, FALSE) = probability density function *f*(*x*) value at *x* for the binomial distribution *B*(*n, p*), i.e. the probability that there are *x* successes in *n* trials where the probability of success on any trial is *p*.

BINOMDIST(*x, n, p*, TRUE) = cumulative probability distribution *F*(*x*) value at *x* for the binomial distribution *B*(*n, p*), i.e. the probability that there are at most *x* successes in *n* trials where the probability of success on any trial is *p*.

**CRITBINOM**(*n, p*, 1 – *α*) = critical value; i.e. the minimum value of *x* such that *F*(*x*) ≥ 1 – *α*, where *F* is the cumulative binomial distribution function for *B*(*n, p*)

Excel 2010/2013 provide the following additional functions: **BINOM.DIST**, which is equivalent to BINOMDIST, and **BINOM.INV**, which is equivalent to CRITBINOM.

Excel 2013 introduces the following new function (where *x* ≤ y ≤ *n*):

**BINOM.DIST.RANGE**(*n, p, x, *y) = the probability there are between *x* and y successes (inclusive) in *n* trials where the probability of success on any trial is *p*

Thus, BINOM.DIST.RANGE(*n, p, x,* y) = BINOMDIST(y, *n, p*, TRUE) – BINOMDIST(*x–*1, *n*, *p*, TRUE) if* x* > 0 and BINOM.DIST.RANGE(*n, p,* 0, y) = BINOMDIST(y, *n, p*, TRUE). The y parameter may be omitted, in which case BINOM.DIST.RANGE(*n, p, x*) = BINOMDIST(*x, n, p*, FALSE).

**Example 1**: What is the probability that if you throw a die 10 times it will come up six 4 times?

We can model this problem using the binomial distribution *B*(10, 1/6) as follows

Alternatively the problem can be solved using the Excel function:

BINOMDIST(4, 10, 1/6, FALSE) = 0.054266

Can you explain me what difference between chi-square test and binomial distribution test.

If I using them to data (2 answer : Yes ,No)

In 2 alternate force choice

What be better?

Thank you so much

Each one is described on the website and can be used for different types of anayses. Which is better depends on the specific nature of the analysis you want to conduct. You have not supplied enough inofrmation for me to be able to tell you which is better for your analysis.

Charles

53% of American households subscribe to cable TV. You randomly select six households and ask each if they subscribe to cable TV. Construct a probability distribution for the random variable, x. Then draw the histogram and a pie chart showing the relative frequency distribution of the data using (use Microsoft excel) show all the working plizzzzzzzzzzzzzzzzzzzz. i need to u8nderstand it

Namara,

The following webpage gives examples similar to the one you have described.

Proportion Distribution

You can learn more about creating histograms at

Histograms

Charles

I have a problem where there are X amount people who are willing to buy a higher priced airline ticket, and 20% of those X will buy a lower priced ticket if there is one available. If I know how many low priced seats are available, can I use the binom.dist function to give me a random draw of how many of those seats will be taken by the higher paying customers? I’m running a simulation so I don’t want to just use 20% * number of low priced tickets. Thanks.

Jack,

I know that you can use the binomial distribution for similar type problems, but I don’t have enough information to judge in your case.

Charles

Charles,

I have a system which is specified to perform to a certain probability of success. It can only pass or fail thus a binomial probability. I need to design a series of tests with the minimum n to determine if the demonstrated results indicate that true performance is greater than or equal to the specified value. You can assume the tests are independent. The test results must have a specific confidence associated with it. I have used the following formula ”=1-BINOMDIST(# of failures, # of trials, 1-probability of success, cumulative) and more or less thru trial and error played around with the number of trials in order to achieve a probability which gets me the required confidence.

My question is three fold. 1. Is there a more elegant way to find n for given confidence levels? 2. If I believe that true performance is going to be less than the specified performance I would like to determine what my sample size should be to determine that there is no likelihood that it was going to approach the specified threshold. This way I can construct the test in such a way as to minimize the cost to stop the test if it is going to be determined that it is not going to result in meeting the threshold. What would I use in excel to do this? A negative binomial function? 3. Lastly how many runs (n) would I require in order to estimate what the true performance of the system is with confidence and how would that look in excel?

Jamey,

1. Before I can answer your question, what is n? Is it the smallest number of trials that yields a given number of successes x based on a constant probability of success on any single trial p? If so this can be found using the negative binomial distribution.

2. I don’t completely understand this question.

3. Is this n the same as the n in question 1? Also, I don’t know what you mean by performance.

One last point, when you need to iterate to a solution via trial and error (as you seem to have done), then you can often accomplish the same thing by using Excel’s Goal Seek or Solver capability.

Charles

can u answer this for me plis P(z<-1.5)

Does z have a standard normal distribution? If so, P(z < -1.5) = NORM.S.DIST(-1.5, TRUE) = .0668. Charles

P(z1.5) = 1-p(z<1.5)

1 – NORM.S.DIST(1.5, TRUE)

Assume that 96% of ticket holders show up for an airline flight. If a plane seats 200 people, how many tickets should be sold to make the chance of an overbooked flight (more ticket holders show up for a flight than there are seats available) is at most 10%.

Please help me here.

I think its 213 , is that appropriate?

David,

How did you arrive at the answer of 213?

Charles

By trial and Error method. Not sure if there is a different method to solve this case. Please advise.

David,

The probability that at most 200 seats will be filled from 200+x tickets sold is =BINOM.DIST(200,200+x,0.96,TRUE). This is the probability that the flight is not overbooked. Presumably you are looking for the largest value of x such that this probability is more than 90%.

If x = 5 this probability is 91.6%. If x = 6 this probability is 83.5%. Thus the answer is 205 seats.

Charles

Hi Charles

Your willingness to respond with solutions in various fields is admirable, and generous.

My problem is in trading derivatives, specifically to find the probability of experiencing at least 1 sequence of k consecutive failures occurring in N trades. The probability of a win is P and a loss (1-P), Bernoulli distribution appears to be applicable. I have listed by hand the 2^N combinations for N=5. The approach used is to identify from the table the combinations where for example the k =2 consecutive losses do not appear. Each combination has a probability of occurrence, for example P=0.6 and combination WLWWL probability =0.6*0.4*0.6*0.6*0.4. Sum all the probabilities of the combinations identified and subtract from 1.0 to find the answer. This approach was used to check the results found by Excel

+1.0-BINOMDIST(0,N-k+1,(1-P)^k, FALSE). The by hand results are around 20% lower than the Excel formula results. The real world problem would be to find answers for say N=60 and k=5, which would be totally impractical to do by hand (2^60 combinations!), so a formula approach is required.

Your comments would be appreciated.

Regards

Chris

Chris,

This sounds like the type of problem that I addressed on the following webpage:

http://www.real-statistics.com/binomial-and-related-distributions/runs/

Charles

Hi,

I have a problem as follows:

2. You run a local airline company. There is a ﬂight from city A to city B and the aircraft for this ﬂight has 30 seats for customers. One ticket can be sold for £1K. So, if the company sells 30 tickets, the revenue is £30K. There is only ﬁxed and sunk cost; the cost for this ﬂight is the same regardless of how many passengers there actually are. Also, statistics shows that each person who purchased a ticket independently shows up with probability 0.75. Therefore, there is possibility that the company can earn more money by overbooking tickets. For example, if the company sells 32 tickets, the sales will be £32K, and there won’t be any problem if no more than 30 purchasers actually show up. However, the company has a policy that any customer who cannot take the ﬂight due to ‘overﬂow’ should be given an alternative plane ticket as well as monetary compensation. The cost for this is £2K. That is, if the company sells 32 tickets and all the 32 purchasers actually turn up, two persons cannot take the ﬂight, so the net revenue will be £32K sales minus £4K compensation, that is, £28K. Answer the following questions.

(a) If the company sells 35 tickets, what are the company’s net revenues if exactly (i) 29, (ii) 31, or (iii) 33 people show up, respectively? (b) If the company sells 35 tickets, what is (i) the probability of no more than 30 people showing up, (ii) the probability of exactly 31 people showing up, and (iii) the probability of exactly 33 people showing up? (c) Compute and report the expected net revenue from selling 35 tickets. (d) What is the net-revenue-maximising number of tickets to sell? Report the number of tickets to sell and the achieved net revenue.

I have used your information (which was brilliant) to answer questions a) and b) but I was wondering if I can use the same maths here to answer c) and d) as I can’t figure out how to apply it to those questions.

Many thanks

Just to add to that, I have done part c) in excel using 35000 – bin.dist(35,35,0.75,false)*(10000)-bin.dist(34,35,0.75,false)… etc down to 31.

I’d like to do this to find out the expected values for when more than 35 tickets are sold as I think the revenue maximising tickets sold will be higher maybe around 40 or 41 but excel won’t allow me to use the same binomial formula as I did in part c) possibly because it can’t exceed 35 tickets.

Hi Danielle,

The answer for (c) is 35000 – bin.dist(35,35,0.75,false)*10000-bin.dist(34,35,0.75,false)*8000-bin.dist(33,35,0.75,false)*6000-bin.dist(32,35,0.75,false)*4000-bin.dist(31,35,0.75,false)*2000.

In general, if there are n extra tickets sold, then the net revenue is

3000 + n*1000 – bin.dist(n,n,0.75,false)*2000*n – bin.dist(n-1,n,0.75,false)*2000*(n-1) – bin.dist(n-2,n,0.75,false)*2000*(n-2) – … – bin.dist(31,n,0.75,false)*2000

You should be able to maximize this value by using Excel’s Solver. It may also be possible to simplify this expression mathematically and use calculus to find the maximum value.

Charles

Hi,

I’ve been measuring how long visitors to a zoo spend at exhibits and have a large data set of demographic information about the individuals and then various time data for each locations where they stopped to look at the animals. The time spent at the animals appears to follow a negative bionomial distribution so I want to analyse it using a generalized linear model comparing different types of visitor (age, gender etc). If I consider my data as counts of number of seconds someone stays at a location until they leave then the negative bionomial model seems to fit but I’m concerned that the probability of them leaving the location (probability of ‘failure’) is not a constant. The probability that they will leave presumably increases over time so is negative binomial the wrong regression in this instance?

Thanks

Duncan,

If I understand the scenario correctly, I believe that your model views “failure” in any specific unit of time as “remaining with the animals” and “success” as “leaving”. If the probability of leaving is constant then you you could use a negative binomial distribution. This is not possible if this probability is not constant. An example of a distribution where this probability is not constant is the hypergeometric distribution.

Charles

question.

binomial distribution

expected value is given = 4

n = 10

x = 7

cum = TRUE

what do I do to find probability using excel?

James,

The expected value for the binomial distribution is np. Since n = 10 and np = 4, it follows that p = .4. To find the cumulative probability of x, you use the formula =BINOM.DIST(x,n,p,cum), which in this case is =BINOM.DIST(7,10,.4,TRUE). If you are using a version of Excel prior to Excel 2010, then the formula is =BINOMDIST(7,10,.4,TRUE).

Charles

I have a binom.inv question.. Lets say

Q. 3 percent of parts created in a factory do not conform and must be reworked. John needs 100 good parts to ship. How many parts should he produce (considering the expected bad parts) to have a 95% chance of getting 100 good parts?

How to work this in binom.inv? Please help!

Misty,

Here are two ways of solving the problem, neither of which uses BINOM.INV.

Method 1. Consider “success” as getting a defective part. What is the probability of getting x defective parts in 100+x trials where the probability of success is 1-.97 = .03? The answer is BINOM.DIST(x,100+x,.03,TRUE). The values for x = 0 through 8 are

x Prob

0 0.047552508

1 0.190210032

2 0.40633618

3 0.626784852

4 0.79708145

5 0.903346528

6 0.959135694

7 0.984479915

8 0.994649283

You see that you need 6 “successes” to achieve more than 95% probability. This is equivalent to 106 trials

Method 2: Use the inverse of the negative binomial distribution. Excel provides the NEGBINOM.DIST function, but no NEGBINOM.INV function. Instead we use the Real Statistics NEGBINOM_INV inverse negative binomial function. The answer to the problem can be calculated via =NEGBINOM_INV(0.95,100,0.97), which has value 6. The answer once again is 106.

Charles

Hi,

i have a question, if i have a set of data:

Daily weight distribution bags :

2000 weight data per day

and i need to do a analysis on success weight and failure weight for a month. can i use binomial?

Hi,

Sorry, but I don’t understand your question.

Charles

Charles,

There is no “reply” link” at the end of your last answer of our previous exchange. From reading your bio, I understand that you are currently located in Italy. Thus what seems to be your unfamiliarity with Canadian Lotteries like Lotto-Max: 8 balls are drawn from the same device; the first 7s are the core combination of the draw, the 8th is a “bonus” ball. – You will find the details at this site: ” lotterycanada.com/ lotto-max ” .

My core question is about the # success in the binomial formula:

P(>=1) = 1-BINOM.DIST( s ; trials; p(s); true): For the # success: do I use “0” or “1” ?

Thank you and regards.

Pierre,

I now understand how the lottery works, but I still don’t understand what probability you are trying to calculate. Do you want to know the probability that you will win the lottery with 8 balls matching?

Charles

For each 49 ball of a next draw, considering the “history” of each balls:

With balls of similar “forecasted” values of P(>=1), I compose 8 balls combinations, that may win “something (?)”.

Voila !

Pierre,

Sorry, but I still don’t understand the underlying problem that you are looking to solve. When you speak about the “history” of a ball, I think that you want to factor in any defects in the ball which makes it come up more or less often (otherwise the history should be irrelevant). I don’t know what P(>=1) is referring to.

Charles

The probability for a # to be in a 8 ball combination in the next 10 draws of the this lottery

P(s:>=1) = 1-Binom.dist(s; 10; p(x); true).

I have tried both values for “s”; 0 and 1, with interesting outcomes that answers my questions.

Thank you for your time on my question. Your insights on this site are interesting and useful.

Regards.

Studying the historic draws of a lottery (Canada Lotto-Max), to assess the probability of “at least” 1 presence of a ball in n draws after its last known presence and “a” absences at the time of a d’th draw: this ball having a individual probability of P, may I use : 1-BINOM.DIST(0; n; P; True) ?

Or, I-NEGBINOM.DIST( “a”, 0, P, True) ?

Please advise. Thank you

Pierre,

Sorry, but I don’t completely understand the situation you are describing. Please describe the problem more clearly.

Charles

As of the last draw of Lotto-Max: a ball (x) has “n” presences, “a” absences; and a average probability of “p(x)”.

For exactly 1 presence in the next draw:

P(x) =NEGBINOM.DIST( “a”; 1 ; “p(x)”; false).

For at least 1 and more presence(s) in the next “d” draws: is it:

P(>=1) = 1-BINOM.DIST(0; d; P(x); true ) ? Where “0” = the “x-1” of a previous answer. . Thank you. Regards.

Pierre,

I don’t understand what “n” presences and “a” absences means. Is there one ball or many balls? I still don’t understand what the situation is.

Charles

Lotto-Max is a 7/49+1 lottery and all 8 balls are drawn from the same device.

As of a specific draw: a ball (x) has 47 presences; 4 absences since last drawn, and an average probability of p(x):

For the next 1st draw afterward:

P(x)=NEGBINOM.DIST(4; 1; p(x); false)

For the next d draws afterward:

P(>=1)=1-BINIOM.DIST(0 success; d trials; P(x); true) ?

Question is on: # success: 0 or 1 ?

Pierre,

We are slowly but surely breaking the code of your description. I learned from the website for Lotto Max that 7/49 means there are 7 numbers from 1 to 49. I assume that +1 means that there is an 8th number which yields a higher prize. Now please explain what it means for a ball to 47 presences and 4 absences.

Charles

I can’t figure out how to set up the BINOMDIST formula in Excel for P(>=X) [meaning greater than or equal to x]. I’ve worked on this problem for over 3 hours without success on this part. I already have working formulas for the remainder of the probabilities. Any help setting up the formula would be greatly appreciated.

Vicki,

Let’s do a specific example where the probability of success on each trial is .4 and there are 10 trials. The probability of getting 7 or fewer successes in the 10 trials is given by the formula =BINOMDIST(7,10,.4,TRUE). The probability of getting more than 7 successes in the 10 trials is therefore given by the formula =1-BINOMDIST(7,10,.4,TRUE). The probability of 8 or more successes is the same, namely =1-BINOMDIST(7,10,.4,TRUE).

In general, with n trials and the probability p of success on any one trial, the probability of x or more successes, what you called P(>=x), is given by the formula =1-BINOMDIST(x-1,n,p,TRUE).

Charles

CAN YOU PLEASE SHARE WITH ME THE REAL LIFE EXAMPLES WHERE BINOMIAL DISTRIBUTION WEN CANNOT APPLY?

IRFAN MALIK PAKISTAN

Irfan,

There are lots of examples throughout the website where the binomial distribution doesn’t apply. E.g. the binomial distribution is symmetric, and so for data which is very skewed the binomial distribution is not a good fit.

Charles

If the question is termed “What is the probability of at least 171 passengers showing up if the airline sold 184 tickets and the probability of passengers showing up is .93?”

I understand how to do it for a specific value, but how do I plug into excel for a question asking for “at least” a certain number of successes?

Clark,

BINOMDIST(171,184,.93,FALSE) = probability of that exactly 171 passengers show up

BINOMDIST(171,184,.93,TRUE) = probability of that at most 171 passengers show up

1-BINOMDIST(171,184,.93,TRUE) = probability of that more than 171 passengers show up = probability that at least 172 passengers show up

1-BINOMDIST(170,184,.93,TRUE) = probability that at least 171 passengers show up

Charles

Thanks Charles, That is what exactly I was looking for.

I am faced with the question, “Given three graphs with the same probability, what happens to the three graphs’ shapes as “n” increases?”

Annie,

Sorry, but I don’t understand what you mean by “Given three graphs with the same probability”. In any case, you can draw the three graphs as described on the website using Excel charting capabilities and see what happens.

Charles

Simple but at the same time very insightful.

Thanks for taking Your time to share the knowledge

i have a mean value of 30 participant, and standard deviation and other measure,i want to plot probability distribution function. is it possible?

Jai,

It depends on the distribution. As you can see from the formulas for the probability density function for the normal distribution if you have data for the mean and standard deviation you can plot the distribution. For the binomial distribution you need n and p (as shown in Figure 1 of the referenced page). But since mean = np and std dev = the square root of np(1-p) (by Property 1), you can solve for n and p and then plot the distribution. If I’ve done the algebra correctly, if m = the mean and s = the std dev, then p = 1-s^2/m and n = m^2/(m-s^2).

Charles

How do you recreate the Binomial Distribution Histogram that you have for B(n=20, p=.25) for other n and p values? Thank you so much!

Hi Joseph,

Place the number 1 in cell A1 and the formula =A1+1 in cell A2. Highlight the range A2 to An (where n is the value you specify) and press Ctrl-D. This will yield a column of values from 1 to n. Place the formula =BINOMDIST(A1,n,p,FALSE) in cell B1 (where n and p are the values you specify). Highlight the range B1 to Bn and press Ctrl-D. Now highlight the range A1:Bn and select Insert > Charts|Column.

Charles

Thanks a million for your sites. One correction required in Example1 , you may replace the outcome “head” with “six”.

Nishant,

Thanks for your kind remarks and thanks for catching the mistake in Example 1. I just made the change that you suggested.

Charles