**Basic Concepts**

**Definition 1**: The **Poisson distribution** has a probability distribution function (pdf) given by

The parameter *μ* is often replaced by *λ*. A chart of the pdf of the Poisson distribution for *λ* = 3 is shown in Figure 1.

**Figure 1 – Poisson Distribution**

**Observation**: Some key statistical properties of the Poisson distribution are:

- Mean =
*µ* - Variance =
*µ* - Skewness = 1 /
- Kurtosis = 1/
*µ*

**Excel Function**: Excel provides the following function for the Poisson distribution:

**POISSON**(*x, μ,* *cum*) where *μ* = the mean of the distribution and *cum* takes the values TRUE and FALSE

POISSON(*x, μ*, FALSE) = probability density function value *f*(*x*) at the value *x* for the Poisson distribution with mean *μ*.

POISSON(*x, μ*, TRUE) = cumulative probability distribution function *F*(*x*) at the value *x* for the Poisson distribution with mean *μ*.

Excel 2010/2013/2016 provide the additional function **POISSON.DIST** which is equivalent to POISSON.

**Real Statistics Function**: Excel doesn’t provide a worksheet function for the inverse of the Poisson distribution. Instead you can use the following function provided by the Real Statistics Resource Pack.

**POISSON_INV**(*p, μ*) = smallest integer *x* such that POISSON(*x, μ*, TRUE) ≥ *p*

Note that the maximum value of *x* is 1,024,000,000. A value higher than this indicates an error.

**Poisson Process**

If the average number of occurrences of a particular event in an hour (or some other unit of time) is *μ* and the arrival times are random without any tendency to bunch up (i.e. the assumptions for what is called a **Poisson process**) then the probability of *x* events occurring in an hour is given by

**Example 1**: A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will have to turn away potential buyers before the end if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week’s demand?

The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Thus, the answer to the first problem is 1 – 0.977331 = 0.022669, or about 2.3%. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try *x* = 130, which is higher than 120. The cumulative Poisson is 0.998293, which is too high. We then pick *x* = 125 (halfway between 120 and 130). This yields 0.993202, which is a little too high, and so we try 123. This yields 0.988756, which a little too low, and so we finally arrive at 124, which has cumulative Poisson distribution of 0.991226.

Users of Excel 2010/2013/2016 can arrive at the same answer (124) by using the Real Statistics formula =POISSON_INV(0.99,100).

**Confidence Intervals**

The 1–*α* confidence interval for the mean based on *x* events occurring (in a unit of time) is given by

For Excel 2007,* χ*^{2}_{p,df} = CHIINV(1−*p,df*).

**Example 2**: Suppose the number of radioactive particles that hits a screen per second follows a Poisson process and suppose that 5 hits occurred in one second, find the 95% confidence interval for the mean number of hits per second.

Figure 2 shows the confidence intervals for various values of *x* and α.

**Figure 2 – Confidence intervals for the Poisson mean**

The requested confidence interval is

1.623486 ≤ *μ* ≤ 11.66833

as calculated by the formulas in cells C9 and D9:

=CHISQ.INV(B9/2,2*A9)/2

=CHISQ.INV.RT(B9/2,2*(A9+1))/2

Note that =CHISQ.INV(*p*,0) for any value of *p*, and so we cannot use this formula to calculate the lower bound when *x* = 0 (cell C4). In any case, this value is zero.

**Relationship with Binomial and Normal Distributions**

**Theorem 1**: If the probability *p* of success of a single trial approaches 0 while the number of trials *n* approaches infinity and the value *μ = np* stays fixed, then the binomial distribution *B*(*n, p*) approaches the Poisson distribution with mean *μ*.

Click here for the proof of this theorem.

**Observation**: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when *n* ≥ 50 and *p* ≤ .01, preferably with *np* ≤ 5.

**Example 3**: A company produces high precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units what is the probability of having more than 3 defects?

We can solve this problem using the distribution *B*(4000, .0005)*,* namely the desired probability is

1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

*μ = np* = 4000(.0005) = 2

1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877

As you can see the approximation is quite accurate.

**Observation**: The Poisson distribution can be approximated by the normal distribution, as shown in the following theorem.

**Theorem 2**: For *n* sufficiently large (usually *n* ≥ 20), if *x* has a Poisson distribution with mean *μ*, then *x* ~ *N*(*μ, *)*.*

**Test for a Poisson Distribution**

The **index of dispersion** of a data set or distribution is the mean divided by the variance.

Since the mean and variance of a Poisson distribution are equal, data that conforms to a Poisson distribution must have an index of dispersion approximately equal to 1. This fact can be used to test whether a data set has a Poisson distribution, as described in Goodness of Fit.

In fact in Goodness of Fit, we also show how to use the chi-square goodness-of-fit test to determine whether a data set follows a Poisson distribution.

Hi Charles,

Please help me with this problem as i do not understand how the standard deviation affects the processing time.

An analyst in the statistics office of a government department in Wellington

requests for reports. On average the office receives 19 requests per (40 hour) week, with the arrival pattern resembling a Poisson process. Once working on a request the processing time averages 2.0 hours with a standard deviation of 1.0 hours. After the request is processed the report is automatically emailed to the requester.

(a) What is the utilisation of the analyst?

(b) What is the average time a request waits before processing begins (Tq)?

Are my answers correct for (a) 0.2375 and (b) 0.6557?

Avi,

How did you calculate your answers?

Charles

I came across an article on the number of accidents per # man hours worked.

The article did not detail how the adjusted accident was derived. Neither did it specify how the UCL, Avg and LCL was derived based on the adjusted accident.

Need help to determine adjusted accident, UCL, Avg and LCL.

# man hours Accidents Adjusted accident

287287 4 3.843

270828 8 8.228

298531 4 3.630

271726 4 4.149

280771 7 6.961

282547 9 8.905

226415 18 20.685

226410 2 2.917

253482 2 2.429

270376 4 4.176

283216 15 14.843

278348 5 5.018

296736 5 4.633

308166 19 17.731

298507 9 8.466

295858 5 4.651

307692 5 4.414

273973 4 4.104

293849 4 3.718

Analysis was performed using adjusted accident with UCL=16.794, Avg=7.000, LCL=1.206

Ed,

Without reading the article, I don’t know what is intended by these terms.

Charles

Charles,

Please follow the link below for details.

http://www.aptleadership.com/performance-improvement/software/data-analysis-software.html

Thanks.

Ed

Charles,

I still can’t figure out how the author “uses standard statistical theory to adjust the number of accidents based on the variation of the man-hours worked from month to month.”

Any help is much appreciated.

Thanks.

Ed

Sorry Ed, but I haven’t had time to look into to this further. Have you been able to figure out what the author did?

Charles

Hy Charles..

I have to make an assignment on “fitting a poison distribution to observed data”

So tell me how many numbres of data will be suitable. ???

I shall be

very thankful to you.

Zarish,

Sorry, but I don’t know how many numbers is suitable. In fact, first you need to define what “suitable” means.

Perhaps someone else has some insight into this issue.

Charles

Hi.

How can I calculate this:

A fire brigade A is on average called out 5 times a day. 1 call takes on 1hour for the brigade. a) What is the probability that the men can rest at least 2 hours between the calls. b) What is the probability that the brigade B has to be sent out because the brigade A is still on its duty.

The answes should be a) 0.535 and b) 0.188

I would be really glad if you could help me…

Sorry Anti, but this looks like a school assignment and I don’t want to answer such questions. I am happy to clarify concepts, but not provide such specific answers.

Charles

Sir, I have 12 data sets how to check distribution on these data sets?

To check for normal distribution, see Testing for Normality

Also see

Goodness of Fit

KS Test

Charles

Hi Charles,

I am doing a project on diagnostic errors in medicine. I want to show the number of errors per month over a 6 month period. Overall, for 200 events I have about 150 with errors and 50 without errors. How do I use poisson distribution in this case? I do know in each month how many errors occur. Do we need this data? I am using Excel, so what will be the formula?

Mayukh,

You know that the probability of an error is 3/4. Now you need to know how many events occur in a month, and from this it is easy to calculate the average number of errors per month. If however you want to know the probability that you will have say m errors in the month (instead of the average number), then you can use the approach described on the referenced webpage.

Charles

If a cumulative probability distribution function is the probability that a random variable will have a value less than or equal to X, what would be the function name for the opposite(he probability that a random variable will have a value grater than or equal to X, or even just greater than X)?

It depends on the context. For example, in survival analysis this is called the survival function.

Charles

Say a user can check out a maximum of 2000 licenses at a time. Multiple users can check out one or more licenses each hour for a product but not more than 2000. The product owner has log that shows data for each hour in the year (365 days). For each hour, the log contains the highest number (quantity) of licenses checked out in that hour. Using this log of every hour over the past year, how would you determine for the next year if 2000 licenses is enough based on this data using a Poisson distribution function in excel? What would be the average be? Note that in many cases, the licenses checked out one hour is the same licenses check out by the same person in a previous hour(the license continues to be check out by the same user over multiple hours or days). Do you believe this is a Poisson Distribution? Also, how would you calculate the mean?

Able,

Please explain the premise more clearly. If a user checks out 2,000 in one hour, can he/she check out another 2,000 licenses in the next hour? (i.e. are these like checking out books in a library?) If you have say 10 users can they each check out 2,000 licenses in an hour (for a total of 20,000 licenses) or are they restricted to 2,000 licenses in aggregate?

What criterion are you using for determining whether “2000 licenses is enough based on this data using a Poisson distribution function in excel?” Do you mean that (1) in no hour is more than 2,000 licenses checked out or (2) the probability of more than 2,000 licenses being checked out is zero or (3) something else?

Charles

Charles,

In response to you comments:

1)If a user checks out 2000 licences one hour, in the next hour either their will be 0 licenses checkout out or the same 2000 licenses that were checked out in the previous hour(s). It all depends if a users software job has ended or not and the user does not require any more licenses.

2)If you have 10 users they are they restricted to 2,000 licenses in aggregate per hour.

3)I am looking for probability of more than 2,000 licenses being checked out is zero. I am also looking for the lowest license count where the probability of more than x number licenses being in use is 0.

It appears that you are assuming an “arrival time” (i.e. checking out of licenses) that follows a Poisson distribution, but you haven’t said anything about the “service time” (i.e. for how long the license is checked out). If we assume, for example that the arrival time follows a Poisson distribution with mean m and a service time whose average is say n, then we have a typical queueing model.

Is this is what you are interested in?

Charles

Charles,

I do not have information regarding the service time, just the arrival time. Can Poisson or perhaps another statistical method be used to determine the probability of more than 2,000 licenses being checked out is zero with just the arrival time information?

Charles, May I ask you a question about Poisson function on excel ? what would be the best way to ask you?

You can ask as a Comment.

Charles

say i have a sign up rate of about 2 per day

and each signup lasts for exactly 30 days

what is the probability that I will have at least a hundred people signed up at once?

Assuming that you are looking for the probability that at least 100 people are signed up in any specific 30 day period, then the result should be =1-POISSON.DIST(99,60,TRUE), which has a value of 1.48E-06, a very small number.

Charles

Hi.May I ask a question?

When n (from 5 to 10 and 20)increases what happens on the probability distribution graph?(binomial, poisson and normal)

It really depends on what happens with the other parameters.

Binomial: If the other parameters in the BINOMDIST function are held constant then the cumulative distribution values decrease (e.g. compare BINOMDIST(4,n,.7,TRUE) for n = 5, 10, 20.

Poisson: If you assume that the mean of the distribution = np, then the cumulative distribution values decrease (e.g. compare POISSON(2,np,TRUE) where p = .5 for n = 5, 10, 20.

Normal: It really depends on how you are going to use n since NORMDIST doesn’t directly use n.

Charles

Hi Charles,

Many thanks for the explanation, i’m trying to find a way to apply this to sports betting using Excel, i’ve managed to locate a Poisson template and i’m wondering if there are any know ways of viewing football prediction results

I don’t know any way of applying this to sports betting, but perhaps someone else in the community can help.

Charles

Hi Charles,

I wonder if it would be correct to use poisson distribution to determine safety stock for a product when the product has the following demand pattern for the last 12 months:

3 580

0

0

1 135

1 000

363

2 175

72

620

92

228

373

The reason behind this pattern is that one of the customers is buying given product in batches.

Thank you in advance.

Alice,

I have just updated the Chi-square Goodness of Fit webpage with a test to determine whether data conforms to the Poisson distribution. This can be found at the bottom of the http://www.real-statistics.com/chi-square-and-f-distributions/goodness-of-fit/ webpage, and should be helpful in answering your question.

Charles

Excellent resource!

I have used it to sucessfully simulate a poisson process (counts of a radioactive material) and has helped me to make a worksheet to obtain detection tresholds and minimun detectable activity (they are dependant of sigma) for the contaminations sensors i use.

Keep the good work!

The average number of signals sent from a station is 3 per day which do not reach properly to the another station . Find the probability that the signals which are sent in a day but not reached properly is at least 3

Please help me to solve this problem ….thanks in advance

1-POISSON(2,3,TRUE) assuming a Poisson distribution.

Charles

Pingback: Phil 3.20.15 | viztales

http://www.real-statistics.com/binomial-and-related-distributions/poisson-distribution/

Typo: the setup for the question stipulates a 0.05% probability of occurrence and the calculated result agrees with that. However, the formula as shown below appears to use 0.06% [.0006] rather than 0.05%. Just a typo; the result is based on the .05% value.

1 – BINOMDIST(3, 4000, .0006, TRUE) = 1 – 0.857169 = 0.142831

Greg,

Yes, you are correct. Thanks for catching this typo. I have now corrected it on the referenced webpage.

Charles

Sir

Example 2 Question 1: “What is the probability that the store will run out of MP3 players in a week if they stock 120 players? ”

You wrote: “The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331 ”

So the probability of sell <= 120 is 0.977331, so the demand exceeds the inventory stock which is 120 pics (ran out of stock) is 1-0.977331.

I think the probability the store ran out of stock is 1-0.977331.

Am I wrong?

Hi Colin,

You have a sharp eye. Thanks for pointing out the mistake. I have now made the appropriate correction to the webpage.

Charles