Discrete Distributions

Statistics
Author

Shreya Anbu

Published

January 2, 2027

Bernoulli Distribution

The Bernoulli distribution is a discrete probability distribution for an experiment that asks a single question with only two possible answers, yes or no. In other words, the random variable takes the value 1 with probability p and the value 0 with probability 1 - p. Such an experiment is called a Bernoulli trial. A pass-or-fail exam, for example, can be modeled by a Bernoulli distribution.

Suppose you flip a fair coin and win if the outcome is heads. The probability of getting heads is p = 1/2. If X is a random variable following a Bernoulli distribution, with X = 1 for heads and X = 0 for tails, then P(X = 1) = p = 1/2 and P(X = 0) = 1 - p = 1/2.
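
The coin-flip example can be checked numerically. The following is a minimal sketch, assuming SciPy is available; the helper name simulate_coin_flips, the seed, and the number of flips are illustrative choices, not part of the original example.

```python
from scipy.stats import bernoulli


def simulate_coin_flips(n_flips, p=0.5, seed=0):
    """Simulate n_flips Bernoulli(p) trials and return the fraction of heads (X = 1)."""
    # Each draw is 1 (heads, a win) with probability p, otherwise 0 (tails).
    flips = bernoulli.rvs(p, size=n_flips, random_state=seed)
    return flips.mean()


# Exact probability of heads from the PMF: P(X = 1) = p.
p_heads = bernoulli.pmf(1, 0.5)

# Empirical fraction of heads over many simulated flips (hypothetical sample size).
frac_heads = simulate_coin_flips(10_000)

print(f"P(X = 1) from the PMF: {p_heads}")
print(f"Fraction of heads in 10,000 simulated flips: {frac_heads:.3f}")
```

With a large number of flips, the simulated fraction of heads should land close to p = 1/2, though it will rarely match it exactly.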

A Bernoulli random variable, X, is also known as an indicator variable. This is because X = 1 if the event results in success and X = 0 if the outcome is a failure. This is written as X ∼ Bernoulli(p), where p is the parameter. The Bernoulli distribution is described by its probability mass function (PMF) and its cumulative distribution function (CDF).

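PMF:

$$
P(X = x) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\}
$$

CDF:

$$
F(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}
$$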

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli

p = 0.3  # probability of success, P(X = 1)

# A Bernoulli variable takes only the values 0 and 1.
x = np.array([0, 1])
pmf_vals = bernoulli.pmf(x, p)

# Plot the PMF as markers on top of vertical bars.
plt.figure(figsize=(5, 4))
plt.plot(x, pmf_vals, 'bo', ms=8, label='Bernoulli PMF')
plt.vlines(x, 0, pmf_vals, colors='b', lw=5, alpha=0.5)
plt.xticks([0, 1])
plt.xlabel("X")
plt.ylabel("PMF")
plt.title(f"Bernoulli PMF (p={p:.2f})")
plt.legend()
plt.grid()
plt.show()

Binomial Distribution

The Binomial distribution is a discrete probability distribution of the number of successes in n independent trials, where each trial succeeds with probability p and fails with probability 1 - p.
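
One way to read this definition is that each of the n trials is a Bernoulli trial with success probability p, and the Binomial variable counts how many of them succeed. The sketch below illustrates that reading by simulation; the values of n and p, the number of simulated experiments, and the seed are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import bernoulli, binom

n, p = 10, 0.4          # illustrative values
n_experiments = 20_000  # number of simulated experiments (an arbitrary choice)

# Each row is one experiment of n Bernoulli(p) trials; the row sum counts successes.
trials = bernoulli.rvs(p, size=(n_experiments, n), random_state=0)
successes = trials.sum(axis=1)

# Empirical frequency of each possible count 0..n versus the Binomial PMF.
k = np.arange(n + 1)
empirical = np.bincount(successes, minlength=n + 1) / n_experiments
exact = binom.pmf(k, n, p)

# Largest gap between simulated frequencies and exact probabilities.
print(np.abs(empirical - exact).max())
```

The printed gap should be small and should shrink as n_experiments grows.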

The probability mass function of the Binomial distribution is given by:

$$
f(k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k \in \{0, 1, \dots, n\}
$$

In SciPy, the probability mass function above is defined in the “standardized” form. To shift the distribution, use the loc parameter. Specifically, binom.pmf(k, n, p, loc) is equivalent to binom.pmf(k - loc, n, p).
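
The effect of loc can be confirmed directly. This is a small sketch with illustrative values for n, p, and loc (none of them come from the text above); it compares the two calls over the support of the shifted distribution.

```python
import numpy as np
from scipy.stats import binom

n, p, loc = 10, 0.4, 3  # illustrative values

# Support of the shifted distribution: loc, loc + 1, ..., loc + n.
k = np.arange(loc, loc + n + 1)

shifted = binom.pmf(k, n, p, loc)  # shift via the loc parameter
manual = binom.pmf(k - loc, n, p)  # shift the argument by hand

# The two arrays should agree element by element.
print(np.allclose(shifted, manual))
```

Shifting moves where the probability mass sits but does not change the probabilities themselves, so the shifted PMF still sums to 1 over its support.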

The Binomial distribution applies when each trial has only two possible outcomes. Some areas where it can be used:

  • Counting male and female students in an institute.
  • Measuring whether people like something, answered as yes or no.
  • Counting defective versus good products manufactured in a factory.
  • Counting positive and negative reviews of a product.
  • Tallying votes recorded as 0 or 1.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n = 10   # number of trials
p = 0.4  # probability of success in each trial

# Possible numbers of successes: 0, 1, ..., n.
x = np.arange(0, n + 1)
pmf_vals = binom.pmf(x, n, p)

# Plot the PMF as markers on top of vertical bars.
plt.figure(figsize=(7, 5))
plt.plot(x, pmf_vals, 'bo', ms=8, label='binom pmf')
plt.vlines(x, 0, pmf_vals, colors='b', lw=5, alpha=0.5)
plt.title(f'Binomial Distribution PMF (n={n}, p={p:.2f})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability Mass')
plt.legend()
plt.show()
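
Beyond plotting, the same distribution object can answer concrete questions, such as the defective-products use case listed earlier. The following is a rough sketch; the batch size and defect probability are hypothetical numbers chosen only for illustration, and it assumes items are defective independently of one another.

```python
from scipy.stats import binom

n_items = 20     # hypothetical batch size
p_defect = 0.05  # hypothetical per-item defect probability

# P(exactly 0 defective items), from the PMF.
p_none = binom.pmf(0, n_items, p_defect)

# P(at most 1 defective item), from the CDF.
p_at_most_one = binom.cdf(1, n_items, p_defect)

# P(at least 2 defective items), from the survival function 1 - CDF.
p_two_or_more = binom.sf(1, n_items, p_defect)

print(f"P(no defects)       = {p_none:.4f}")
print(f"P(at most 1 defect) = {p_at_most_one:.4f}")
print(f"P(2 or more)        = {p_two_or_more:.4f}")
```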

Poisson Distribution

A Poisson distribution commonly arises in real-life situations where events occur independently, at a constant average rate, over a fixed interval of time or space. Examples include the number of phone calls received at a call center per hour, the number of customers arriving at a store in a given time frame, the number of defects in a batch of manufactured products, and the number of car accidents at a particular intersection per day.

Whenever events occur independently at a known average rate, the number of events in an interval can often be modeled by a Poisson distribution. The distribution then gives the probability of observing any particular number of events in that interval.
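
As a rough numeric sketch of this idea, take the call-center example with a hypothetical average of 4 calls per hour (the rate is an assumption made only for illustration). The Poisson PMF, P(X = k) = e^(-mu) mu^k / k!, gives the probability of exactly k calls; the code evaluates it both by hand and with SciPy.

```python
import math

from scipy.stats import poisson

mu = 4  # hypothetical average number of calls per hour

# P(exactly k calls in one hour), by the formula and by SciPy.
for k in (0, 2, 4, 8):
    by_hand = math.exp(-mu) * mu**k / math.factorial(k)
    from_scipy = poisson.pmf(k, mu)
    print(f"k={k}: formula={by_hand:.4f}, scipy={from_scipy:.4f}")

# P(more than 8 calls in one hour), using the survival function 1 - CDF.
p_busy = poisson.sf(8, mu)
print(f"P(X > 8) = {p_busy:.4f}")
```

The two columns should agree, and the probabilities fall off quickly for counts far above the average rate.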

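import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

mu = 1.2  # average number of events per interval

# Integer counts from the 1st percentile up to (but not including) the 99th percentile.
x = np.arange(poisson.ppf(0.01, mu), poisson.ppf(0.99, mu))
pmf_vals = poisson.pmf(x, mu)

plt.plot(x, pmf_vals, 'bo', ms=8, label='Poisson PMF')
plt.vlines(x, 0, pmf_vals, colors='b', lw=5, alpha=0.5)
plt.title(f'Poisson Distribution (mu={mu})')
plt.xlabel('x')
plt.ylabel('PMF')
plt.legend()
plt.show()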