Discrete Probability Distributions
Learning Objectives
- Understand the concepts of probability mass functions (PMF) and cumulative distribution functions (CDF).
- Calculate the expected value (mean) and variance of a discrete random variable.
- Apply the Binomial and Poisson distributions to model relevant civil engineering scenarios.
- Apply the Geometric and Negative Binomial distributions for scenarios involving trials to achieve successes.
- Use the Hypergeometric distribution to model probabilities when sampling without replacement from finite batches.
Discrete probability distributions are foundational for modeling scenarios where outcomes are countable, such as the number of structural defects, rainfall events, or equipment failures. This lesson covers key distributions including Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric, and how to apply them to solve practical engineering problems.
Introduction to Random Variables
When analyzing engineering data, we often deal with variables whose outcomes are determined by chance. A random variable is a numerical description of the outcome of an experiment. A discrete random variable can take on a countable number of distinct values (e.g., the number of potholes on a 10 km stretch of road, the number of defective bricks in a pallet, or the number of extreme weather events in a year). Understanding these variables allows engineers to quantify risk and predict the frequency of specific events.
Probability Mass Functions and Mathematical Expectation
Probability Functions and Expectation
The foundational math behind discrete random variables.
Probability Mass Function (PMF), $f(x)$ or $P(X = x)$
A function that assigns a probability to each possible value of a discrete random variable.
PMF Conditions
- $f(x) \ge 0$ for all $x$.
- $\sum_{x} f(x) = 1$.
Cumulative Distribution Function (CDF), $F(x)$
The probability that the random variable $X$ will take a value less than or equal to $x$.
Cumulative Distribution Function
Formula for calculating the CDF of a discrete random variable.
$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $F(x)$ | Cumulative Distribution Function | - |
| $X$ | Discrete random variable | - |
| $x$ | Specific value of the random variable | - |
| $f(t)$ | Probability Mass Function evaluated at t | - |
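The two PMF conditions and the CDF sum can be checked numerically. The following is a minimal Python sketch; the PMF values (a count of defective bricks in a sample) are invented purely for illustration, and the helper names `is_valid_pmf` and `cdf` are hypothetical.

```python
from fractions import Fraction

# Hypothetical PMF for X = number of defective bricks in a sample.
# The probabilities are illustrative assumptions, not measured data.
pmf = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(3, 20), 3: Fraction(1, 20)}

def is_valid_pmf(pmf):
    """Check both PMF conditions: every f(x) >= 0 and the f(x) sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def cdf(pmf, x):
    """F(x) = P(X <= x): add up f(t) over every value t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

print(is_valid_pmf(pmf))  # True
print(cdf(pmf, 1))        # 4/5, i.e. P(X <= 1) = 1/2 + 3/10
```

Exact fractions are used here so the "sums to 1" check is not affected by floating-point rounding.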
Mathematical Expectation
Mathematical Expectation and Variance
The expected value represents the theoretical mean of the random variable.
Expected Value (Mean), $\mu$ or $E(X)$
The long-run average value of the random variable over infinitely many trials. It is the center of the probability distribution.
Expected Value (Mean)
Formula for calculating the expected value.
$$\mu = E(X) = \sum_{x} x \, f(x)$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $E(X)$ | Expected value of random variable X | - |
| $x$ | Specific value of the random variable | - |
| $f(x)$ | Probability Mass Function evaluated at x | - |
Variance, $\sigma^2$ or $\mathrm{Var}(X)$
A measure of the dispersion or spread of the probability distribution around the mean.
Variance
Formula for calculating the variance.
$$\sigma^2 = E\left[(X - \mu)^2\right] = \sum_{x} (x - \mu)^2 \, f(x)$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\sigma^2$ | Variance | - |
| $E[\cdot]$ | Expected value operator | - |
| $X$ | Discrete random variable | - |
| $\mu$ | Expected value or mean | - |
| $x$ | Specific value of the random variable | - |
| $f(x)$ | Probability Mass Function evaluated at x | - |
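Computational Formula for Variance
An alternative formula that is usually easier to evaluate by hand, because it avoids subtracting the mean from every value.
$$\sigma^2 = E(X^2) - \left[E(X)\right]^2$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\sigma^2$ | Variance | - |
| $E(X^2)$ | Expected value of the square of X | - |
| $E(X)$ | Expected value of X | - |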
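To see that the definition of variance and the computational formula agree, both can be evaluated on the same PMF. This is a small Python sketch under assumed values: the PMF below is invented for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical PMF (illustrative values only).
pmf = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(3, 20), 3: Fraction(1, 20)}

def expected_value(pmf):
    """mu = E(X): sum of x * f(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance_definition(pmf):
    """sigma^2 = E[(X - mu)^2]: sum of (x - mu)^2 * f(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """sigma^2 = E(X^2) - [E(X)]^2, the computational formula."""
    e_x2 = sum(x ** 2 * p for x, p in pmf.items())
    return e_x2 - expected_value(pmf) ** 2

print(expected_value(pmf))       # 3/4
print(variance_definition(pmf))  # 63/80
print(variance_shortcut(pmf))    # 63/80
```

Both variance routes return the same exact fraction, which is the point of the computational formula: it gives the same result with less arithmetic.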
Common Discrete Distributions in Engineering
Common Discrete Distributions
Specific models used to describe common engineering scenarios.
The Binomial Distribution
Introduction to the Binomial Distribution
The Binomial Distribution models the number of successes in a fixed number of independent trials. In civil engineering, this could represent the number of concrete cylinders that pass a compressive strength test out of a batch of 10, or the number of days a construction project is delayed due to weather in a given month.
Binomial Distribution
A distribution that models the probability of exactly $x$ successes in $n$ independent trials.
Binomial Distribution Conditions
- There are a fixed number of trials ($n$).
- Each trial has only two possible outcomes (Success or Failure).
- The probability of success ($p$) remains constant for each trial.
- The trials are mutually independent.
Binomial Distribution Formula
Formula to calculate the probability of exactly x successes in n trials.
$$P(X = x) = \binom{n}{x} p^{x} q^{\,n - x}, \quad x = 0, 1, \dots, n$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $P(X = x)$ | Probability of exactly x successes | - |
| $n$ | Total number of trials | - |
| $x$ | Number of successes | - |
| $p$ | Probability of success on a single trial | - |
| $q = 1 - p$ | Probability of failure on a single trial | - |
Binomial Mean and Variance
Formulas for the mean and variance of a Binomial distribution.
$$\mu = np, \qquad \sigma^2 = np(1 - p)$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $\sigma^2$ | Variance | - |
| $n$ | Total number of trials | - |
| $p$ | Probability of success on a single trial | - |
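As an illustration of the Binomial formulas, the sketch below evaluates the PMF for a batch of 10 concrete cylinders. The pass probability of 0.9 is an assumed value chosen only for the example, and `binomial_pmf` and `binomial_mean_var` are hypothetical helper names.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mean_var(n, p):
    """Return (mu, sigma^2) = (n*p, n*p*(1 - p))."""
    return n * p, n * p * (1 - p)

n, p = 10, 0.9  # assumed: 10 cylinders, each passing with probability 0.9

# Probability that all 10 cylinders pass the strength test.
print(round(binomial_pmf(10, n, p), 4))

# Probability that at least 8 pass: add the PMF over x = 8, 9, 10.
print(round(sum(binomial_pmf(x, n, p) for x in range(8, n + 1)), 4))

print(binomial_mean_var(n, p))
```

Questions phrased as "at least" or "at most" are answered by summing the PMF over the relevant values of x, as in the second calculation.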
The Poisson Distribution
Introduction to the Poisson Distribution
The Poisson Distribution models the number of events occurring in a fixed interval of time or space. This is highly applicable for modeling rare, independent events over continuous intervals, such as the number of heavy trucks passing a bridge toll per hour, or the number of micro-cracks along a 50-meter steel beam.
Poisson Distribution
Used for rare events where the number of trials $n$ is effectively infinite and the probability of occurrence $p$ in any single trial is very small, but the average rate of occurrence ($\lambda$) is known. Examples include the number of traffic accidents per month at a given intersection, or the number of flaws in a 100 m reel of fiber optic cable.
Poisson Distribution Formula
Formula to calculate the probability of exactly x events occurring in a given interval.
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $P(X = x)$ | Probability of exactly x events | - |
| $\lambda$ | Average rate of occurrence in the interval | - |
| $e$ | Euler's number (approximately 2.71828) | - |
| $x$ | Number of events | - |
Poisson Mean and Variance
Formulas for the mean and variance of a Poisson distribution.
$$\mu = \lambda, \qquad \sigma^2 = \lambda$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $\sigma^2$ | Variance | - |
| $\lambda$ | Average rate of occurrence in the interval | - |
Unique Property of Poisson Distribution
A unique property of the Poisson distribution is that its mean equals its variance.
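The equality of mean and variance can be checked numerically. The sketch below assumes an average rate of 2 traffic accidents per month (an invented figure) and truncates the infinite sum at 60 terms, beyond which the remaining probability is negligible for this rate; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0  # assumed average of 2 accidents per month

# Probability of an accident-free month, which equals e^(-lam).
print(round(poisson_pmf(0, lam), 4))

# Approximate the mean and variance by summing over a truncated support.
support = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
print(round(mean, 6), round(var, 6))  # both are close to lam
```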
The Geometric and Negative Binomial Distributions
Geometric and Negative Binomial Models
The Geometric and Negative Binomial distributions model the number of trials needed to achieve a specific number of successes. For instance, testing identical soil samples until the first one fails a compaction test (Geometric), or testing until exactly three samples fail (Negative Binomial).
Geometric Distribution
Models the number of independent trials needed to get the first success. (e.g., How many times must we test a newly designed joint until we observe the first failure, assuming a constant failure probability $p$?)
Geometric Distribution Formula
Formula for calculating the probability of getting the first success on the x-th trial.
$$P(X = x) = p(1 - p)^{x - 1}, \quad x = 1, 2, 3, \dots$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $P(X = x)$ | Probability of first success on the x-th trial | - |
| $p$ | Probability of success on a single trial | - |
| $x$ | Trial number on which the first success occurs | - |
Geometric Mean and Variance
Formulas for calculating the mean and variance of a geometric distribution.
$$\mu = \frac{1}{p}, \qquad \sigma^2 = \frac{1 - p}{p^2}$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $\sigma^2$ | Variance | - |
| $p$ | Probability of success on a single trial | - |
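A short Python sketch of the geometric formulas follows. Here "success" means observing the event of interest (for example, a joint failing its test), and the per-trial probability of 0.2 is an assumed value for illustration. The names `geometric_pmf` and `geometric_mean_var` are hypothetical.

```python
def geometric_pmf(x, p):
    """P(X = x) = p * (1 - p)^(x - 1): x - 1 non-events, then the first event."""
    if x < 1:
        return 0.0  # the first success cannot occur before trial 1
    return p * (1 - p) ** (x - 1)

def geometric_mean_var(p):
    """Return (mu, sigma^2) = (1/p, (1 - p)/p^2)."""
    return 1 / p, (1 - p) / p**2

p = 0.2  # assumed probability that a single test shows a failure

# Probability that the first failure appears on the 3rd test.
print(round(geometric_pmf(3, p), 4))

# Probability that the first failure appears within the first 5 tests.
print(round(sum(geometric_pmf(x, p) for x in range(1, 6)), 4))

mu, var = geometric_mean_var(p)
print(round(mu, 4), round(var, 4))
```

With p = 0.2 the expected number of tests until the first failure is 1/p = 5.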
Negative Binomial Distribution
A generalization of the geometric distribution. It models the number of independent trials needed to get exactly $r$ successes. Setting $r = 1$ recovers the geometric distribution.
Negative Binomial Distribution Formula
Formula for calculating the probability of getting the r-th success on the x-th trial.
$$P(X = x) = \binom{x - 1}{r - 1} p^{r} (1 - p)^{x - r}, \quad x = r, r + 1, r + 2, \dots$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $P(X = x)$ | Probability of r-th success on the x-th trial | - |
| $x$ | Trial number on which the r-th success occurs | - |
| $r$ | Total number of successes desired | - |
| $p$ | Probability of success on a single trial | - |
Negative Binomial Mean and Variance
Formulas for calculating the mean and variance of a negative binomial distribution.
$$\mu = \frac{r}{p}, \qquad \sigma^2 = \frac{r(1 - p)}{p^2}$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $\sigma^2$ | Variance | - |
| $r$ | Total number of successes desired | - |
| $p$ | Probability of success on a single trial | - |
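The negative binomial PMF can be sketched in the same way. The example below assumes a per-trial probability of 0.2 and asks for the third occurrence (r = 3); both numbers are illustrative assumptions, and `negative_binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """P(X = x) = C(x - 1, r - 1) * p^r * (1 - p)^(x - r) for x = r, r + 1, ..."""
    if x < r:
        return 0.0  # r successes need at least r trials
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 3, 0.2  # assumed: wait for the 3rd failing sample, each fails with probability 0.2

# Probability that the 3rd failure occurs exactly on the 10th test.
print(round(negative_binomial_pmf(10, r, p), 4))

# With r = 1 the formula reduces to the geometric PMF p * (1 - p)^(x - 1).
print(round(negative_binomial_pmf(5, 1, p), 5))

# Mean and variance from the closed-form expressions.
print(round(r / p, 4), round(r * (1 - p) / p**2, 4))
```

The factor C(x - 1, r - 1) counts the ways to place the first r - 1 successes among the first x - 1 trials, since the x-th trial itself must be the r-th success.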
The Hypergeometric Distribution
Sampling Without Replacement
The Hypergeometric Distribution models sampling without replacement from a finite population. This is critical in quality control where inspecting an item destroys it or removes it from the batch, changing the probability for subsequent selections.
Hypergeometric Distribution
Unlike the Binomial distribution, where $p$ is constant (sampling with replacement), the Hypergeometric distribution is used when sampling without replacement from a finite population of size $N$ containing exactly $k$ successes. (e.g., Selecting 5 concrete cylinders from a batch of 50, where 3 are known to be defective).
Hypergeometric Distribution Formula
Formula for calculating hypergeometric probabilities.
$$P(X = x) = \frac{\binom{k}{x} \binom{N - k}{n - x}}{\binom{N}{n}}$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $P(X = x)$ | Probability of exactly x successes in the sample | - |
| $N$ | Total population size | - |
| $k$ | Total number of successes in the population | - |
| $n$ | Sample size | - |
| $x$ | Number of successes in the sample | - |
Hypergeometric Mean and Variance
Formulas for the mean and variance of a hypergeometric distribution.
$$\mu = \frac{nk}{N}, \qquad \sigma^2 = \frac{N - n}{N - 1} \cdot n \cdot \frac{k}{N} \left(1 - \frac{k}{N}\right)$$
Variables
| Symbol | Description | Unit |
|---|---|---|
| $\mu$ | Expected value or mean | - |
| $\sigma^2$ | Variance | - |
| $N$ | Total population size | - |
| $k$ | Total number of successes in the population | - |
| $n$ | Sample size | - |
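The concrete cylinder example (5 cylinders drawn from a batch of 50 that contains 3 defectives) can be worked through in Python. This is a minimal sketch in which a "success" is taken to mean drawing a defective cylinder; the symbol names N, k, n and the helper `hypergeometric_pmf` are choices made for this example.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, k, n):
    """P(X = x) = C(k, x) * C(N - k, n - x) / C(N, n), sampling without replacement."""
    # x must be feasible: no more than k or n, and enough non-successes to fill the sample.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return Fraction(0)
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, k, n = 50, 3, 5  # batch of 50, 3 defective, sample of 5

# Probability that the sample contains no defective cylinder.
p0 = hypergeometric_pmf(0, N, k, n)
print(p0, round(float(p0), 4))  # 1419/1960, about 0.724

# Probability that the sample contains at least one defective cylinder.
print(round(float(1 - p0), 4))

# Mean from the PMF, which matches n*k/N = 3/10.
mean = sum(x * hypergeometric_pmf(x, N, k, n) for x in range(0, min(n, k) + 1))
print(mean)  # 3/10
```

Exact fractions avoid rounding error in the ratio of large binomial coefficients.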
Interactive Simulation
Discrete Probability Distributions Explorer (Engineering Data Analysis, Topic 5): an interactive simulation for visualizing various discrete probability distributions and their theoretical properties. It compares the geometric and hypergeometric distributions under different parameters to show how sampling without replacement alters success probabilities.
- Random Variables: Numerical values assigned to experimental outcomes.
- Expected Value ($\mu$): The long-run average of a discrete distribution.
- Binomial: Used for a fixed number of independent trials with exactly two outcomes (success/failure) and constant probability $p$.
- Poisson: Used for modeling the number of rare events occurring within a continuous interval (time, area, volume).
- Geometric/Negative Binomial: Focuses on the number of trials needed to achieve a specified number of successes.
- Hypergeometric: Used for finite populations when sampling without replacement (probability changes trial-to-trial).