Joint Probability Distributions

Learning Objectives

  • Understand joint probability mass and density functions for multiple random variables.
  • Calculate and interpret marginal and conditional probability distributions.
  • Determine whether two random variables are statistically independent.
  • Calculate and interpret covariance and the correlation coefficient.
  • Understand the properties and applications of the Bivariate Normal Distribution.

Joint probability distributions extend the concepts of single random variables to multiple variables, allowing engineers to analyze complex systems where multiple factors interact. This lesson covers joint, marginal, and conditional distributions, as well as measures of linear relationship like covariance and correlation.

Simultaneous Variables

In many engineering applications, we need to understand the relationship between two or more random variables simultaneously. For example, a structural engineer might study the joint distribution of wind speed ($X$) and atmospheric pressure ($Y$) during a hurricane, or a transportation engineer might model the number of cars ($X$) and trucks ($Y$) arriving at a toll booth.

Joint Probability Mass and Density Functions

Describing the simultaneous behavior of multiple random variables.

Joint Probability Mass Function (Discrete)

For two discrete random variables $X$ and $Y$, the joint probability mass function $f(x, y)$ gives the probability that $X$ takes the specific value $x$ and $Y$ takes the specific value $y$ simultaneously.

Joint Probability Mass Function

Formula for calculating joint discrete probability.

$$f(x, y) = P(X = x, Y = y)$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $f(x, y)$ | Joint probability mass function | - |
| $X, Y$ | Discrete random variables | - |
| $x, y$ | Specific values of the random variables | - |
| $P$ | Probability | - |

Joint PMF Conditions
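
A function $f(x, y)$ is conventionally accepted as a valid joint PMF when it satisfies the standard conditions sketched below. They are stated from general probability theory; $A$ is an assumed symbol for any set of $(x, y)$ pairs and is not used elsewhere in this lesson.

```latex
\begin{align*}
\text{1. Non-negativity:} \quad & f(x, y) \ge 0 \quad \text{for all } (x, y) \\
\text{2. Total probability:} \quad & \sum_{x} \sum_{y} f(x, y) = 1 \\
\text{3. Probability of a set:} \quad & P\big((X, Y) \in A\big) = \sum_{(x, y) \in A} f(x, y)
\end{align*}
```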

Joint Probability Density Function (Continuous)

For two continuous random variables $X$ and $Y$, the joint probability density function $f(x, y)$ describes how probability is distributed over the $xy$-plane. The density itself is not a probability; the probability that $(X, Y)$ falls within a specific two-dimensional region $R$ is the volume under the surface $f(x, y)$ over $R$.

Joint Probability Density Function

Formula for calculating joint continuous probability over a region R.

$$P((X, Y) \in R) = \iint_R f(x, y) \, dx \, dy$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $P((X, Y) \in R)$ | Probability that $X$ and $Y$ fall in region $R$ | - |
| $R$ | Two-dimensional region in the $xy$-plane | - |
| $f(x, y)$ | Joint probability density function | - |
| $dx, dy$ | Differentials for $x$ and $y$ | - |

Joint PDF Conditions
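
The continuous case mirrors the discrete one, with sums replaced by integrals. The standard conditions for a valid joint PDF, stated from general probability theory rather than taken from this lesson, can be sketched as:

```latex
\begin{align*}
\text{1. Non-negativity:} \quad & f(x, y) \ge 0 \quad \text{for all } (x, y) \\
\text{2. Total probability:} \quad & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy = 1 \\
\text{3. Probability of a region:} \quad & P\big((X, Y) \in R\big) = \iint_R f(x, y) \, dx \, dy
\end{align*}
```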

Marginal Distributions

Isolating the behavior of one variable from the joint distribution.

Isolating Variables

Sometimes we have the joint distribution of $X$ and $Y$, but we only care about the distribution of $X$ alone, regardless of $Y$. This is called the marginal distribution.

Marginal Probability Distributions

To find the marginal distribution of one variable, we sum (or integrate) out the other variable over its entire range. The marginal distribution of $X$, denoted $g(x)$, is found by summing or integrating out $y$; similarly, $h(y)$ is the marginal distribution of $Y$, found by summing or integrating out $x$.

Marginal Distribution (Discrete)

Formula for calculating marginal discrete probability.

$$g(x) = \sum_{y} f(x, y)$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $g(x)$ | Marginal distribution of $X$ | - |
| $f(x, y)$ | Joint probability mass function | - |
| $y$ | All possible values of $Y$ | - |

Marginal Distribution (Continuous)

Formula for calculating marginal continuous probability.

$$g(x) = \int_{-\infty}^{\infty} f(x, y) \, dy$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $g(x)$ | Marginal distribution of $X$ | - |
| $f(x, y)$ | Joint probability density function | - |
| $dy$ | Differential for $y$ | - |
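
As a minimal sketch of the discrete formula, the Python snippet below sums a joint PMF over one variable to obtain each marginal. The joint probabilities are the ones shown in this lesson's discrete explorer example (cells that are not listed are taken to be zero); the dictionary layout and the names `joint_pmf` and `marginals` are illustrative assumptions, not part of the lesson.

```python
# Joint PMF keyed by (x, y). Values follow the lesson's discrete explorer
# example; any (x, y) pair not listed is assumed to have probability zero.
joint_pmf = {
    (10, 1): 0.10, (10, 2): 0.05,
    (20, 1): 0.10, (20, 2): 0.40, (20, 3): 0.10,
    (30, 2): 0.05, (30, 3): 0.20,
}


def marginals(pmf):
    """Return (g, h): the marginal PMFs of X and Y as dictionaries."""
    g, h = {}, {}
    for (x, y), p in pmf.items():
        g[x] = g.get(x, 0.0) + p  # g(x): sum over y for a fixed x
        h[y] = h.get(y, 0.0) + p  # h(y): sum over x for a fixed y
    return g, h


g, h = marginals(joint_pmf)
print({x: round(p, 3) for x, p in sorted(g.items())})
print({y: round(p, 3) for y, p in sorted(h.items())})
```

The rounded results should match the marginal row and column reported in the explorer table: $g(x)$ of 0.150, 0.600, 0.250 and $h(y)$ of 0.200, 0.500, 0.300.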

Conditional Distributions and Independence

How knowledge of one variable affects the probability distribution of another.

Conditional Probability Distribution

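The conditional distribution of $X$ given $Y = y$ describes how $X$ behaves once $Y$ is known to have taken the specific value $y$. It is analogous to basic conditional probability, $P(A|B) = P(A \cap B) / P(B)$: the joint distribution is divided by the marginal distribution of the conditioning variable. The roles of the variables can be swapped, giving $f(y|x) = \frac{f(x, y)}{g(x)}$, provided $g(x) > 0$.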

Conditional Probability Distribution

Formula for conditional probability distribution.

$$f(x|y) = \frac{f(x, y)}{h(y)} \quad \text{provided } h(y) > 0$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $f(x|y)$ | Conditional distribution of $X$ given $Y = y$ | - |
| $f(x, y)$ | Joint probability function | - |
| $h(y)$ | Marginal distribution of $Y$ | - |
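
The division by $h(y)$ can be sketched in a few lines of Python. The joint probabilities again come from this lesson's discrete explorer example; the helper name `conditional_x_given_y` is a hypothetical choice made for this illustration.

```python
# Joint PMF keyed by (x, y), taken from the lesson's discrete explorer
# example; unlisted pairs are assumed to have probability zero.
joint_pmf = {
    (10, 1): 0.10, (10, 2): 0.05,
    (20, 1): 0.10, (20, 2): 0.40, (20, 3): 0.10,
    (30, 2): 0.05, (30, 3): 0.20,
}


def conditional_x_given_y(pmf, y):
    """Return f(x | y) as a dict, or raise if h(y) is not positive."""
    h_y = sum(p for (_, yj), p in pmf.items() if yj == y)  # marginal h(y)
    if h_y <= 0:
        raise ValueError("f(x | y) is undefined because h(y) = 0")
    # Divide every joint probability in the Y = y column by h(y).
    return {x: p / h_y for (x, yj), p in pmf.items() if yj == y}


cond = conditional_x_given_y(joint_pmf, 2)
print({x: round(p, 3) for x, p in sorted(cond.items())})
```

Given $Y = 2$, most of the conditional probability sits on $X = 20$, since that cell (0.40) dominates the $Y = 2$ column whose total is 0.50. A conditional distribution always sums to 1, which is a quick sanity check on any implementation.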

Independence of Random Variables

Two random variables $X$ and $Y$ are independent if and only if their joint probability distribution equals the product of their marginal distributions for all possible values of $(x, y)$. When this holds, knowing the value of $X$ gives no information about the value of $Y$ (e.g., the compressive strength of concrete from Plant A versus that from Plant B).

Independence Condition

Mathematical condition for independence of random variables.

$$f(x, y) = g(x) \cdot h(y)$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $f(x, y)$ | Joint probability function | - |
| $g(x)$ | Marginal distribution of $X$ | - |
| $h(y)$ | Marginal distribution of $Y$ | - |
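
Because the condition must hold for every pair $(x, y)$, a single failing cell is enough to rule out independence. The sketch below checks each cell of a discrete joint PMF against the product of its marginals. The first table uses the joint probabilities from this lesson's discrete explorer example; the second is built as a product of marginals purely for contrast. The function name `is_independent` and the tolerance are assumptions made for this illustration.

```python
import math

# Joint PMF from the lesson's discrete explorer example (unlisted cells = 0).
joint_pmf = {
    (10, 1): 0.10, (10, 2): 0.05,
    (20, 1): 0.10, (20, 2): 0.40, (20, 3): 0.10,
    (30, 2): 0.05, (30, 3): 0.20,
}


def is_independent(pmf, tol=1e-9):
    """Check f(x, y) == g(x) * h(y) for every (x, y) on the support grid."""
    xs = sorted({x for x, _ in pmf})
    ys = sorted({y for _, y in pmf})
    g = {x: sum(pmf.get((x, y), 0.0) for y in ys) for x in xs}
    h = {y: sum(pmf.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        math.isclose(pmf.get((x, y), 0.0), g[x] * h[y], abs_tol=tol)
        for x in xs
        for y in ys
    )


# A table constructed as a product of marginals is independent by design.
g_demo = {10: 0.15, 20: 0.60, 30: 0.25}
h_demo = {1: 0.20, 2: 0.50, 3: 0.30}
product_pmf = {(x, y): gx * hy for x, gx in g_demo.items() for y, hy in h_demo.items()}

print(is_independent(joint_pmf))    # False
print(is_independent(product_pmf))  # True
```

In the lesson's table, $f(10, 1) = 0.10$ while $g(10) \cdot h(1) = 0.15 \times 0.20 = 0.03$, so $X$ and $Y$ are not independent there.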

Covariance and Correlation

Measuring the linear relationship between two random variables.

Covariance ($\sigma_{xy}$)

A measure of how much two random variables change together. A positive covariance indicates that when $X$ is above its mean, $Y$ also tends to be above its mean (e.g., traffic volume and noise levels on a highway). A negative covariance indicates an inverse relationship (e.g., the age of an asphalt pavement and its flexibility).

Covariance Formula

Formula for calculating the covariance between two random variables.

$$\sigma_{xy} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\sigma_{xy}$ | Covariance between $X$ and $Y$ | - |
| $E$ | Expected value operator | - |
| $X, Y$ | Random variables | - |
| $\mu_X, \mu_Y$ | Means of $X$ and $Y$ | - |
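
Both forms of the covariance formula can be checked numerically. The sketch below evaluates the definition $E[(X - \mu_X)(Y - \mu_Y)]$ and the shortcut $E[XY] - \mu_X \mu_Y$ on the joint probabilities from this lesson's discrete explorer example. The helper `expectation` is a hypothetical name introduced for the illustration.

```python
# Joint PMF from the lesson's discrete explorer example (unlisted cells = 0).
joint_pmf = {
    (10, 1): 0.10, (10, 2): 0.05,
    (20, 1): 0.10, (20, 2): 0.40, (20, 3): 0.10,
    (30, 2): 0.05, (30, 3): 0.20,
}


def expectation(pmf, func):
    """E[func(X, Y)] for a discrete joint PMF."""
    return sum(func(x, y) * p for (x, y), p in pmf.items())


mu_x = expectation(joint_pmf, lambda x, y: x)
mu_y = expectation(joint_pmf, lambda x, y: y)

# Definition: expected product of the deviations from the means.
cov_definition = expectation(joint_pmf, lambda x, y: (x - mu_x) * (y - mu_y))
# Shortcut: E[XY] minus the product of the means.
cov_shortcut = expectation(joint_pmf, lambda x, y: x * y) - mu_x * mu_y

print(round(mu_x, 2), round(mu_y, 2))
print(round(cov_definition, 4), round(cov_shortcut, 4))
```

Both routes should agree with each other and with the values the explorer reports for this table: $\mu_X = 21.00$, $\mu_Y = 2.10$, and a covariance of 2.9000.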

Properties of Covariance
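
Several standard identities follow directly from the definition $\sigma_{xy} = E[(X - \mu_X)(Y - \mu_Y)]$. They are listed here from general probability theory, with $a$, $b$, $c$, $d$ as assumed symbols for arbitrary constants:

```latex
\begin{align*}
\operatorname{Cov}(X, X) &= \operatorname{Var}(X) = \sigma_X^2 \\
\operatorname{Cov}(X, Y) &= \operatorname{Cov}(Y, X) \\
\operatorname{Cov}(aX + b,\; cY + d) &= ac \, \operatorname{Cov}(X, Y) \\
\operatorname{Var}(aX + bY) &= a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab \, \sigma_{xy} \\
X, Y \text{ independent} &\implies \sigma_{xy} = 0
\end{align*}
```

The last implication does not run in reverse in general: a covariance of zero rules out a linear relationship, but the variables may still be dependent in a nonlinear way.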

Correlation Coefficient ($\rho_{xy}$)

A standardized measure of the linear relationship between two variables. Covariance depends on the units of $X$ and $Y$, making it hard to interpret the strength of the relationship. The correlation coefficient scales covariance by the standard deviations of both variables, producing a dimensionless value between -1 and 1.

Correlation Coefficient Formula

Formula for calculating the correlation coefficient.

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\rho_{xy}$ | Correlation coefficient between $X$ and $Y$ | - |
| $\sigma_{xy}$ | Covariance between $X$ and $Y$ | - |
| $\sigma_x, \sigma_y$ | Standard deviations of $X$ and $Y$ | - |

Interpretation of Correlation
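
As a general reading aid, values of $\rho_{xy}$ near +1 or -1 indicate a strong linear relationship (positive or negative), while values near 0 indicate a weak or absent linear relationship; any specific cutoff for "strong" is a convention rather than a rule. The sketch below computes $\rho_{xy}$ for the joint probabilities in this lesson's discrete explorer example. The function name `correlation` is an assumption made for the illustration.

```python
import math

# Joint PMF from the lesson's discrete explorer example (unlisted cells = 0).
joint_pmf = {
    (10, 1): 0.10, (10, 2): 0.05,
    (20, 1): 0.10, (20, 2): 0.40, (20, 3): 0.10,
    (30, 2): 0.05, (30, 3): 0.20,
}


def correlation(pmf):
    """Correlation coefficient rho = cov / (sigma_x * sigma_y)."""
    mu_x = sum(x * p for (x, _), p in pmf.items())
    mu_y = sum(y * p for (_, y), p in pmf.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in pmf.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in pmf.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
    return cov / math.sqrt(var_x * var_y)


rho = correlation(joint_pmf)
print(round(rho, 3))
```

For this table the covariance is 2.9, with $\sigma_x^2 = 39$ and $\sigma_y^2 = 0.49$, so $\rho_{xy}$ comes out at roughly 0.66: a moderately strong positive linear relationship. Rescaling $X$ (say, to different units) would change the covariance but leave $\rho_{xy}$ unchanged.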

The Bivariate Normal Distribution

The foundational model for two correlated continuous variables.

Bivariate Normal Distribution

The bivariate normal distribution is the standard model for two continuous random variables that are jointly normal and possibly correlated; each variable on its own is then normally distributed. Its PDF forms a three-dimensional bell-shaped surface (a mound) whose orientation and elongation depend on the correlation $\rho$.

Key Properties of Bivariate Normal Distribution
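
For reference, the standard textbook form of the bivariate normal density is sketched below. It is stated from general theory rather than from this lesson, using the lesson's symbols $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, and $\rho$, and it requires $|\rho| < 1$.

```latex
\begin{align*}
f(x, y) &= \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}
  \exp\!\left( -\frac{q(x, y)}{2} \right), \\
q(x, y) &= \frac{1}{1 - \rho^2} \left[
  \left( \frac{x - \mu_X}{\sigma_X} \right)^2
  - 2\rho \left( \frac{x - \mu_X}{\sigma_X} \right) \left( \frac{y - \mu_Y}{\sigma_Y} \right)
  + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2
\right]
\end{align*}
```

Three well-known consequences of this form: each marginal is normal, with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$; the conditional distribution of $Y$ given $X = x$ is normal with mean $\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and variance $\sigma_Y^2 (1 - \rho^2)$; and, for jointly normal variables, $\rho = 0$ implies independence, because the density then factors into the product of the two marginals.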

Engineering Data Analysis

Discrete Joint Probability Explorer

Probability Distribution Table

| $X \setminus Y$ | $Y = 1$ | $Y = 2$ | $Y = 3$ | $g(x)$ |
| --- | --- | --- | --- | --- |
| $X = 10$ | 0.10 | 0.05 | 0.00 | 0.150 |
| $X = 20$ | 0.10 | 0.40 | 0.10 | 0.600 |
| $X = 30$ | 0.00 | 0.05 | 0.20 | 0.250 |
| $h(y)$ | 0.200 | 0.500 | 0.300 | 1.000 |

Mean $\mu_X$: 21.00
Mean $\mu_Y$: 2.10
Covariance $\text{Cov}(X,Y)$: 2.9000

If $\text{Cov}(X,Y) > 0$, $X$ and $Y$ tend to increase together. If $\text{Cov}(X,Y) < 0$, they vary inversely. If $\text{Cov}(X,Y) = 0$, there is no linear relationship.

Probability Distribution Visualizer

[Figure: bubble chart of the joint probabilities in the table above. Bubble size and opacity indicate the magnitude of the joint probability $f(x,y)$; green bars represent the marginal distributions $g(x)$ and $h(y)$.]

Engineering Data Analysis • Topic 7

Bivariate Normal Distribution Contours

Std Dev X ($\sigma_X$): 2.0
Std Dev Y ($\sigma_Y$): 2.0
Correlation ($\rho$): 0.50

Covariance Matrix ($\mathbf{\Sigma}$)

$$\mathbf{\Sigma} = \begin{bmatrix} 4.00 & 2.00 \\ 2.00 & 4.00 \end{bmatrix}$$
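
The covariance matrix shown for these settings follows from $\sigma_{xy} = \rho \, \sigma_X \sigma_Y$, with the variances on the diagonal. The sketch below rebuilds it from $\sigma_X = 2.0$, $\sigma_Y = 2.0$, and $\rho = 0.50$ and evaluates the standard bivariate normal density at one point. The function names are hypothetical, and the means are assumed to be zero because the lesson does not state them.

```python
import math


def covariance_matrix(sigma_x, sigma_y, rho):
    """2x2 covariance matrix: variances on the diagonal, rho*sx*sy off it."""
    cov = rho * sigma_x * sigma_y
    return [[sigma_x ** 2, cov], [cov, sigma_y ** 2]]


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Standard bivariate normal density; requires |rho| < 1."""
    zx = (x - mu_x) / sigma_x  # standardized x
    zy = (y - mu_y) / sigma_y  # standardized y
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho * rho)
    return math.exp(-q / 2) / norm


sigma = covariance_matrix(2.0, 2.0, 0.5)
print(sigma)  # [[4.0, 2.0], [2.0, 4.0]]

# Density at the center of the mound; means of 0.0 are an assumption.
peak = bivariate_normal_pdf(0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.5)
print(round(peak, 4))
```

The density is highest at the mean and falls off along elliptical contours. With $\rho > 0$ those ellipses tilt so that large $x$ values pair with large $y$ values.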
[Figure: contour plot of the bivariate normal density showing the 1-sigma, 2-sigma, and 3-sigma confidence boundaries.]

Key Takeaways
  • Joint Distributions ($f(x, y)$): Describe the simultaneous behavior of two random variables.
  • Marginal Distributions ($g(x)$, $h(y)$): Isolate one variable by summing or integrating out the other.
  • Conditional Distributions ($f(x|y)$): The behavior of $X$ given a specific value of $Y$.
  • Independence: $X$ and $Y$ are independent if and only if $f(x, y) = g(x) \cdot h(y)$ for all $(x, y)$.
  • Covariance and Correlation: Measure the linear relationship between variables. Correlation ($\rho$) is standardized, always falling between -1 and 1.
  • Bivariate Normal: The standard bell-shaped surface model for two jointly normal, possibly correlated, continuous variables.