Sampling Distributions

Learning Objectives

  • Understand the concept of sampling distributions and their importance in statistics.
  • Explain the Central Limit Theorem and its application.
  • Differentiate between the t-distribution, Chi-square distribution, and F-distribution.

This topic covers how sample statistics behave across repeated samples, the Central Limit Theorem, and the fundamental distributions used for inference (t, Chi-square, and F).

Introduction to Sampling Distributions

When engineers test a sample of materials (e.g., 5 concrete cylinders), the sample mean ($\bar{x}$) is just an estimate of the true population mean ($\mu$). If we took a different sample of 5 cylinders, we would get a slightly different $\bar{x}$.

Because the sample mean changes from sample to sample, the sample mean itself is a random variable and has its own probability distribution. This distribution is called a sampling distribution. Understanding sampling distributions is the bridge between probability theory and statistical inference.
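The idea that the sample mean is itself a random variable can be illustrated with a small simulation. The sketch below is only an illustration: the population (10,000 cylinder strengths, roughly normal around 30 MPa) and all variable names are assumptions made for the example, not data from this lesson.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 cylinder strengths (MPa), roughly normal.
population = [random.gauss(30.0, 3.0) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw several samples of 5 cylinders and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, 5))
    for _ in range(6)
]

print(f"population mean: {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: x-bar = {xbar:.2f}")
```

Each run of the loop produces a different sample mean scattered around the population mean, which is exactly the variation a sampling distribution describes.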

Random Sampling Fundamentals

The foundational concept behind all statistical inference.

Simple Random Sample (SRS)

A sample of size $n$ drawn from a population such that every possible sample of size $n$ has an equal probability of being selected. When elements are drawn independently from a population, the resulting random variables $X_1, X_2, \dots, X_n$ are independent and identically distributed (i.i.d.).
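As a minimal sketch of the two ideas in this definition, the snippet below draws a simple random sample from a small finite population and, separately, a set of independent draws. The population of 50 cylinder IDs is hypothetical. Note one assumption worth stating: sampling without replacement from a finite population gives draws that are only approximately independent, so the with-replacement draws are the ones that are i.i.d. in the strict sense.

```python
import random

random.seed(7)

# Hypothetical finite population: ID numbers of 50 cylinders in a batch.
population_ids = list(range(1, 51))
n = 5

# Simple random sample without replacement: every subset of size n
# is equally likely to be chosen.
srs = random.sample(population_ids, n)

# Sampling with replacement: each draw is independent of the others and
# comes from the same distribution, so the draws are i.i.d.
iid_draws = random.choices(population_ids, k=n)

print("SRS (no repeats):", sorted(srs))
print("i.i.d. draws:    ", iid_draws)
```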

The Distribution of the Sample Mean

How the mean of a sample relates to the mean of the population.

Expected Value of the Sample Mean

If you repeatedly draw random samples of size $n$ from a population with mean $\mu$, the average of the resulting sample means converges to the population mean as the number of samples grows. In statistical terms, $\bar{X}$ is an unbiased estimator of $\mu$.

The mean of the sampling distribution of the sample mean is equal to the population mean.

$$\mu_{\bar{X}} = \mu$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\mu_{\bar{X}}$ | Mean of the sampling distribution | - |
| $\mu$ | Population mean | - |
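The unbiasedness property can be checked numerically. The following is a rough sketch, assuming a normal population with $\mu = 30$ and $\sigma = 3$ (illustrative values, not from the lesson): it averages many simulated sample means and compares the result with $\mu$.

```python
import random
import statistics

random.seed(42)

mu, sigma, n = 30.0, 3.0, 5      # assumed population parameters and sample size
num_samples = 20_000

def sample_mean(n):
    """Mean of n independent draws from the assumed normal population."""
    return statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])

means = [sample_mean(n) for _ in range(num_samples)]
mean_of_means = statistics.fmean(means)

print(f"mean of {num_samples} sample means: {mean_of_means:.3f} (mu = {mu})")
```

Individual sample means miss $\mu$ in both directions, but their average should land very close to it.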

Standard Error of the Mean

The standard deviation of the sampling distribution of $\bar{X}$. It measures how much the sample mean is expected to vary from the true population mean. As the sample size ($n$) increases, the standard error decreases, meaning our estimates become more precise.

Formula for the standard deviation of the sampling distribution of the sample mean.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\sigma_{\bar{X}}$ | Standard error of the mean | - |
| $\sigma$ | Population standard deviation | - |
| $n$ | Sample size | - |
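To see the formula at work, the sketch below compares $\sigma/\sqrt{n}$ with the standard deviation of simulated sample means for a few sample sizes. The population parameters ($\mu = 30$, $\sigma = 3$) and the helper name `empirical_se` are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)

sigma = 3.0            # assumed population standard deviation (MPa)
mu = 30.0              # assumed population mean (MPa)

def empirical_se(n, num_samples=5_000):
    """Standard deviation of many simulated sample means of size n."""
    means = [
        statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

results = {}
for n in (5, 20, 80):
    theoretical = sigma / math.sqrt(n)
    results[n] = (theoretical, empirical_se(n))
    print(f"n={n:3d}  sigma/sqrt(n)={theoretical:.3f}  simulated={results[n][1]:.3f}")
```

Because of the square root, quadrupling the sample size (5 to 20, or 20 to 80) only halves the standard error.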

The Central Limit Theorem (CLT)

Often described as the most important theorem in applied statistics.

Shape of the Sampling Distribution

What shape does the sampling distribution of $\bar{X}$ take?

Central Limit Theorem

If you draw random samples of size $n$ from any population distribution (even one that is highly skewed or non-normal) with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean ($\bar{X}$) approaches a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as the sample size $n$ becomes large.

Central Limit Theorem Conditions

  • If the original population is already normal, the sampling distribution of $\bar{X}$ is exactly normal for any sample size.
  • If the original population is not normal, the sampling distribution is approximately normal for large $n$. A common rule of thumb is $n \ge 30$, although strongly skewed or heavy-tailed populations may need larger samples.
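The theorem can be observed in a short simulation. This sketch assumes an Exponential(1) population, which is strongly right-skewed, and tracks the skewness of the sample means as $n$ grows (a skewness near 0 indicates a symmetric, more normal-looking shape). The helper names are illustrative.

```python
import random
import statistics

random.seed(3)

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, num_samples=10_000):
    """Means of num_samples samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 30)}
for n, g in skew_by_n.items():
    print(f"n={n:2d}  skewness of sample means = {g:.2f}")
```

The skewness shrinks as $n$ increases but is not yet zero at $n = 30$, which is why the threshold of 30 is best read as a guideline rather than a guarantee.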

The Distribution of the Sample Variance

How the variability of a sample relates to the population variance.

Distribution of the Sample Variance

If $S^2$ is the variance of a random sample of size $n$ drawn from a normal population with variance $\sigma^2$, the statistic $(n-1)S^2/\sigma^2$ follows a Chi-square ($\chi^2$) distribution with $n-1$ degrees of freedom. This forms the basis for confidence intervals and tests concerning the population variance.

Distribution of the Sample Variance Statistic

Statistic that follows a Chi-square distribution.

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\chi^2$ | Chi-square statistic | - |
| $n$ | Sample size | - |
| $S^2$ | Sample variance | - |
| $\sigma^2$ | Population variance | - |
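A quick way to sanity-check this result is to simulate the statistic and compare its moments with those of a Chi-square distribution, whose mean equals its degrees of freedom and whose variance is twice that. The sketch below assumes a normal population with $\mu = 30$, $\sigma = 3$, and $n = 6$; these values and the function name are illustrative only.

```python
import random
import statistics

random.seed(11)

mu, sigma, n = 30.0, 3.0, 6     # assumed normal population and sample size
num_samples = 10_000

def chi_square_stat(sample, sigma):
    """(n - 1) * S^2 / sigma^2 for one sample."""
    return (len(sample) - 1) * statistics.variance(sample) / sigma**2

chi_sq_values = [
    chi_square_stat([random.gauss(mu, sigma) for _ in range(n)], sigma)
    for _ in range(num_samples)
]

sim_mean = statistics.fmean(chi_sq_values)
sim_var = statistics.variance(chi_sq_values)
print(f"simulated mean:     {sim_mean:.2f}  (df = {n - 1})")
print(f"simulated variance: {sim_var:.2f}  (2*df = {2 * (n - 1)})")
```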

Distributions for Statistical Inference

When the population parameters are unknown, we rely on specific sampling distributions to make estimates and test hypotheses.

1. Student's t-Distribution

Used for inferences about the population mean ($\mu$) when the population standard deviation ($\sigma$) is UNKNOWN.

Estimating with Unknown Standard Deviation

In practice, if we don't know the true mean $\mu$, we almost certainly don't know the true standard deviation $\sigma$. Instead, we must estimate $\sigma$ using the sample standard deviation ($s$). This introduces extra uncertainty.

The t-Distribution

When a sample mean from a normal population is standardized using $s$ instead of $\sigma$, the resulting variable follows a t-distribution with $n - 1$ degrees of freedom, not a standard normal ($Z$) distribution.

t-Statistic

The standardization formula when the population standard deviation is unknown.

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $t$ | t-statistic | - |
| $\bar{X}$ | Sample mean | - |
| $\mu$ | Population mean | - |
| $s$ | Sample standard deviation | - |
| $n$ | Sample size | - |

Properties of the t-Distribution

  • It is bell-shaped and symmetric around 0, like the Z-distribution.
  • It has heavier (fatter) tails than the Z-distribution, reflecting the added uncertainty of estimating $\sigma$ with $s$. As a result, the critical t-value is larger than the corresponding critical Z-value at the same confidence level.
  • Its exact shape depends on the Degrees of Freedom ($df = n - 1$). As sample size ($n$) increases, $s$ becomes a better estimate of $\sigma$, and the t-distribution approaches the standard normal distribution.
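The heavier tails can be made visible by simulation. The sketch below, which assumes a normal population with $\mu = 30$ and $\sigma = 3$, computes many t-statistics and counts how often they fall beyond the two-sided 95% cutoff of the standard normal distribution. If the t-statistic were standard normal, that share would be about 5% for every $n$. The function names are illustrative.

```python
import math
import random
import statistics

random.seed(5)

mu, sigma = 30.0, 3.0                                  # assumed normal population
z_crit = statistics.NormalDist().inv_cdf(0.975)        # about 1.96

def t_statistic(sample, mu):
    """(x-bar - mu) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

def tail_fraction(n, num_samples=5_000):
    """Share of simulated t-statistics with |t| beyond the normal cutoff."""
    count = 0
    for _ in range(num_samples):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > z_crit:
            count += 1
    return count / num_samples

tail_small = tail_fraction(5)     # df = 4
tail_large = tail_fraction(50)    # df = 49
print(f"n=5:  P(|t| > {z_crit:.2f}) ~ {tail_small:.3f}")
print(f"n=50: P(|t| > {z_crit:.2f}) ~ {tail_large:.3f}")
```

With only 5 observations the share is well above 5%, while with 50 observations it is close to it, consistent with the t-distribution approaching the standard normal as the degrees of freedom grow.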

2. The Chi-Square ($\chi^2$) Distribution

Used for inferences about the population variance ($\sigma^2$) or standard deviation.

Inferences About Variability

Engineers are often just as concerned with variability as they are with the mean (e.g., ensuring consistent concrete strength). The sample variance ($s^2$) has its own sampling distribution.

The Chi-Square Distribution

If random samples of size $n$ are drawn from a normal population with variance $\sigma^2$, the statistic $(n-1)s^2/\sigma^2$ follows a Chi-square distribution with $df = n - 1$.

Chi-Square Statistic for Variance

Statistic comparing sample variance to population variance.

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $\chi^2$ | Chi-square statistic | - |
| $n$ | Sample size | - |
| $s^2$ | Sample variance | - |
| $\sigma^2$ | Population variance | - |

Properties of the Chi-Square Distribution

  • Unlike the normal or t-distributions, the Chi-square distribution takes only non-negative values (it is built from squared quantities) and is right-skewed, most strongly at low degrees of freedom.
  • As degrees of freedom increase, it becomes more symmetric.
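Both properties can be checked with a small simulation. The sketch below relies on a standard fact that is not stated in this lesson: a Chi-square variable with $df$ degrees of freedom can be generated as the sum of $df$ squared standard normal values. The helper names are illustrative.

```python
import random
import statistics

random.seed(8)

def chi_square_draw(df):
    """One chi-square value, built as the sum of df squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

draws_by_df = {df: [chi_square_draw(df) for _ in range(10_000)] for df in (2, 10, 50)}
for df, draws in draws_by_df.items():
    print(f"df={df:2d}  min={min(draws):.3f}  skewness={skewness(draws):.2f}")
```

No simulated value is negative, and the skewness drops steadily as the degrees of freedom increase.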

3. The F-Distribution

Used for comparing the variances of TWO different populations.

Comparing Two Variances

If an engineer wants to determine whether a new concrete mixing method produces more consistent results than the old method, they must compare two sample variances ($s_1^2$ and $s_2^2$).

The F-Distribution

If independent random samples are drawn from two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, the ratio of their sample variances, each scaled by its population variance, follows an F-distribution.

F-Statistic (General)

General formula for the F-statistic comparing two populations.

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $F$ | F-statistic | - |
| $s_1^2$ | Sample variance of population 1 | - |
| $\sigma_1^2$ | Population variance of population 1 | - |
| $s_2^2$ | Sample variance of population 2 | - |
| $\sigma_2^2$ | Population variance of population 2 | - |

F-Statistic (Equal Variances)

Simplified formula when we hypothesize that the two population variances are equal ($\sigma_1^2 = \sigma_2^2$).

$$F = \frac{s_1^2}{s_2^2}$$

Variables

| Symbol | Description | Unit |
| --- | --- | --- |
| $F$ | F-statistic | - |
| $s_1^2$ | Sample variance of population 1 | - |
| $s_2^2$ | Sample variance of population 2 | - |

Properties of the F-Distribution

  • The F-distribution is right-skewed and defined only for positive values.
  • It depends on two sets of degrees of freedom: the numerator ($df_1 = n_1 - 1$) and the denominator ($df_2 = n_2 - 1$).
  • It is the foundational distribution for Analysis of Variance (ANOVA).
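As a rough sketch of the mixing-method comparison, the code below computes $F = s_1^2 / s_2^2$ for two small samples and then approximates the reference distribution under equal population variances by simulation. The strength values are invented for illustration, and the Monte Carlo step stands in for looking up an F-table with $df_1$ and $df_2$; it is not a method prescribed by this lesson.

```python
import random
import statistics

random.seed(21)

# Hypothetical compressive strengths (MPa) from two mixing methods.
old_method = [28.1, 31.4, 26.9, 33.0, 29.5, 27.2, 32.6, 30.3]
new_method = [29.8, 30.4, 29.1, 30.9, 30.2, 29.5, 30.7, 30.0]

def f_statistic(sample_1, sample_2):
    """Ratio of sample variances s1^2 / s2^2."""
    return statistics.variance(sample_1) / statistics.variance(sample_2)

f_observed = f_statistic(old_method, new_method)
df1, df2 = len(old_method) - 1, len(new_method) - 1

def simulated_f(n1, n2):
    """One F ratio when both samples come from the same normal population."""
    a = [random.gauss(0.0, 1.0) for _ in range(n1)]
    b = [random.gauss(0.0, 1.0) for _ in range(n2)]
    return f_statistic(a, b)

# Monte Carlo reference distribution under equal population variances.
null_values = [simulated_f(len(old_method), len(new_method)) for _ in range(10_000)]
p_upper = sum(f >= f_observed for f in null_values) / len(null_values)

print(f"F = {f_observed:.2f} with df1 = {df1}, df2 = {df2}")
print(f"simulated P(F >= observed | equal variances) ~ {p_upper:.4f}")
```

A small simulated probability suggests that a ratio this large would be unusual if the two methods were equally consistent.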

Key Takeaways
  • Sampling Distribution: The probability distribution of a statistic (like $\bar{x}$ or $s^2$) across many samples.
  • Standard Error ($\sigma/\sqrt{n}$): Measures the variability of the sample mean. Larger samples yield smaller standard errors (more precision).
  • Central Limit Theorem: The sample mean becomes approximately normally distributed as $n$ gets large (rule of thumb: $n \ge 30$), regardless of the population's shape.
  • t-Distribution: Used for inferences about the mean ($\mu$) when the population standard deviation ($\sigma$) is unknown. Heavy-tailed.
  • Chi-Square Distribution: Used for inferences about a single population variance ($\sigma^2$). Right-skewed.
  • F-Distribution: Used for comparing two population variances or conducting ANOVA. Right-skewed, with two degrees-of-freedom parameters.