Tests of Hypotheses

Learning Objectives

  • Understand the formal framework and key terminology of hypothesis testing.
  • Differentiate between Type I and Type II errors and understand the concept of statistical power.
  • Identify and perform common hypothesis tests for single and two population means.
  • Perform hypothesis tests for single proportions and variances.
  • Apply Chi-Square Goodness-of-Fit tests to check theoretical distributions.
  • Explain the mathematical duality between confidence intervals and hypothesis tests.

Introduction to Hypothesis Testing

While estimation focuses on finding the value of a population parameter, hypothesis testing focuses on making decisions about a population parameter based on sample data. An engineer might ask: "Does this new steel alloy have a mean tensile strength greater than $400 \text{ MPa}$?" or "Is the variance in asphalt thickness less than $5 \text{ mm}^2$?" Hypothesis testing provides a formal, objective framework to answer these yes-or-no questions.

The Framework of Hypothesis Testing

The formal steps required to set up and evaluate a statistical test involve defining the null and alternative hypotheses, calculating a test statistic, and making a decision based on a P-value or critical value.

1. Null Hypothesis ($H_0$)

The statement of the status quo, no effect, or no difference. It always contains an equality sign ($=$, $\le$, $\ge$). We assume $H_0$ is true until the sample data provide sufficiently strong evidence to the contrary.

Example: $H_0: \mu \le 400 \text{ MPa}$ (The new alloy is no stronger than the old one).

2. Alternative Hypothesis ($H_1$ or $H_a$)

The statement we are trying to find evidence for. It contradicts $H_0$ and never contains an equality sign ($<$, $>$, $\neq$). If the sample data strongly support $H_1$, we "reject $H_0$."

Example: $H_1: \mu > 400 \text{ MPa}$ (The new alloy is stronger).

3. Test Statistic

A standardized value calculated from the sample data (e.g., a Z-score, t-score, or $\chi^2$ value) assuming $H_0$ is true. For Z and t statistics, it measures how far the sample result is from the null hypothesis value, expressed in units of standard error.

4. P-Value

The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true.

Interpreting P-Values

  • A very small P-value (typically $\le 0.05$) indicates the observed data is highly unlikely under $H_0$, leading us to reject $H_0$.
  • A large P-value indicates the data is consistent with $H_0$, so we "fail to reject $H_0$."

5. Significance Level ($\alpha$)

The predetermined threshold for rejecting $H_0$. It is the maximum allowable probability of making a Type I Error. Common values are $0.05$ (5%), $0.01$ (1%), or $0.10$ (10%).

Decision Rule

If the P-value $\le \alpha$, reject $H_0$. If the P-value $> \alpha$, fail to reject $H_0$.
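
The decision rule can be written as a one-line comparison. The sketch below is only an illustration: the function name `decide` and the returned strings are assumptions chosen for this example, not part of any library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p_value <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # The boundary case p_value == alpha counts as a rejection.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.035, alpha=0.05))  # reject H0
print(decide(0.200, alpha=0.05))  # fail to reject H0
```

Note that the significance level is fixed before the data are examined; choosing it after seeing the P-value defeats the purpose of the rule.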

Errors in Decision Making and Statistical Power

Because we rely on partial information (a sample) to make decisions about a population, we are at risk of making mistakes. The risks inherent in statistical inference are classified into two types of errors.

Type I Error ($\alpha$)

Rejecting a true Null Hypothesis (a "false positive"). You conclude the new alloy is stronger when it actually isn't. The probability of a Type I error is at most the significance level $\alpha$, and equals $\alpha$ when the parameter sits exactly at the boundary value stated in $H_0$.

Type II Error ($\beta$)

Failing to reject a false Null Hypothesis (a "false negative"). You conclude the new alloy is no better, but it actually is stronger.

Statistical Power ($1 - \beta$)

The probability of correctly rejecting a false Null Hypothesis. A highly powerful test is very likely to detect a real difference if one exists.

Factors Affecting Statistical Power

  • Power increases as the true difference (effect size) increases.
  • Power increases as the significance level $\alpha$ increases (but this raises the risk of a Type I error).
  • Power increases as the sample size $n$ increases. Engineers often calculate the minimum sample size needed to achieve a specific power (e.g., 80%) before running an expensive test.
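
The three factors above can be checked numerically. The sketch below assumes the simplest setting: an upper-tailed Z-test ($H_1: \mu > \mu_0$) with known population standard deviation, where the power is $1 - \Phi(z_{1-\alpha} - \delta\sqrt{n}/\sigma)$ and $\delta$ is the true difference from $\mu_0$. The function names and the numbers (a 10 MPa difference, 25 MPa standard deviation) are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of an upper-tailed Z-test when the true mean is mu_0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # rejection threshold for Z
    shift = delta * sqrt(n) / sigma         # mean of Z under the true effect
    return 1 - STD_NORMAL.cdf(z_crit - shift)


def min_sample_size(delta: float, sigma: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n that reaches the target power in the same setting."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical numbers: detect a 10 MPa improvement when sigma = 25 MPa.
for n in (10, 20, 40, 80):
    print(n, round(z_test_power(delta=10, sigma=25, n=n), 3))
print("minimum n for 80% power:", min_sample_size(delta=10, sigma=25))
```

For a t-test the calculation involves the noncentral t distribution, so the figures from this normal-theory sketch should be read as approximations.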

Tests for a Single Population Mean ($\mu$)

Testing claims about the center of a population. Depending on whether the population variance is known or unknown, different test statistics are used.

  • Z-Test (Variance Known): Rarely used in practice. Assumes population variance $\sigma^2$ is known.
  • t-Test (Variance Unknown): The standard test. Uses the sample standard deviation $s$.

Z-Test Statistic for a Single Mean

Used when the population variance is known.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
  • $Z$: Z-test statistic
  • $\bar{x}$: Sample mean
  • $\mu_0$: Hypothesized population mean
  • $\sigma$: Population standard deviation
  • $n$: Sample size
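
As a rough sketch of how this statistic is used, the function below computes $Z$ and the matching P-value from the standard normal distribution. The function name, the `tail` argument, and the sample numbers (mean 406 MPa from 36 specimens, known standard deviation 20 MPa) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar: float, mu_0: float, sigma: float, n: int,
                tail: str = "upper") -> tuple[float, float]:
    """Return (Z, P-value) for a single-mean Z-test with known sigma."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "upper":        # H1: mu > mu_0
        p_value = 1 - cdf
    elif tail == "lower":      # H1: mu < mu_0
        p_value = cdf
    elif tail == "two-sided":  # H1: mu != mu_0
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")
    return z, p_value


# Hypothetical alloy data for H0: mu <= 400 MPa vs H1: mu > 400 MPa.
z, p = z_test_mean(x_bar=406, mu_0=400, sigma=20, n=36)
print(f"Z = {z:.2f}, P-value = {p:.4f}")
```

With these numbers $Z = 6 / (20/6) = 1.8$, so the decision at $\alpha = 0.05$ depends on whether the upper-tail area beyond 1.8 is below 0.05.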

t-Test Statistic for a Single Mean

Used when the population variance is unknown, with degrees of freedom $df = n - 1$.

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
  • $t$: t-test statistic
  • $\bar{x}$: Sample mean
  • $\mu_0$: Hypothesized population mean
  • $s$: Sample standard deviation
  • $n$: Sample size
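
A minimal sketch of the same calculation from raw data, using only the Python standard library. The function name and the tensile-strength readings are hypothetical. Because the standard library has no t-distribution, the sketch stops at the statistic and its degrees of freedom; the P-value would come from a t-table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample: list[float], mu_0: float) -> tuple[float, int]:
    """Return (t, df) for a single-mean t-test on raw sample data."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed to estimate s")
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical tensile-strength readings in MPa, tested against mu_0 = 400.
readings = [402, 398, 405, 410, 401, 407, 399, 404]
t, df = t_statistic(readings, mu_0=400)
print(f"t = {t:.3f} with df = {df}")
```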

Tests for Two Population Means ($\mu_1 - \mu_2$)

Comparing two different groups (e.g., compressive strength of Mix A vs. Mix B).

  • Independent Samples (Pooled t-Test): Assumes the two populations have equal (but unknown) variances. The sample variances are pooled to estimate a single standard error.
  • Independent Samples (Welch's t-Test): Does not assume equal variances. More robust and generally preferred.
  • Paired t-Test (Dependent Samples): Used when observations are naturally paired or matched (e.g., measuring the stiffness of the exact same beam before and after a retrofitting procedure). The test is performed on the differences between paired values, treating them as a single sample.
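
The text gives no formulas for these tests, so the sketch below fills them in from standard definitions and should be read as an illustration under that assumption. Welch's statistic is $t = (\bar{x}_A - \bar{x}_B) / \sqrt{s_A^2/n_A + s_B^2/n_B}$ with degrees of freedom from the Welch-Satterthwaite approximation, and the paired test is the single-sample t statistic applied to the differences with $\mu_0 = 0$. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's t-test; equal variances are not assumed."""
    n_a, n_b = len(sample_a), len(sample_b)
    v_a = variance(sample_a) / n_a  # squared standard error of mean A
    v_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return (t, df) for a paired t-test on the differences after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Single-sample t statistic on the differences, with mu_0 = 0.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical beam stiffness before and after a retrofit.
print(paired_t(before=[10, 12, 11, 13], after=[12, 13, 13, 16]))
```

The paired version fails with a division by zero if every difference is identical; a production implementation would need to handle that case.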

Tests for a Single Proportion ($\pi$)

Testing categorical outcomes (e.g., percentage of defective items). Uses the normal approximation (Z-test) if $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$.

Z-Test Statistic for a Single Proportion

Used to test a hypothesis about a population proportion.

$$Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$
  • $Z$: Z-test statistic
  • $p$: Sample proportion
  • $\pi_0$: Hypothesized population proportion
  • $n$: Sample size
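
The sketch below combines the normal-approximation check with the test statistic. The function name, the choice to raise an error when the check fails, and the defect counts (18 defective items out of 200, tested against $\pi_0 = 0.05$) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes: int, n: int, pi_0: float,
                      tail: str = "upper") -> tuple[float, float]:
    """Return (Z, P-value) for a single-proportion Z-test."""
    # Condition for the normal approximation, as stated in the text.
    if n * pi_0 < 5 or n * (1 - pi_0) < 5:
        raise ValueError("normal approximation not appropriate for this n and pi_0")
    p_hat = successes / n  # sample proportion
    z = (p_hat - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)
    cdf = NormalDist().cdf(z)
    if tail == "upper":
        p_value = 1 - cdf
    elif tail == "lower":
        p_value = cdf
    elif tail == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")
    return z, p_value


# Hypothetical inspection: is the defect rate above 5%?
z, p = z_test_proportion(successes=18, n=200, pi_0=0.05)
print(f"Z = {z:.3f}, P-value = {p:.4f}")
```

Note that the standard error uses the hypothesized $\pi_0$, not the sample proportion, because the statistic is computed assuming $H_0$ is true.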

Tests for a Single Variance ($\sigma^2$)

Testing claims about the variability or consistency of a process. Uses the Chi-square ($\chi^2$) distribution. Highly sensitive to departures from normality in the population.

Chi-Square Test Statistic for a Single Variance

Used to test a hypothesis about a population variance, with degrees of freedom $df = n - 1$.

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
  • $\chi^2$: Chi-square test statistic
  • $n$: Sample size
  • $s^2$: Sample variance
  • $\sigma_0^2$: Hypothesized population variance
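
A short sketch of the statistic computed from raw data. The function name and the asphalt thickness readings are hypothetical. The standard library has no chi-square distribution, so the P-value or critical value would have to come from a table or a statistics package.

```python
from statistics import variance


def chi_square_variance(sample: list[float], sigma0_sq: float) -> tuple[float, int]:
    """Return (chi-square statistic, df) for a single-variance test."""
    if sigma0_sq <= 0:
        raise ValueError("hypothesized variance must be positive")
    n = len(sample)
    s_sq = variance(sample)  # sample variance (divides by n - 1)
    return (n - 1) * s_sq / sigma0_sq, n - 1


# Hypothetical asphalt thickness readings in mm, tested against 5 mm^2.
thickness = [50.1, 52.3, 49.8, 51.0, 50.6, 48.9, 51.7, 50.2]
chi_sq, df = chi_square_variance(thickness, sigma0_sq=5.0)
print(f"chi-square = {chi_sq:.3f} with df = {df}")
```

For a claim such as "the variance is less than $5 \text{ mm}^2$", the test is lower-tailed: small values of the statistic count as evidence against $H_0$.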

Chi-Square Goodness-of-Fit Test

Used to determine whether a sample follows an expected probability distribution (e.g., "Is the arrival of cars at this intersection truly Poisson distributed?" or "Are these soil samples normally distributed?").

Goodness-of-Fit Tests

Checking if data follows a specific theoretical distribution. This is essential for validating the assumptions underlying other statistical methods.

  • $H_0$: The data follows the specified distribution.
  • $H_1$: The data does not follow the specified distribution.
  • A large $\chi^2$ value means the observed data deviates significantly from what was expected, leading to rejection of $H_0$.

Chi-Square Goodness-of-Fit Test Statistic

Calculates the deviation between observed and expected frequencies.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
  • $\chi^2$: Chi-square test statistic
  • $O_i$: Observed frequency in the $i$-th category
  • $E_i$: Expected frequency in the $i$-th category under $H_0$
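
The sum is straightforward to compute once the observed and expected counts are lined up by category. The sketch below uses a hypothetical example (a die rolled 60 times, with 10 expected in each face under a fair-die $H_0$); the function name is an assumption.

```python
def chi_square_gof(observed: list[float], expected: list[float]) -> float:
    """Return the chi-square goodness-of-fit statistic."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    # Sum of squared deviations, each scaled by its expected frequency.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 60 rolls of a die, H0 says each face is equally likely.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(chi_square_gof(observed, expected))
```

When no distribution parameters are estimated from the data, the statistic is compared against a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of categories; each estimated parameter removes one more degree of freedom.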

The Connection Between Confidence Intervals and Hypothesis Tests

There is a direct, mathematical duality between Confidence Intervals and two-sided Hypothesis Tests. If a 95% CI for the mean $\mu$ is $[390, 410]$, then a two-sided hypothesis test (with $\alpha = 0.05$) will:

  • Fail to reject $H_0: \mu = 400$ (because $400$ is inside the interval).
  • Reject $H_0: \mu = 380$ (because $380$ is outside the interval).
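
The duality can be verified numerically. The sketch below assumes a Z-based interval with a sample mean of 400 and a standard error of 5.102, values picked so that the 95% interval comes out close to $[390, 410]$; both numbers and the function name are assumptions for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ci_and_test(x_bar: float, se: float, mu_0: float, alpha: float = 0.05):
    """Return (CI, mu_0 inside CI?, two-sided test rejects H0: mu = mu_0?)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
    # Strict inequalities, to match a rule that rejects when P-value <= alpha.
    inside = lower < mu_0 < upper
    z = (x_bar - mu_0) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return (lower, upper), inside, p_value <= alpha


for mu_0 in (400, 380):
    ci, inside, reject = ci_and_test(x_bar=400, se=5.102, mu_0=mu_0)
    print(mu_0, "inside CI:", inside, "| reject H0:", reject)
```

In every case the two answers are opposites: a hypothesized value inside the interval is not rejected, and a value outside it is.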

Interactive Simulation

[Interactive figure: Hypothesis Testing Simulator (Engineering Data Analysis). A standard normal distribution with the critical values $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ and a marker for the test statistic. Default state: $Z = 1.96$, conclusion "Fail to Reject $H_0$" (the test statistic falls in the acceptance region, so there is insufficient evidence to reject $H_0$).]

P-Value and Significance Level Visualization

[Interactive figure: p-Value vs. Significance Level ($\alpha$) Visualizer (Engineering Data Analysis, Topic 10). Sliders adjust the significance level and the p-value to show how the null distribution, critical value, $\alpha$, Type I/II errors, and p-value relate; the alpha region is shaded red and the p-value area blue. Default state: $\alpha = 0.050$, p-value $= 0.035$, conclusion "Reject $H_0$" because the p-value is $\le \alpha$, so the result is statistically significant.]

Key Takeaways
  • $H_0$ and $H_1$: Formulate mutually exclusive hypotheses; $H_0$ contains equality.
  • P-value: The probability of a result at least as extreme as the one observed, assuming $H_0$ is true. Small P-values ($\le \alpha$) trigger rejection of $H_0$.
  • Type I Error ($\alpha$): False positive (rejecting true $H_0$).
  • Type II Error ($\beta$): False negative (failing to reject false $H_0$).
  • Power ($1-\beta$): The probability of correctly identifying a real effect. Highly dependent on sample size.
  • Goodness-of-Fit ($\chi^2$): Tests whether observed categorical data matches an expected distribution.
  • Duality: A 95% Confidence Interval contains all values of the parameter that would not be rejected by a two-sided test at $\alpha = 0.05$.