Tests of Hypotheses

Learning Objectives

Understand the formal framework and key terminology of hypothesis testing.
Differentiate between Type I and Type II errors and understand the concept of statistical power.
Identify and perform common hypothesis tests for single and two population means.
Perform hypothesis tests for single proportions and variances.
Apply Chi-Square Goodness-of-Fit tests to check theoretical distributions.
Explain the mathematical duality between confidence intervals and hypothesis tests.

Introduction to Hypothesis Testing

While estimation focuses on finding the value of a population parameter, hypothesis testing focuses on making decisions about a population parameter based on sample data. An engineer might ask: "Does this new steel alloy have a mean tensile strength greater than $400 \text{ MPa}$ ?" or "Is the variance in asphalt thickness less than $5 \text{ mm}^2$ ?" Hypothesis testing provides a formal, objective framework to answer these yes-or-no questions.

The Framework of Hypothesis Testing

The formal steps required to set up and evaluate a statistical test involve defining the null and alternative hypotheses, calculating a test statistic, and making a decision based on a P-value or critical value.

1. Null Hypothesis ( $H_0$ )

The statement of the status quo, no effect, or no difference. It always contains an equality sign ( $=$ , $\le$ , $\ge$ ). We assume $H_0$ is true until the sample data provide sufficiently strong evidence to the contrary.

Example: $H_0: \mu \le 400 \text{ MPa}$ (The new alloy is no stronger than the old one).

2. Alternative Hypothesis ( $H_1$ or $H_a$ )

The claim for which we seek evidence. It contradicts $H_0$ and never contains an equality sign ( $<$ , $>$ , $\neq$ ). If the sample data strongly support $H_1$ , we "reject $H_0$ ."

Example: $H_1: \mu > 400 \text{ MPa}$ (The new alloy is stronger).

3. Test Statistic

A standardized value calculated from the sample data (e.g., a Z-score, t-score, or $\chi^2$ value) assuming $H_0$ is true. For Z- and t-statistics, it measures how far the sample result is from the null hypothesis value, expressed in units of standard error.

4. P-Value

The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true.

Interpreting P-Values

A very small P-value (typically $\le 0.05$ ) indicates the observed data is highly unlikely under $H_0$ , leading us to reject $H_0$ .
A large P-value indicates the data is consistent with $H_0$ , so we "fail to reject $H_0$ ."

5. Significance Level ( $\alpha$ )

The predetermined threshold for rejecting $H_0$ . It is the maximum allowable probability of making a Type I Error. Common values are $0.05$ (5%), $0.01$ (1%), or $0.10$ (10%).

Decision Rule

If the P-value $\le \alpha$ , reject $H_0$ . If the P-value $> \alpha$ , fail to reject $H_0$ .
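The decision rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic is a Z-score with a standard normal null distribution; the function names `z_p_value` and `decide` and the example statistic of 2.10 are made up for this sketch.

```python
from statistics import NormalDist


def z_p_value(z: float, alternative: str = "two-sided") -> float:
    """P-value of a Z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf(z)
    if alternative == "greater":      # H1: parameter > hypothesized value
        return 1.0 - cdf
    if alternative == "less":         # H1: parameter < hypothesized value
        return cdf
    if alternative == "two-sided":    # H1: parameter != hypothesized value
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError(f"unknown alternative: {alternative}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical upper-tailed test with an observed Z of 2.10.
p = z_p_value(2.10, "greater")
print(f"P-value = {p:.4f}: {decide(p, alpha=0.05)}")
```

The same P-value can lead to different decisions at different significance levels, which is why $\alpha$ should be fixed before the data are examined.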

Errors in Decision Making and Statistical Power

Because we rely on partial information (a sample) to make decisions about a population, we are at risk of making mistakes. The risks inherent in statistical inference are classified into two types of errors.

Type I Error ( $\alpha$ )

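Rejecting a true Null Hypothesis (a "false positive"). You conclude the new alloy is stronger when it actually isn't. The probability of a Type I error is at most the significance level $\alpha$ , and it equals $\alpha$ when the parameter sits exactly at the boundary value stated in $H_0$ (for a continuous test statistic).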

Type II Error ( $\beta$ )

Failing to reject a false Null Hypothesis (a "false negative"). You find no evidence that the new alloy is better, but it actually is stronger.

Statistical Power ( $1 - \beta$ )

The probability of correctly rejecting a false Null Hypothesis. A highly powerful test is very likely to detect a real difference if one exists.

Factors Affecting Statistical Power

Power increases as the true difference (effect size) increases.
Power increases as the significance level $\alpha$ increases (but this raises the risk of a Type I error).
Power increases as the sample size $n$ increases. Engineers often calculate the minimum sample size needed to achieve a specific power (e.g., 80%) before running an expensive test.
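The three factors above can be explored numerically. The sketch below assumes the simplest case, an upper-tailed Z-test with known $\sigma$ , where power is $1 - \Phi(z_\alpha - \delta\sqrt{n}/\sigma)$ and $\delta$ is the true difference from the hypothesized mean. The function names and the numbers (a 5 MPa difference, $\sigma = 15$ MPa) are illustrative assumptions, not values from this topic.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_upper_tail_z(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of an upper-tailed Z-test when the true mean exceeds mu_0 by delta."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)   # rejection threshold for Z
    shift = delta * math.sqrt(n) / sigma       # mean of Z when H1 is true
    return 1.0 - STD_NORMAL.cdf(z_crit - shift)


def min_sample_size(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n that reaches the target power for the same upper-tailed Z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical planning numbers: detect a 5 MPa improvement when sigma = 15 MPa.
n_required = min_sample_size(delta=5.0, sigma=15.0)
print("minimum n:", n_required)
print("power at that n:", round(power_upper_tail_z(5.0, 15.0, n_required), 3))
```

When the population variance is unknown and a t-test is planned, the required sample size is somewhat larger, so this calculation is best read as a lower bound.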

Tests for a Single Population Mean ( $\mu$ )

Testing claims about the center of a population. Depending on whether the population variance is known or unknown, different test statistics are used.

Z-Test (Variance Known): Rarely used in practice. Assumes population variance $\sigma^2$ is known.
t-Test (Variance Unknown): The standard test. Uses the sample standard deviation $s$ .

Z-Test Statistic for a Single Mean

Used when the population variance is known.

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

$Z$ : Z-test statistic
$\bar{x}$ : Sample mean
$\mu_0$ : Hypothesized population mean
$\sigma$ : Population standard deviation
$n$ : Sample size
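As a worked illustration of the formula, the sketch below tests $H_0: \mu \le 400 \text{ MPa}$ against $H_1: \mu > 400 \text{ MPa}$ . The sample values ( $\bar{x} = 406$ , $\sigma = 15$ , $n = 36$ ) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist


def z_test_mean(x_bar: float, mu_0: float, sigma: float, n: int) -> float:
    """Z statistic for a single mean with known population standard deviation."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))


# Hypothetical data: 36 specimens, sample mean 406 MPa, known sigma 15 MPa.
z = z_test_mean(x_bar=406.0, mu_0=400.0, sigma=15.0, n=36)

# Upper-tailed test, so the P-value is the area to the right of z.
p_value = 1.0 - NormalDist().cdf(z)

print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
```

Here the standard error is $15/\sqrt{36} = 2.5$ , so $Z = 6/2.5 = 2.4$ and the P-value falls below $0.05$ , leading to rejection of $H_0$ at that level.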

t-Test Statistic for a Single Mean

Used when the population variance is unknown, with degrees of freedom $df = n - 1$ .

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

$t$ : t-test statistic
$\bar{x}$ : Sample mean
$\mu_0$ : Hypothesized population mean
$s$ : Sample standard deviation
$n$ : Sample size
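The same calculation with an estimated standard deviation might look like the sketch below. The ten strength readings are invented for illustration, and the critical value $1.833$ is the upper 5% point of the t-distribution with 9 degrees of freedom as read from a standard t-table, since the Python standard library has no t-distribution.

```python
import math
import statistics


def t_test_mean(sample: list[float], mu_0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)               # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical tensile strengths in MPa.
strengths = [402.0, 408.0, 399.0, 411.0, 405.0, 407.0, 396.0, 410.0, 404.0, 408.0]
t_stat, df = t_test_mean(strengths, mu_0=400.0)

# Table value for an upper-tailed test with alpha = 0.05 and df = 9 (assumed lookup).
T_CRITICAL = 1.833
decision = "reject H0" if t_stat > T_CRITICAL else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}: {decision}")
```

With statistical software available, the exact P-value would normally be reported instead of a table comparison.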

Tests for Two Population Means ( $\mu_1 - \mu_2$ )

Comparing two different groups (e.g., compressive strength of Mix A vs. Mix B).

Independent Samples (Pooled t-Test): Assumes the two populations have equal (but unknown) variances. The sample variances are pooled to estimate a single standard error.
Independent Samples (Welch's t-Test): Does not assume equal variances. More robust and generally preferred.
Paired t-Test (Dependent Samples): Used when observations are naturally paired or matched (e.g., measuring the stiffness of the exact same beam before and after a retrofitting procedure). The test is performed on the differences between paired values, treating them as a single sample.
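The three variants differ only in how the standard error and degrees of freedom are computed. The sketch below uses the textbook formulas for the pooled, Welch (with Welch-Satterthwaite degrees of freedom), and paired statistics; the function names and all data values are hypothetical.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Pooled t statistic and df, assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    v1, v2 = statistics.variance(sample_1), statistics.variance(sample_2)
    sp_sq = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    return diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2)), n1 + n2 - 2


def welch_t(sample_1, sample_2):
    """Welch t statistic and Welch-Satterthwaite df (no equal-variance assumption)."""
    n1, n2 = len(sample_1), len(sample_2)
    a = statistics.variance(sample_1) / n1
    b = statistics.variance(sample_2) / n2
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / math.sqrt(a + b), df


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the differences against zero."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical compressive strengths (MPa) for two concrete mixes.
mix_a = [30.1, 31.4, 29.8, 32.0, 30.7]
mix_b = [28.9, 29.5, 30.2, 28.4, 29.0]
print("pooled:", pooled_t(mix_a, mix_b))
print("welch: ", welch_t(mix_a, mix_b))

# Hypothetical stiffness of the same four beams before and after retrofitting.
stiff_before = [10.0, 12.0, 11.0, 13.0]
stiff_after = [11.0, 14.0, 12.0, 15.0]
print("paired:", paired_t(stiff_before, stiff_after))
```

With equal sample sizes the pooled and Welch statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances the two can diverge noticeably.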

Tests for a Single Proportion ( $\pi$ )

Testing categorical outcomes (e.g., percentage of defective items). Uses the normal approximation (Z-test) if $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$ .

Z-Test Statistic for a Single Proportion

Used to test a hypothesis about a population proportion.

$$Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$

$Z$ : Z-test statistic
$p$ : Sample proportion
$\pi_0$ : Hypothesized population proportion
$n$ : Sample size
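A minimal sketch of this test, including the normal-approximation check stated above, is shown here. The defect counts (16 defective items out of 200, tested against $\pi_0 = 0.05$ ) are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes: int, n: int, pi_0: float) -> float:
    """Z statistic for a single proportion using the normal approximation."""
    # Check the rule of thumb for the approximation before using it.
    if n * pi_0 < 5 or n * (1 - pi_0) < 5:
        raise ValueError("normal approximation not justified for this n and pi_0")
    p_hat = successes / n
    return (p_hat - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)


# Hypothetical inspection: H0: pi <= 0.05 against H1: pi > 0.05.
z_prop = z_test_proportion(successes=16, n=200, pi_0=0.05)
p_value_prop = 1.0 - NormalDist().cdf(z_prop)   # upper-tailed P-value
print(f"Z = {z_prop:.3f}, P-value = {p_value_prop:.4f}")
```

Note that the standard error uses the hypothesized $\pi_0$ , not the sample proportion, because the statistic is computed assuming $H_0$ is true.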

Tests for a Single Variance ( $\sigma^2$ )

Testing claims about the variability or consistency of a process. Uses the Chi-square ( $\chi^2$ ) distribution. Highly sensitive to departures from normality in the population.

Chi-Square Test Statistic for a Single Variance

Used to test a hypothesis about a population variance, with degrees of freedom $df = n - 1$ .

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

$\chi^2$ : Chi-square test statistic
$n$ : Sample size
$s^2$ : Sample variance
$\sigma_0^2$ : Hypothesized population variance
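The statistic is straightforward to compute by hand or in code. The sketch below tests $H_0: \sigma^2 \ge 5 \text{ mm}^2$ against $H_1: \sigma^2 < 5 \text{ mm}^2$ with hypothetical data ( $n = 20$ , $s^2 = 3.1 \text{ mm}^2$ ). The lower-tail critical value $10.117$ is the 5th percentile of the chi-square distribution with 19 degrees of freedom, taken from a standard table.

```python
def chi_square_variance(s_sq: float, sigma_0_sq: float, n: int) -> tuple[float, int]:
    """Return the chi-square statistic for a single variance and its df."""
    return (n - 1) * s_sq / sigma_0_sq, n - 1


# Hypothetical asphalt-thickness data: 20 cores with sample variance 3.1 mm^2.
chi2_var, df_var = chi_square_variance(s_sq=3.1, sigma_0_sq=5.0, n=20)

# Lower-tailed test: reject H0 only if the statistic falls below the
# 5th percentile of chi-square with 19 df (assumed table lookup).
CHI2_LOWER_CRITICAL = 10.117
reject_var = chi2_var < CHI2_LOWER_CRITICAL
print(f"chi-square = {chi2_var:.2f}, df = {df_var}, reject H0: {reject_var}")
```

In this example the sample variance is below $5$ , yet the statistic does not fall in the rejection region, so the data do not provide enough evidence that the true variance is smaller than $5 \text{ mm}^2$ .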

Chi-Square Goodness-of-Fit Test

Used to determine whether a sample follows an expected probability distribution (e.g., "Is the arrival of cars at this intersection truly Poisson distributed?" or "Are these soil samples normally distributed?").

Goodness-of-Fit Tests

Checking if data follows a specific theoretical distribution. This is essential for validating the assumptions underlying other statistical methods.

$H_0$ : The data follows the specified distribution.
$H_1$ : The data does not follow the specified distribution.
A large $\chi^2$ value means the observed data deviates significantly from what was expected, leading to rejection of $H_0$ .

Chi-Square Goodness-of-Fit Test Statistic

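Calculates the deviation between observed and expected frequencies, with degrees of freedom $df = k - 1 - m$ , where $k$ is the number of categories and $m$ is the number of distribution parameters estimated from the sample.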

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

$\chi^2$ : Chi-square test statistic
$O_i$ : Observed frequency in the $i$ -th category
$E_i$ : Expected frequency in the $i$ -th category under $H_0$
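A small sketch of the computation follows. It assumes a hypothetical scenario in which 100 defects are recorded across five production lines and $H_0$ states that a defect is equally likely to come from any line. The critical value $9.488$ is the upper 5% point of the chi-square distribution with 4 degrees of freedom from a standard table.

```python
def chi_square_gof(observed, expected_probs, n_estimated_params=0):
    """Return the goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    expected = [total * p for p in expected_probs]   # E_i under H0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One df is lost because the counts must sum to the total, plus one
    # for each distribution parameter estimated from the sample.
    df = len(observed) - 1 - n_estimated_params
    return chi2, df


# Hypothetical defect counts per production line.
observed_counts = [18, 25, 16, 22, 19]
uniform_probs = [0.2] * 5

chi2_gof, df_gof = chi_square_gof(observed_counts, uniform_probs)

CHI2_UPPER_CRITICAL = 9.488   # alpha = 0.05, df = 4 (assumed table lookup)
reject_gof = chi2_gof > CHI2_UPPER_CRITICAL
print(f"chi-square = {chi2_gof:.2f}, df = {df_gof}, reject H0: {reject_gof}")
```

The goodness-of-fit test is always upper-tailed, because only large discrepancies between observed and expected counts count as evidence against $H_0$ . For a Poisson or normal fit, the expected probabilities would come from the fitted distribution, and `n_estimated_params` would count the parameters estimated from the data.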

The Connection Between Confidence Intervals and Hypothesis Tests

There is a direct, mathematical duality between Confidence Intervals and two-sided Hypothesis Tests. If a 95% CI for the mean $\mu$ is $[390, 410]$ , then a two-sided hypothesis test (with $\alpha = 0.05$ ) will:

Fail to reject $H_0: \mu = 400$ (because $400$ is inside the interval).
Reject $H_0: \mu = 380$ (because $380$ is outside the interval).
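The duality can be checked numerically. The sketch below assumes a known-variance setting with hypothetical values ( $\bar{x} = 400$ , $\sigma = 25.5$ , $n = 25$ ), which give a 95% interval of approximately $[390, 410]$ , and confirms that a hypothesized mean is rejected exactly when it lies outside the interval.

```python
import math
from statistics import NormalDist


def ci_and_two_sided_p(x_bar, sigma, n, mu_0, alpha=0.05):
    """Return the (1 - alpha) Z-interval for the mean and the two-sided P-value for mu_0."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    interval = (x_bar - z_crit * se, x_bar + z_crit * se)
    z = (x_bar - mu_0) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return interval, p_value


for mu_0 in (400.0, 380.0):
    (lower, upper), p = ci_and_two_sided_p(400.0, 25.5, 25, mu_0)
    inside = lower <= mu_0 <= upper
    rejected = p <= 0.05
    print(f"mu_0 = {mu_0}: CI = [{lower:.1f}, {upper:.1f}], "
          f"inside = {inside}, rejected = {rejected}")
```

Both results come from the same inequality, $|\bar{x} - \mu_0| \le z_{\alpha/2}\,\sigma/\sqrt{n}$ , read once as a statement about $\mu_0$ (the interval) and once as a statement about the test statistic (the acceptance region).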

Key Takeaways

$H_0$ and $H_1$ : Formulate mutually exclusive hypotheses; $H_0$ contains equality.
P-value: The probability of observing a test statistic at least as extreme as the one obtained, assuming $H_0$ is true. P-values at or below $\alpha$ trigger rejection of $H_0$ .
Type I Error ( $\alpha$ ): False positive (rejecting true $H_0$ ).
Type II Error ( $\beta$ ): False negative (failing to reject false $H_0$ ).
Power ( $1-\beta$ ): The probability of correctly identifying a real effect. Highly dependent on sample size.
Goodness-of-Fit ( $\chi^2$ ): Tests whether observed categorical data matches an expected distribution.
Duality: A 95% Confidence Interval contains all values of the parameter that would not be rejected by a two-sided test at $\alpha = 0.05$ .
