Tests of Hypotheses
Learning Objectives
- Understand the formal framework and key terminology of hypothesis testing.
- Differentiate between Type I and Type II errors and understand the concept of statistical power.
- Identify and perform common hypothesis tests for single and two population means.
- Perform hypothesis tests for single proportions and variances.
- Apply Chi-Square Goodness-of-Fit tests to check theoretical distributions.
- Explain the mathematical duality between confidence intervals and hypothesis tests.
Introduction to Hypothesis Testing
While estimation focuses on finding the value of a population parameter, hypothesis testing focuses on making decisions about a population parameter based on sample data. An engineer might ask: "Does this new steel alloy have a mean tensile strength greater than a specified value $\mu_0$?" or "Is the variance in asphalt thickness less than a specified value $\sigma_0^2$?" Hypothesis testing provides a formal, objective framework to answer these yes-or-no questions.
The Framework of Hypothesis Testing
The formal steps required to set up and evaluate a statistical test involve defining the null and alternative hypotheses, calculating a test statistic, and making a decision based on a P-value or critical value.
1. Null Hypothesis ($H_0$)
The statement of the status quo, no effect, or no difference. It always contains an equality sign ($=$, $\le$, $\ge$). We assume $H_0$ is true until the sample data provide sufficiently strong evidence to the contrary.
Example: $H_0: \mu \le \mu_0$, where $\mu_0$ is the mean strength of the old alloy (The new alloy is no stronger than the old one).
2. Alternative Hypothesis ($H_1$ or $H_a$)
The claim we are seeking evidence for. It contradicts $H_0$ and never contains an equality sign ($\ne$, $<$, $>$). If the sample data strongly support $H_1$, we "reject $H_0$."
Example: $H_1: \mu > \mu_0$ (The new alloy is stronger).
3. Test Statistic
A standardized value calculated from the sample data (e.g., a Z-score, t-score, or $\chi^2$ value) assuming $H_0$ is true. For Z and t statistics, it measures how far the sample result is from the null hypothesis value, expressed in units of standard error.
4. P-Value
The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true.
Interpreting P-Values
- A very small P-value (typically $\le 0.05$) indicates the observed data is highly unlikely under $H_0$, leading us to reject $H_0$.
- A large P-value indicates the data is consistent with $H_0$, so we "fail to reject $H_0$."
5. Significance Level ($\alpha$)
The predetermined threshold for rejecting $H_0$. It is the maximum allowable probability of making a Type I Error. Common values are $\alpha = 0.05$ (5%), $\alpha = 0.01$ (1%), or $\alpha = 0.10$ (10%).
Decision Rule
If the P-value $\le \alpha$, reject $H_0$. If the P-value $> \alpha$, fail to reject $H_0$.
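The decision rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic follows a standard normal distribution under $H_0$. The function names `p_value_from_z` and `decide` and the observed value 2.10 are hypothetical, chosen only for this example.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """P-value for a test statistic z that is standard normal under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # H1: parameter > null value
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":         # H1: parameter < null value
        return std_normal.cdf(z)
    if alternative == "two-sided":    # H1: parameter != null value
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical observed statistic, invented for illustration.
z_observed = 2.10
p = p_value_from_z(z_observed, alternative="greater")
print(f"P-value = {p:.4f} -> {decide(p)}")
```

For the hypothetical statistic above, the upper-tailed P-value is about 0.018, so $H_0$ is rejected at $\alpha = 0.05$ but would not be rejected at $\alpha = 0.01$.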
Errors in Decision Making and Statistical Power
Because we rely on partial information (a sample) to make decisions about a population, we are at risk of making mistakes. The risks inherent in statistical inference are classified into two types of errors.
Type I Error ($\alpha$)
Rejecting a true Null Hypothesis (a "false positive"). You conclude the new alloy is stronger when it actually isn't. The probability of a Type I error is precisely the significance level $\alpha$.
Type II Error ($\beta$)
Failing to reject a false Null Hypothesis (a "false negative"). You conclude the new alloy is no better, but it actually is stronger.
Statistical Power ($1 - \beta$)
The probability of correctly rejecting a false Null Hypothesis. A highly powerful test is very likely to detect a real difference if one exists.
Factors Affecting Statistical Power
- Power increases as the true difference (effect size) increases.
- Power increases as the significance level increases (but this raises the risk of a Type I error).
- Power increases as the sample size increases. Engineers often calculate the minimum sample size needed to achieve a specific power (e.g., 80%) before running an expensive test.
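The three factors above can be explored numerically. The sketch below assumes the simplest setting, an upper-tailed Z-test with known $\sigma$, where the power against a true shift $\delta = \mu - \mu_0$ is $1 - \Phi(z_\alpha - \delta\sqrt{n}/\sigma)$ and the required sample size is $n = \left((z_\alpha + z_\beta)\,\sigma/\delta\right)^2$. These are the standard textbook formulas for that case rather than something stated in this section, and the function names and input values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_upper_tailed_z(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed Z-test when the true mean exceeds mu_0 by delta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)

def min_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n that reaches the target power in the same setting."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning inputs: detect a shift of 5 units when sigma = 10.
n_required = min_sample_size(delta=5.0, sigma=10.0)
print(f"n required for 80% power: {n_required}")
print(f"power at that n: {power_upper_tailed_z(5.0, 10.0, n_required):.3f}")
```

With these inputs the sketch gives a minimum sample size of 25. Increasing `delta`, `alpha`, or `n` in `power_upper_tailed_z` raises the returned power, matching the three bullets above.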
Tests for a Single Population Mean ($\mu$)
Testing claims about the center of a population. Depending on whether the population variance is known or unknown, different test statistics are used.
- Z-Test (Variance Known): Rarely used in practice. Assumes population variance $\sigma^2$ is known.
- t-Test (Variance Unknown): The standard test. Uses the sample standard deviation $s$.
Z-Test Statistic for a Single Mean
Used when the population variance is known.
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
- $Z$: Z-test statistic
- $\bar{x}$: Sample mean
- $\mu_0$: Hypothesized population mean
- $\sigma$: Population standard deviation
- $n$: Sample size
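As a quick numerical check of the Z statistic, the following sketch evaluates it for an upper-tailed test ($H_1: \mu > \mu_0$). The function name `z_test_mean` and all input values are hypothetical, and $\sigma$ is assumed to be known exactly.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu_0, sigma, n):
    """Return (Z, upper-tailed P-value) for H0: mu <= mu_0 vs. H1: mu > mu_0."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu_0) / standard_error
    p_upper = 1.0 - NormalDist().cdf(z)
    return z, p_upper

# Hypothetical inputs, invented for illustration.
z, p = z_test_mean(x_bar=512.0, mu_0=500.0, sigma=30.0, n=36)
print(f"Z = {z:.2f}, P-value = {p:.4f}")
```

Here the standard error is $30/\sqrt{36} = 5$, so $Z = 12/5 = 2.4$ and the P-value is roughly 0.008, which leads to rejecting $H_0$ at $\alpha = 0.05$.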
t-Test Statistic for a Single Mean
Used when the population variance is unknown, with degrees of freedom $\nu = n - 1$.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
- $t$: t-test statistic
- $\bar{x}$: Sample mean
- $\mu_0$: Hypothesized population mean
- $s$: Sample standard deviation
- $n$: Sample size
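The t statistic can be computed directly from raw data. The sketch below uses only the Python standard library, so instead of computing a P-value it compares the statistic against a critical value looked up in a standard t table. The data, the function name `one_sample_t`, and the choice of an upper-tailed test at $\alpha = 0.05$ are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu_0):
    """Return (t, degrees of freedom) for a one-sample t-test against mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical measurements, invented for illustration.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
t_stat, df = one_sample_t(sample, mu_0=10.0)

# Upper-tailed critical value for alpha = 0.05 and 4 degrees of freedom,
# taken from a standard t table.
T_CRIT = 2.132
decision = "reject H0" if t_stat > T_CRIT else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}: {decision}")
```

For this sample, $\bar{x} = 14$ and $s^2 = 10$, so $t = 4/\sqrt{2} \approx 2.83$, which exceeds the tabulated critical value.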
Tests for Two Population Means ($\mu_1 - \mu_2$)
Comparing two different groups (e.g., compressive strength of Mix A vs. Mix B).
- Independent Samples (Pooled t-Test): Assumes the two populations have equal (but unknown) variances. The sample variances are pooled to estimate a single standard error.
- Independent Samples (Welch's t-Test): Does not assume equal variances. More robust and generally preferred.
- Paired t-Test (Dependent Samples): Used when observations are naturally paired or matched (e.g., measuring the stiffness of the exact same beam before and after a retrofitting procedure). The test is performed on the differences between paired values, treating them as a single sample.
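The three variants differ only in how the standard error and degrees of freedom are formed. The sketch below uses the standard textbook formulas (pooled variance, the Welch-Satterthwaite approximation for degrees of freedom, and a one-sample t-test on paired differences). These formulas are not given in this section, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Pooled t-test: assumes equal population variances."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """Welch's t-test: no equal-variance assumption, fractional df."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def paired_t(before, after):
    """Paired t-test: one-sample t-test on the differences (after - before)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1

# Hypothetical data, invented for illustration.
mix_a = [10.0, 12.0, 14.0, 16.0, 18.0]
mix_b = [9.0, 10.0, 11.0, 12.0, 13.0]
print("pooled:", pooled_t(mix_a, mix_b))
print("welch: ", welch_t(mix_a, mix_b))

before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 14.0, 15.0, 19.0, 21.0]
print("paired:", paired_t(before, after))
```

Because the two hypothetical samples have equal sizes, the pooled and Welch statistics coincide, but Welch's test reports fewer degrees of freedom (about 5.9 instead of 8), which makes it more conservative when the sample variances differ.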
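Tests for a Single Proportion ($p$)
Testing categorical outcomes (e.g., percentage of defective items). Uses the normal approximation (Z-test) if the expected counts $np_0$ and $n(1 - p_0)$ are both large enough (a common rule of thumb is at least 5; some texts require 10).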
Z-Test Statistic for a Single Proportion
Used to test a hypothesis about a population proportion.
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$
- $Z$: Z-test statistic
- $\hat{p}$: Sample proportion
- $p_0$: Hypothesized population proportion
- $n$: Sample size
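A short sketch of the proportion test follows. It checks the normal-approximation condition before computing the statistic. The threshold of 5 expected counts is an assumed rule of thumb, and the function name `z_test_proportion` and the defect counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p_0, min_expected=5):
    """Return (Z, upper-tailed P-value) for H0: p <= p_0 vs. H1: p > p_0."""
    # Assumed rule of thumb for the normal approximation.
    if n * p_0 < min_expected or n * (1 - p_0) < min_expected:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
    return z, 1.0 - NormalDist().cdf(z)

# Hypothetical inspection result: 30 defective items out of 200,
# tested against a claimed defect rate of 10%.
z, p = z_test_proportion(successes=30, n=200, p_0=0.10)
print(f"Z = {z:.3f}, P-value = {p:.4f}")
```

Note that the standard error uses the hypothesized proportion $p_0$, not $\hat{p}$, because the statistic is computed assuming $H_0$ is true.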
Tests for a Single Variance ($\sigma^2$)
Testing claims about the variability or consistency of a process. Uses the Chi-square ($\chi^2$) distribution. Highly sensitive to departures from normality in the population.
Chi-Square Test Statistic for a Single Variance
Used to test a hypothesis about a population variance, with degrees of freedom $\nu = n - 1$.
$$\chi^2 = \frac{(n - 1) s^2}{\sigma_0^2}$$
- $\chi^2$: Chi-square test statistic
- $n$: Sample size
- $s^2$: Sample variance
- $\sigma_0^2$: Hypothesized population variance
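The variance statistic is simple to evaluate from raw data. The sketch below assumes an upper-tailed test ($H_1: \sigma^2 > \sigma_0^2$) and compares the statistic with a critical value from a standard chi-square table. The data, the hypothesized variance, and the function name `chi_square_variance` are invented for illustration.

```python
from statistics import variance

def chi_square_variance(sample, sigma0_sq):
    """Return (chi-square statistic, degrees of freedom) for H0: sigma^2 = sigma0_sq."""
    n = len(sample)
    s_sq = variance(sample)  # sample variance (n - 1 divisor)
    return (n - 1) * s_sq / sigma0_sq, n - 1

# Hypothetical measurements and hypothesized variance.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
chi2, df = chi_square_variance(sample, sigma0_sq=4.0)

# Upper 5% critical value for 4 degrees of freedom, from a standard table.
CHI2_CRIT = 9.488
decision = "reject H0" if chi2 > CHI2_CRIT else "fail to reject H0"
print(f"chi-square = {chi2:.2f}, df = {df}: {decision}")
```

For a lower-tailed alternative ($H_1: \sigma^2 < \sigma_0^2$), the same statistic would instead be compared with a lower-tail critical value, rejecting when the statistic is small.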
Chi-Square Goodness-of-Fit Test
Used to determine whether a sample follows an expected probability distribution (e.g., "Is the arrival of cars at this intersection truly Poisson distributed?" or "Are these soil samples normally distributed?").
Goodness-of-Fit Tests
Checking if data follows a specific theoretical distribution. This is essential for validating the assumptions underlying other statistical methods.
- $H_0$: The data follows the specified distribution.
- $H_1$: The data does not follow the specified distribution.
- A large $\chi^2$ value means the observed data deviates significantly from what was expected, leading to rejection of $H_0$.
Chi-Square Goodness-of-Fit Test Statistic
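Calculates the deviation between observed and expected frequencies.
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
- $\chi^2$: Chi-square test statistic
- $O_i$: Observed frequency in the $i$-th category
- $E_i$: Expected frequency in the $i$-th category under $H_0$
- $k$: Number of categories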
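A minimal sketch of the goodness-of-fit calculation is shown below for a hypothetical fair-die experiment with 120 rolls. The counts are invented, and the degrees of freedom are taken as the number of categories minus one, which is the usual rule when no distribution parameters are estimated from the data (an assumption, since this section does not state it).

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 to 6 over 120 rolls.
observed = [18, 22, 20, 25, 15, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face under H0

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1

# Upper 5% critical value for 5 degrees of freedom, from a standard table.
CHI2_CRIT = 11.070
decision = "reject H0" if chi2 > CHI2_CRIT else "fail to reject H0"
print(f"chi-square = {chi2:.2f}, df = {df}: {decision}")
```

The statistic here is $58/20 = 2.9$, well below the critical value, so these hypothetical counts are consistent with a fair die.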
The Connection Between Confidence Intervals and Hypothesis Tests
There is a direct mathematical duality between confidence intervals and two-sided hypothesis tests. If a 95% CI for the mean is $[L, U]$, then a two-sided hypothesis test of $H_0: \mu = \mu_0$ (with $\alpha = 0.05$) will:
- Fail to reject $H_0$ for any hypothesized value $\mu_0$ inside the interval.
- Reject $H_0$ for any hypothesized value $\mu_0$ outside the interval.
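The duality can be verified numerically. The sketch below assumes a known $\sigma$, so both the interval and the test are based on the standard normal distribution. The function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean with known sigma."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z_crit * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

def two_sided_p_value(x_bar, mu_0, sigma, n):
    """P-value for H0: mu = mu_0 vs. H1: mu != mu_0 with known sigma."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

# Hypothetical sample summary, invented for illustration.
x_bar, sigma, n = 512.0, 30.0, 36
low, high = z_confidence_interval(x_bar, sigma, n)
print(f"95% CI: [{low:.1f}, {high:.1f}]")

for mu_0 in (505.0, 500.0):
    inside = low <= mu_0 <= high
    rejected = two_sided_p_value(x_bar, mu_0, sigma, n) <= 0.05
    print(f"mu_0 = {mu_0}: inside CI = {inside}, rejected = {rejected}")
```

With these inputs the interval is about [502.2, 521.8]. The value 505 lies inside and is not rejected, while 500 lies outside and is rejected, as the duality predicts.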
Conclusion
- $H_0$ and $H_1$: Formulate mutually exclusive hypotheses; $H_0$ contains equality.
- P-value: The probability of observing a result at least as extreme as the sample data, assuming $H_0$ is true. Small P-values (typically $\le 0.05$) trigger rejection of $H_0$.
- Type I Error ($\alpha$): False positive (rejecting true $H_0$).
- Type II Error ($\beta$): False negative (failing to reject false $H_0$).
- Power ($1 - \beta$): The probability of correctly identifying a real effect. Highly dependent on sample size.
- Goodness-of-Fit ($\chi^2$): Tests whether observed categorical data matches an expected distribution.
- Duality: A 95% Confidence Interval contains all values of the parameter that would not be rejected by a two-sided test at $\alpha = 0.05$.