Statistical Hydrology

Learning Objectives

  • Understand the concepts of Return Period and Exceedance Probability for hydrologic events.
  • Calculate Risk and Reliability for hydraulic structures over their design life.
  • Perform Frequency Analysis using general frequency equations.
  • Apply probability distributions like Gumbel's, Log-Pearson Type III, and Log-Normal to historical data.
  • Utilize Plotting Positions, Confidence Limits, and L-Moments for robust statistical estimation.
  • Define and understand the application of the Probable Maximum Flood (PMF).

Applying probability theory to hydrologic events to predict return periods, risk, and frequencies.

Introduction

Hydrologic events (floods, droughts, storms) are stochastic (random) in nature. Statistical Hydrology uses probability theory to analyze historical data and predict the likelihood of future extreme events.

Return Period ($T$)

Return Period (Recurrence Interval)

The average time interval between events equal to or exceeding a certain magnitude ($x_T$).

Return Period vs. Probability

$$T = \frac{1}{P}$$

Exceedance Probability ($P$)

The probability that an event of magnitude $\ge x$ will occur in any given year. For example:

  • 100-year flood: $T = 100$, so $P = 1/100 = 0.01$ (1% chance of occurring in any single year).
  • 50-year flood: $T = 50$, so $P = 1/50 = 0.02$ (2% chance).

Risk ($R$)

The probability that an event with return period $T$ will occur at least once in a project life of $n$ years.

Risk Equation

Calculates the probability that an event with return period T occurs at least once in n years.

$$R = 1 - (1 - P)^n = 1 - \left(1 - \frac{1}{T}\right)^n$$

Variables

Symbol | Description | Unit
$R$ | Risk of occurrence | dimensionless
$P$ | Exceedance probability | dimensionless
$n$ | Project life | years
$T$ | Return period | years

Reliability

The probability that the event will not occur in $n$ years.

Reliability Equation

Calculates the probability that an event will not occur in a project life of n years.

$$\text{Reliability} = 1 - R = (1 - P)^n$$

Variables

Symbol | Description | Unit
Reliability | Probability of non-occurrence | dimensionless
$R$ | Risk | dimensionless
$P$ | Exceedance probability | dimensionless
$n$ | Project life | years
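
The risk and reliability equations above can be sketched in a few lines of Python. This is a minimal illustration, not a design tool: the function names are hypothetical, and the formulas assume that exceedances in different years are independent, as the equations themselves do.

```python
def annual_exceedance_probability(return_period: float) -> float:
    """P = 1 / T, valid for T >= 1 year."""
    if return_period < 1:
        raise ValueError("return period must be at least 1 year")
    return 1.0 / return_period


def reliability(return_period: float, life_years: int) -> float:
    """Probability of no exceedance in n years: (1 - P)^n."""
    p = annual_exceedance_probability(return_period)
    return (1.0 - p) ** life_years


def risk(return_period: float, life_years: int) -> float:
    """Probability of at least one exceedance in n years: 1 - (1 - P)^n."""
    return 1.0 - reliability(return_period, life_years)


if __name__ == "__main__":
    # Hypothetical case: 100-year design flood, 50-year project life.
    print(f"risk        = {risk(100, 50):.3f}")         # about 0.395
    print(f"reliability = {reliability(100, 50):.3f}")  # about 0.605
```

Under these assumptions, a structure designed for the 100-year flood still has roughly a 40% chance of seeing that flood exceeded at least once in 50 years.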

Frequency Analysis

Frequency analysis relates the magnitude of extreme events to their frequency of occurrence using probability distributions.

General Frequency Equation

Relates the magnitude of extreme events to their frequency of occurrence.

$$x_T = \bar{x} + K \cdot \sigma$$

Variables

Symbol | Description | Unit
$x_T$ | Value of variate with return period $T$ (e.g., peak discharge) | varies
$\bar{x}$ | Mean of the data series | varies
$\sigma$ | Standard deviation of the data series | varies
$K$ | Frequency factor, depends on the probability distribution and $T$ | dimensionless
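
As a rough sketch, the general frequency equation is a one-line calculation once $K$ is known. The function name and the sample data below are hypothetical, and the sample standard deviation (divisor $N - 1$) is assumed for $\sigma$, which the text does not specify.

```python
from statistics import mean, stdev


def magnitude_from_frequency_factor(series, k):
    """x_T = mean + K * sigma, using the sample standard deviation (N - 1)."""
    return mean(series) + k * stdev(series)


# Hypothetical annual peak discharges in m^3/s (illustrative values only).
peaks = [10.0, 20.0, 30.0]

# The mean is 20 and the sample standard deviation is 10, so K = 2 gives about 40.
print(magnitude_from_frequency_factor(peaks, 2.0))
```

The distribution-specific work lies entirely in choosing $K$ for the return period of interest.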

  1. Gumbel's Extreme Value Distribution (Type I)

Commonly used for flood frequency analysis.

Gumbel's Frequency Factor

Calculates the frequency factor for Gumbel's extreme value distribution.

$$K = \frac{y_T - \bar{y}_n}{S_n}$$

Variables

Symbol | Description | Unit
$K$ | Frequency factor | dimensionless
$y_T$ | Reduced variate for return period $T$ | dimensionless
$\bar{y}_n$ | Reduced mean, dependent on sample size $N$ | dimensionless
$S_n$ | Reduced standard deviation, dependent on sample size $N$ | dimensionless

Reduced Variate ($y_T$)

$$y_T = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]$$
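
The two Gumbel formulas above can be sketched as follows. The function names are hypothetical. The default values of $\bar{y}_n$ and $S_n$ are the limiting values for a very long record ($N \to \infty$), roughly 0.5772 and 1.2825; for a real record of finite length the tabulated values for that $N$ would normally be passed in instead.

```python
import math


def gumbel_reduced_variate(return_period: float) -> float:
    """y_T = -ln[-ln(1 - 1/T)], defined for T > 1."""
    if return_period <= 1:
        raise ValueError("return period must be greater than 1 year")
    return -math.log(-math.log(1.0 - 1.0 / return_period))


def gumbel_frequency_factor(return_period: float,
                            reduced_mean: float = 0.5772,
                            reduced_std: float = 1.2825) -> float:
    """K = (y_T - y_n) / S_n.

    The defaults are the N -> infinity limits (an assumption of this sketch);
    supply tabulated y_n and S_n for the actual sample size when available.
    """
    return (gumbel_reduced_variate(return_period) - reduced_mean) / reduced_std


if __name__ == "__main__":
    print(f"y_100 = {gumbel_reduced_variate(100):.3f}")   # about 4.600
    print(f"K_100 = {gumbel_frequency_factor(100):.3f}")  # about 3.137
```

The resulting $K$ is then used in the general frequency equation with the mean and standard deviation of the flood series.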

  2. Log-Pearson Type III Distribution

The standard method for flood frequency analysis in the United States (USGS Bulletin 17B/17C). It applies the general frequency equation to the logarithms of the discharge values ($y = \log x$).

Log-Pearson III Equation

Applies general frequency equation to the logarithms of discharge values.

$$\log x_T = \overline{\log x} + K_z \cdot \sigma_{\log x}$$

Variables

Symbol | Description | Unit
$\log x_T$ | Logarithm of the variate value with return period $T$ | dimensionless
$\overline{\log x}$ | Mean of the log-transformed data | dimensionless
$K_z$ | Frequency factor, function of return period $T$ and skewness coefficient $C_s$ | dimensionless
$\sigma_{\log x}$ | Standard deviation of the log-transformed data | dimensionless
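
A hedged sketch of the Log-Pearson III calculation follows. In practice $K_z$ is usually read from published tables as a function of $T$ and $C_s$. This sketch substitutes the Wilson-Hilferty approximation for those tables, which is not part of the text above and becomes less accurate for strongly skewed data. It also assumes base-10 logarithms and uses only the station skew of the sample, with no regional weighting. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_skew(values):
    """Sample skewness: N * sum((y - mean)^3) / ((N - 1)(N - 2) s^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 values are needed to estimate skewness")
    m = mean(values)
    s = stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def lp3_frequency_factor(return_period: float, skew: float) -> float:
    """Approximate K_z with the Wilson-Hilferty transformation (an assumption)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)  # standard normal deviate
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the normal deviate
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * (z - k)) ** 3 - 1.0)


def lp3_quantile(flows, return_period: float) -> float:
    """x_T from log10-transformed flows: log x_T = mean + K_z * std."""
    logs = [math.log10(q) for q in flows]
    k_z = lp3_frequency_factor(return_period, sample_skew(logs))
    return 10.0 ** (mean(logs) + k_z * stdev(logs))


if __name__ == "__main__":
    # Hypothetical annual peak flows in m^3/s (illustrative values only).
    flows = [1200, 950, 1800, 760, 1430, 2100, 880, 1320, 1610, 1050]
    print(f"K_z(T=100, Cs=0.5) = {lp3_frequency_factor(100, 0.5):.3f}")
    print(f"x_100 = {lp3_quantile(flows, 100):.0f} m^3/s")
```

For $T = 100$ and $C_s = 0.5$ the approximation gives a value close to the commonly tabulated 2.686, which is a useful spot check before relying on it elsewhere.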

  3. Log-Normal Distribution

A special case of the Log-Pearson Type III distribution where the skewness coefficient of the logarithmic data is exactly zero ($C_s = 0$).

Log-Normal Equation

A special case of Log-Pearson III where skewness is zero.

$$y_T = \bar{y} + K_z \cdot S_y$$

Variables

Symbol | Description | Unit
$y_T$ | Variate value ($\ln x$) for return period $T$ | dimensionless
$\bar{y}$ | Mean of the logarithms | dimensionless
$K_z$ | Standard normal deviate for return period $T$ | dimensionless
$S_y$ | Standard deviation of the logarithms | dimensionless

Note: The actual variate value is then calculated as $x_T = e^{y_T}$.
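
The log-normal procedure can be sketched with the Python standard library alone. The function name is hypothetical. The sketch assumes natural logarithms and takes $K_z$ as the standard normal deviate with non-exceedance probability $1 - 1/T$.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_quantile(series, return_period: float) -> float:
    """x_T = exp(y_mean + K_z * S_y), with y = ln(x)."""
    logs = [math.log(x) for x in series]
    k_z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    y_t = mean(logs) + k_z * stdev(logs)
    return math.exp(y_t)


if __name__ == "__main__":
    # Hypothetical annual peak flows in m^3/s (illustrative values only).
    flows = [1200, 950, 1800, 760, 1430, 2100, 880, 1320, 1610, 1050]
    for T in (2, 10, 100):
        print(f"T = {T:>3} years: x_T = {lognormal_quantile(flows, T):.0f} m^3/s")
```

For $T = 2$ the deviate is zero, so the estimate equals the geometric mean of the series.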

Plotting Positions

To graphically plot a probability distribution from empirical data, the data points (e.g., annual peak floods) must be ranked in descending order ($m = 1$ is the largest event). An empirical exceedance probability ($P$) is then assigned to each rank using a plotting position formula.

Weibull Plotting Position

Assigns an empirical exceedance probability to ranked historical data.

$$P = \frac{m}{N + 1}$$

Variables

Symbol | Description | Unit
$P$ | Empirical exceedance probability | dimensionless
$m$ | Rank of the event in descending order | dimensionless
$N$ | Total number of years of record | years

The corresponding return period is calculated as $T = (N+1)/m$. Other formulas include Gringorten and Cunnane.
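
The ranking and Weibull assignment described above can be sketched as follows. The function name is hypothetical, and tied values are simply given consecutive ranks, which is a simplification.

```python
def weibull_plotting_positions(series):
    """Return (value, rank m, P = m/(N+1), T = (N+1)/m) for each event.

    Events are ranked in descending order, so m = 1 is the largest.
    """
    ranked = sorted(series, reverse=True)
    n = len(ranked)
    return [(x, m, m / (n + 1), (n + 1) / m)
            for m, x in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Hypothetical annual peak flows in m^3/s (illustrative values only).
    for value, m, p, t in weibull_plotting_positions([30.0, 10.0, 40.0, 20.0]):
        print(f"m = {m}  x = {value:5.1f}  P = {p:.2f}  T = {t:.2f} years")
```

With four years of record, the largest event gets $P = 1/5 = 0.2$ and $T = 5$ years, which shows that the largest observed return period under this formula is $N + 1$ years.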

Confidence Limits

Statistical estimates have inherent uncertainty because they are based on a finite sample of historical data. Confidence limits provide a range within which the true value is expected to lie with a specified probability (e.g., 95% confidence).

Standard Error

The standard error of estimate quantifies the uncertainty in the calculated magnitude $x_T$. The confidence interval is typically $x_T \pm z_c S_e$, where $z_c$ is the standard normal variate for the desired confidence level, and $S_e$ is the standard error.
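
A minimal sketch of the interval $x_T \pm z_c S_e$ is given below, assuming a two-sided interval so that 95% confidence corresponds to $z_c \approx 1.96$. The text does not give a formula for $S_e$; the helper `gumbel_standard_error` uses a commonly quoted textbook expression for the Gumbel distribution, $S_e = \sqrt{1 + 1.3K + 1.1K^2}\,\sigma/\sqrt{N}$, which is an assumption of this sketch rather than something stated in the section. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_limits(x_t: float, standard_error: float,
                      confidence: float = 0.95):
    """Two-sided limits x_T -/+ z_c * S_e for the given confidence level."""
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_c * standard_error
    return x_t - half_width, x_t + half_width


def gumbel_standard_error(k: float, sigma: float, n_years: int) -> float:
    """Assumed textbook form for Gumbel: sqrt(1 + 1.3K + 1.1K^2) * sigma / sqrt(N)."""
    b = math.sqrt(1.0 + 1.3 * k + 1.1 * k * k)
    return b * sigma / math.sqrt(n_years)


if __name__ == "__main__":
    # Hypothetical values: x_T = 1000 m^3/s, K = 2.0, sigma = 100 m^3/s, N = 25 years.
    s_e = gumbel_standard_error(2.0, 100.0, 25)
    low, high = confidence_limits(1000.0, s_e, 0.95)
    print(f"S_e = {s_e:.1f}, 95% limits = ({low:.0f}, {high:.0f})")
```

Because $S_e$ in this form is proportional to $1/\sqrt{N}$, quadrupling the record length halves the width of the interval.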

L-Moments in Hydrology

Traditional product moments (mean, variance, skewness) are highly sensitive to outliers in small datasets, which are common in flood records. L-moments are an advanced statistical tool used to estimate distribution parameters more robustly.

Advantages of L-Moments

L-moments are linear combinations of probability weighted moments (PWMs). Because they are linear, they do not square or cube the data values, making them far less susceptible to the influence of extreme outliers compared to traditional variance or skewness. They provide more reliable parameter estimates for distributions like the Generalized Extreme Value (GEV) distribution.
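
To make the "linear combinations of PWMs" idea concrete, the sketch below computes the first three sample L-moments from the unbiased probability weighted moment estimators $b_0$, $b_1$, $b_2$, using $\lambda_1 = b_0$, $\lambda_2 = 2b_1 - b_0$ and $\lambda_3 = 6b_2 - 6b_1 + b_0$. The function name is hypothetical, and fitting a distribution such as the GEV from these values is left out.

```python
def sample_l_moments(series):
    """Return (lambda_1, lambda_2, lambda_3) from unbiased sample PWMs."""
    x = sorted(series)  # ascending order is required by the estimators
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 values are needed for the third L-moment")
    # Each PWM is a weighted sum of the ordered data; no value is squared or cubed.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0                       # L-location (the mean)
    l2 = 2.0 * b1 - b0            # L-scale
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3


if __name__ == "__main__":
    l1, l2, l3 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
    print(l1, l2, l3)             # symmetric data: l3 is (numerically) zero
    print("L-skewness:", l3 / l2)
```

The ratio $\lambda_3 / \lambda_2$ is the L-skewness, the L-moment counterpart of the conventional skewness coefficient.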

Probable Maximum Flood (PMF)

The most severe flood considered physically possible in a particular drainage basin, based on comprehensive hydrometeorological analysis of maximum precipitation and hydrologic factors favorable for maximum runoff.

Unlike a 100-year or 500-year flood derived from statistical frequency analysis, the PMF is treated as a theoretical upper bound and is not assigned a return period. It is generated by routing the Probable Maximum Precipitation (PMP) through the basin's hydrologic model, assuming worst-case antecedent soil moisture conditions and peak snowmelt (if applicable).

Design Application

The PMF is typically used for designing the spillways of high-hazard dams, where structural failure would result in unacceptable loss of human life and catastrophic downstream damage. By designing for the PMF, engineers aim to prevent overtopping under the most severe conditions considered physically possible, reducing the risk of hydrologic failure to the lowest practical level.

Risk and Reliability

When designing hydraulic structures, engineers must assess the probability that a design event will be exceeded over the lifetime of the structure.

Risk Equation

Calculates the risk for a design event over the lifetime of a structure.

$$R = 1 - (1 - P)^n$$

Variables

Symbol | Description | Unit
$R$ | Risk, probability that the event will occur at least once in $n$ years | dimensionless
$P$ | Annual exceedance probability ($P = 1/T$) | dimensionless
$n$ | Design life of the structure | years

Reliability

Reliability is the probability that the structure will not fail (i.e., the design event will not be exceeded) during its design life. It is simply $1 - R$.
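
In design practice the risk equation is often used in reverse: given an acceptable risk $R$ over a design life of $n$ years, solve for the return period the structure must be designed for. Rearranging $R = 1 - (1 - 1/T)^n$ gives $T = 1 / \left[1 - (1 - R)^{1/n}\right]$. The sketch below implements that rearrangement; the function name and the example values are hypothetical.

```python
def required_return_period(acceptable_risk: float, life_years: int) -> float:
    """Solve R = 1 - (1 - 1/T)^n for T."""
    if not 0.0 < acceptable_risk < 1.0:
        raise ValueError("acceptable risk must be strictly between 0 and 1")
    if life_years < 1:
        raise ValueError("design life must be at least 1 year")
    annual_probability = 1.0 - (1.0 - acceptable_risk) ** (1.0 / life_years)
    return 1.0 / annual_probability


if __name__ == "__main__":
    # Hypothetical target: at most 10% risk over a 50-year design life.
    print(f"T = {required_return_period(0.10, 50):.0f} years")  # about 475 years
```

Accepting a 10% risk over 50 years therefore calls for a design event with a return period of roughly 475 years, far longer than the design life itself.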

Key Takeaways
  • Hydrologic events cannot be predicted with absolute certainty due to their inherent randomness.
  • Statistical Hydrology applies probability theory to historical data to estimate the likelihood and magnitude of future extreme events (floods, droughts).
  • Return Period ($T$) is the statistical average time interval between occurrences of an event equal to or exceeding a specific magnitude.
  • It is the mathematical inverse of the Annual Exceedance Probability ($P$): $T = 1/P$.
  • Risk ($R$) is the probability that an event will occur at least once during a project's design life ($n$).
  • Even a 100-year flood has a 1% chance of occurring in any given year, meaning it could theoretically happen in consecutive years.
  • Frequency Analysis fits historical data to theoretical probability distributions to extrapolate extreme events beyond the recorded timeframe.
  • The General Frequency Equation ($x_T = \bar{x} + K \cdot \sigma$) offsets the mean $\bar{x}$ by $K$ standard deviations, where the frequency factor $K$ depends on the distribution and the return period.
  • Gumbel's Extreme Value Type I is traditionally used for maximum annual flood series.
  • The Log-Pearson Type III distribution is the standard method for flood frequency analysis in the United States (USGS Bulletin 17B/17C).
  • Plotting Positions like the Weibull Formula ($P = m/(N+1)$) assign empirical probabilities to ranked historical data for graphical comparison against theoretical distributions.
  • Statistical estimates are uncertain because they rely on finite historical sample sizes.
  • Confidence Limits define a range within which the true magnitude of an event is expected to lie with a stated probability (e.g., 95%).
  • The width of the confidence interval depends on the Standard Error ($S_e$), which decreases as the length of the historical data record increases.
  • The Probable Maximum Flood (PMF) is the theoretical upper limit of flooding for a basin, derived deterministically from the PMP rather than statistically.
  • High-hazard dam spillways are designed to safely pass the PMF so that the risk of catastrophic overtopping is as low as practically achievable.