Systems of Linear Equations - Theory & Concepts

Systems of Linear Equations

Learning Objectives

Understand the matrix representation of linear systems and basic solutions like Cramer's rule.
Perform direct methods such as Gauss Elimination, Gauss-Jordan, and LU Decomposition.
Implement pivoting strategies to mitigate division by zero and round-off errors.
Evaluate the condition of a matrix using vector and matrix norms, and understand ill-conditioning.
Solve systems utilizing SVD and the Thomas Algorithm.
Apply iterative refinement and iterative methods (Jacobi, Gauss-Seidel, SOR) for large sparse matrices.

System of Linear Equations

A collection of one or more linear equations involving the same set of variables. In structural engineering, this often appears as the stiffness matrix equation $[K]\{U\} = \{F\}$ .

Solving systems of simultaneous linear algebraic equations is essential in many fields of engineering. A general system can be represented in matrix form as $[A]\{x\} = \{B\}$ . For example, in structural analysis, determining displacements at various nodes requires solving large systems where $[A]$ is the global stiffness matrix, $\{x\}$ is the vector of unknown nodal displacements, and $\{B\}$ is the vector of applied forces.

Matrix Fundamentals and Cramer's Rule

Before employing numerical methods, it is crucial to understand basic matrix operations (addition, multiplication) and properties (determinant, trace). A system $[A]\{x\} = \{B\}$ has a unique solution if and only if the determinant of the coefficient matrix $[A]$ is non-zero ( $|A| \neq 0$ ).

Cramer's Rule

An explicit formula for the solution of a system of linear equations with as many equations as unknowns, valid whenever the system has a unique solution.

Cramer's Rule

A direct analytical method that expresses the solution in terms of determinants. While conceptually simple, Cramer's rule is computationally impractical for systems larger than $3 \times 3$ due to the factorial growth in computing determinants ( $O(n!)$ operations).

Cramer's Rule

Calculates the unknown $x_k$ using determinants.

x_k = \frac{|A_k|}{|A|}

Variables

Symbol	Description	Unit
$x_k$	The $k$ -th unknown variable	-
$\|A_k\|$	Determinant of matrix $[A]$ with the $k$ -th column replaced by vector $\{B\}$	-
$\|A\|$	Determinant of the original coefficient matrix $[A]$	-

Gauss Elimination

Gauss elimination is a direct method that involves two phases: forward elimination and back substitution.

Gauss Elimination Steps

STEP-BY-STEP

Forward Elimination: Manipulate the equations to eliminate unknowns and create an upper triangular system.
Back Substitution: Solve for the unknowns starting from the last equation and working backward to the first.

Computational Complexity (FLOPs)

To compare algorithms, we count the floating-point operations (FLOPs), primarily multiplications and divisions.

Forward Elimination: Requires approximately $\frac{2n^3}{3}$ FLOPs.
Back Substitution: Requires approximately $n^2$ FLOPs.
Total: Thus, Gauss Elimination is an $O(n^3)$ operation, specifically dominated by the $n^3/3$ multiplications. This cubic scaling means doubling the size of a system multiplies the execution time by eight.

Gauss Elimination Pitfalls

Gauss elimination is susceptible to division by zero and round-off errors. Pivoting (swapping rows so that the largest available element is used as the pivot) is often necessary to avoid these issues.

Interactive Simulation

Explore the steps of Forward Elimination and Back Substitution using the simulation below.

Gauss Elimination Process

Step-by-step visualization of solving a 3x3 system of linear equations using naive Gauss elimination.

Initial Augmented Matrix [A|B]

3.0000	-0.1000	-0.2000	7.8500
0.1000	7.0000	-0.3000	-19.3000
0.3000	-0.2000	10.0000	71.4000

Step 1 of 7

iCurrent Action

Identify pivot element a_11 (3).

Solution Vector x

x_1

x_2

x_3

Pivoting Strategies

Pivoting

The process of rearranging rows (partial pivoting) or both rows and columns (complete pivoting) of a matrix to place the largest possible absolute value on the diagonal before an elimination step.

In Gauss elimination, division by zero or a very small number can lead to severe round-off errors. Pivoting is a technique to rearrange the rows or columns of the coefficient matrix to place a larger absolute value in the diagonal pivot position.

Types of Pivoting

Partial Pivoting: The most common approach. Before eliminating variables in column $k$ , it searches the remaining rows $i \ge k$ for the largest absolute element $|a_{ik}|$ and swaps row $k$ with that row. This ensures the multiplier is always $\le 1$ , minimizing error growth.
Complete Pivoting: Searches both the remaining rows and columns for the largest absolute element to use as the pivot. This involves swapping both rows and columns. While numerically more stable, the extensive search and bookkeeping make it computationally expensive and rarely used compared to partial pivoting.

Gauss-Jordan

A variation of Gauss elimination where, when an unknown is eliminated, it is eliminated from all other equations rather than just the subsequent ones. The result is an identity matrix, making back substitution unnecessary. However, the procedure requires roughly $50\%$ more operations ( $n^3$ multiplications) than standard Gauss Elimination, making it less efficient for solving standard systems.

Matrix Inversion

The inverse of a square matrix $[A]$ , denoted as $[A]^{-1}$ , is a matrix such that $[A][A]^{-1} = [I]$ , where $[I]$ is the identity matrix. If the inverse is known, the solution to the system is simply $\{x\} = [A]^{-1}\{B\}$ .

Gauss-Jordan for Inversion

Gauss-Jordan elimination can be used to determine the inverse of a matrix by augmenting the original matrix with the identity matrix and performing the elimination steps.

LU Decomposition

LU decomposition separates the time-consuming elimination of the matrix $[A]$ from the manipulations of the right-hand side $\{B\}$ . This is enormously advantageous when solving the same system for many different load vectors $\{B\}$ because the expensive $O(n^3)$ elimination step only happens once.

LU Decomposition Process

The matrix $[A]$ is decomposed into a lower triangular matrix $[L]$ and an upper triangular matrix $[U]$ .

The system $[A]\{x\} = \{B\}$ becomes $[L][U]\{x\} = \{B\}$ . We can then solve for $\{d\}$ in $[L]\{d\} = \{B\}$ using forward substitution, and then solve for $\{x\}$ in $[U]\{x\} = \{d\}$ using back substitution.

There are specific algorithms for computing the LU factors:

Doolittle's Method: Specifies that all diagonal elements of the lower triangular matrix $[L]$ must be $1$ (i.e., $l_{ii} = 1$ ).
Crout's Method: Specifies that all diagonal elements of the upper triangular matrix $[U]$ must be $1$ (i.e., $u_{ii} = 1$ ).

LU Decomposition Equation

Represents a matrix as the product of lower and upper triangular matrices.

[A] = [L][U]

Variables

Symbol	Description	Unit
$[A]$	Original coefficient matrix	-
$[L]$	Lower triangular matrix	-
$[U]$	Upper triangular matrix	-

Cholesky Decomposition

If the matrix $[A]$ is symmetric and positive definite, it can be decomposed into a lower triangular matrix and its transpose. This method is highly efficient, requiring roughly $\frac{n^3}{6}$ operations, about half the execution time and storage of standard LU decomposition.

Cholesky Decomposition Equation

Decomposes a symmetric positive-definite matrix into a lower triangular matrix and its transpose.

[A] = [L][L]^T

Variables

Symbol	Description	Unit
$[A]$	Symmetric, positive-definite coefficient matrix	-
$[L]$	Lower triangular matrix	-
$[L]^T$	Transpose of the lower triangular matrix	-

Singular Value Decomposition (SVD)

SVD is one of the most powerful matrix factorization techniques, particularly useful for non-square or ill-conditioned matrices where LU or Cholesky decomposition fails.

Singular Value Decomposition Concepts

Any $m \times n$ matrix $[A]$ can be factored into three matrices.

SVD can easily compute the pseudoinverse of a matrix to solve least-squares problems and explicitly determine the rank, null space, and condition number of $[A]$ (where $\text{Cond}(A) = \sigma_{\text{max}}/\sigma_{\text{min}}$ ).

SVD Formula

Factors an $m \times n$ matrix into singular vectors and singular values.

[A] = [U][\Sigma][V]^T

Variables

Symbol	Description	Unit
$[A]$	Original $m \times n$ matrix	-
$[U]$	$m \times m$ orthogonal matrix of left singular vectors	-
$[\Sigma]$	$m \times n$ diagonal matrix of singular values	-
$[V]^T$	Transpose of an $n \times n$ orthogonal matrix of right singular vectors	-

Thomas Algorithm for Tridiagonal Systems

Many engineering problems, particularly those involving 1D partial differential equations (like the heat equation) or cubic splines, result in tridiagonal systems of linear equations. A tridiagonal matrix has non-zero elements only on the main diagonal, the first diagonal below this, and the first diagonal above this.

Thomas Algorithm

The Thomas algorithm is a highly efficient, simplified form of Gaussian elimination specifically designed for tridiagonal systems. It operates in two passes:

Forward Elimination: Eliminates the lower diagonal elements.
Backward Substitution: Solves for the unknowns starting from the last equation.

Because it avoids operations on known zero elements, the computational cost is strictly proportional to the number of equations ( $O(n)$ ) rather than $O(n^3)$ for standard Gaussian elimination. This allows solving millions of equations in milliseconds.

Vector Norms

To quantify the "size" of a matrix, we first must define the "length" or "magnitude" of a vector in higher dimensions using a vector norm.

Vector Norm

A function that assigns a strictly positive length or size to each vector in a vector space, measuring its magnitude.

Common Vector Norms

A vector norm $||v||$ must satisfy basic scaling and triangle inequality rules. Common examples include:

Euclidean Norm ( $L_2$ norm): The standard physical length of a vector, defined as $\sqrt{\sum v_i^2}$ .
Sum Norm ( $L_1$ norm): The sum of the absolute values of the elements, $\sum |v_i|$ .
Max Norm ( $L_\infty$ norm): The single largest absolute value element in the vector, $\max |v_i|$ .

Matrix Norms

Building on vector norms, matrix norms quantify how much a matrix $[A]$ can stretch or amplify a vector. A matrix norm $||A||$ is a real number that provides a measure of the "size" or magnitude of the matrix.

Matrix Norm

A quantity representing the "size" or stretching power of a matrix, extending the concept of vector norms to matrices.

Common Matrix Norms

Row-Sum Norm ( $||A||_{\infty}$ ): The maximum of the sums of the absolute values of the elements in each row.
Column-Sum Norm ( $||A||_1$ ): The maximum of the sums of the absolute values of the elements in each column.
Spectral Norm ( $||A||_2$ ): The square root of the largest eigenvalue of $A^TA$ .

The formal Condition Number is defined as $\text{Cond}(A) = ||A|| \cdot ||A^{-1}||$ . If $\text{Cond}(A) \approx 1$ , the matrix is well-conditioned. If $\text{Cond}(A) \gg 1$ , it is ill-conditioned.

Ill-Conditioning

Condition Number

A measure of how sensitive the solution of a linear system is to small changes or errors in the input data. A large condition number indicates an ill-conditioned system.

A system is ill-conditioned if small changes in the coefficients (e.g., due to round-off errors) result in large changes in the solution. This implies the matrix is close to being singular (determinant near zero). The condition number quantifies this: a large condition number indicates an ill-conditioned system. The condition number is formally connected to matrix norms as $\text{Cond}(A) = ||A|| \cdot ||A^{-1}||$ , and in SVD terms, as the ratio of the largest to smallest singular value. Techniques like pivoting and using higher precision arithmetic are required to mitigate these effects.

Sparse Matrices

Sparse Matrix

A matrix in which most of the elements are zero. In contrast, if most of the elements are non-zero, then the matrix is considered dense.

A matrix is sparse if a vast majority of its elements are zero. Typical examples arise in the finite difference and finite element methods (like tridiagonal and banded matrices).

Sparse Matrix Techniques

Instead of storing $n^2$ elements, sparse matrix representations (like Compressed Sparse Row, CSR) store only the non-zero elements and their indices. This drastically reduces memory usage from $O(n^2)$ to $O(k)$ where $k$ is the number of non-zeros. Iterative methods (like Conjugate Gradient or Gauss-Seidel) are particularly well-suited for sparse systems because they only require matrix-vector multiplication, preserving the sparsity structure.

Iterative Refinement

Iterative refinement is a technique used to improve the accuracy of solutions obtained by direct methods (like LU decomposition) when dealing with ill-conditioned systems, essentially using double precision to polish a single-precision result.

Iterative Refinement Steps

STEP-BY-STEP

Calculate an initial approximate solution $\{ \tilde{x} \}$ using a direct method.
Compute the residual vector $\{R\} = \{B\} - [A]\{\tilde{x}\}$ using higher precision arithmetic.
Solve the correction system $[A]\{\Delta x\} = \{R\}$ using the already computed LU factors.
Update the solution: $\{x_{\text{new}}\} = \{\tilde{x}\} + \{\Delta x\}$ .

This process can be repeated until the correction $\{\Delta x\}$ becomes negligible.

Iterative Methods

Iterative Method

A mathematical procedure that uses an initial guess to generate a sequence of improving approximate solutions for a class of problems, in which the $n$ -th approximation is derived from the previous ones.

Unlike direct methods, iterative methods start with a guess and continually refine the estimate. They are particularly useful for large systems containing many zero elements (sparse matrices).

Jacobi and Gauss-Seidel Methods

Jacobi Method: Computes all new values $x_i^{k+1}$ using only the old values $x^k$ from the previous iteration.
Gauss-Seidel Method: Uses the most recently computed values $x_j^{k+1}$ as soon as they are available, generally converging faster than the Jacobi method.

Gauss-Seidel Iteration Formula

Computes the new value $x_i^{k+1}$ using the most recently computed values.

x_i^{k+1} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{k+1} - \sum_{j=i+1}^n a_{ij}x_j^k}{a_{ii}}

Variables

Symbol	Description	Unit
$x_i^{k+1}$	New estimate of the $i$ -th unknown at iteration $k+1$	-
$x_j^{k+1}$	Recently computed estimates in the current iteration	-
$x_j^k$	Old estimates from the previous iteration	-
$a_{ij}$	Coefficient of the system matrix	-
$b_i$	Constant term of the $i$ -th equation	-

Convergence Requirements

For both methods to converge reliably, the system must typically be diagonally dominant (the absolute value of the diagonal element is greater than the sum of the absolute values of the other elements in the row). Furthermore, the rigorous mathematical condition for the convergence of any linear stationary iterative method (like Jacobi or Gauss-Seidel) is that the spectral radius $\rho$ (the maximum absolute eigenvalue) of the iteration matrix must be strictly less than $1$ ( $\rho < 1$ ).

Successive Over-Relaxation (SOR)

To accelerate the convergence of the Gauss-Seidel method, relaxation techniques apply a weighting factor $\omega$ . If $1 < \omega < 2$ , the method is called over-relaxation and accelerates convergence.

Successive Over-Relaxation Equation

Applies a relaxation factor to accelerate iterative convergence.

x_i^{\text{new}} = \omega x_i^{\text{calculated}} + (1 - \omega)x_i^{\text{old}}

Variables

Symbol	Description	Unit
$x_i^{\text{new}}$	Accelerated estimate of the unknown	-
$x_i^{\text{calculated}}$	Estimate obtained from the standard iterative method	-
$x_i^{\text{old}}$	Previous estimate	-
$\omega$	Relaxation weighting factor ( $1 < \omega < 2$ for over-relaxation)	-

Key Takeaways

Cramer's Rule provides an exact analytical solution but is computationally inefficient for large systems ( $O(n!)$ operations).
Gauss elimination and LU decomposition are direct methods that require $O(n^3/3)$ operations. They yield exact solutions barring round-off errors.
Partial Pivoting swaps rows to place the largest element on the diagonal, which is essential in direct methods to prevent division by zero and minimize round-off errors.
LU decomposition is highly efficient for multiple right-hand sides since the elimination is only performed once. Cholesky decomposition optimizes this for symmetric positive-definite matrices ( $O(n^3/6)$ ).
Singular Value Decomposition (SVD) provides a robust factorization for ill-conditioned or non-square matrices and directly yields the pseudoinverse.
Thomas Algorithm solves tridiagonal systems efficiently in $O(n)$ time.
Sparse Matrices store only non-zero elements, dramatically reducing memory usage and making iterative solvers highly attractive.
Ill-conditioned systems are highly sensitive to round-off errors and have large condition numbers, often quantified using Vector and Matrix Norms (L1, L2, L-infinity).
Iterative Refinement can polish solutions from direct methods by using higher precision on the residual.
Jacobi and Gauss-Seidel are iterative methods effective for large, sparse, diagonally dominant systems. Convergence requires the spectral radius of the iteration matrix to be less than 1.
Successive Over-Relaxation (SOR) accelerates the convergence of iterative methods.

Previous TopicRoots of Equations - Examples & Applications

Quiz Me

Next TopicSystems of Linear Equations - Examples & Applications

Prev Next

Quiz Me