Linear Algebra Fundamentals

Essential linear algebra concepts for machine learning with interactive visualizations

What is Linear Algebra?

Linear algebra is the branch of mathematics concerning linear equations, linear functions, and their representations through matrices and vector spaces. It's fundamental to machine learning because:

  • Data representation: Features are vectors, datasets are matrices
  • Transformations: Neural network layers apply linear (affine) transformations
  • Optimization: Gradient descent operates in vector spaces
  • Dimensionality reduction: PCA, SVD rely on linear algebra

Interactive Visualization

[Interactive demo: adjust two 2D vectors and watch their dot product, angle, and magnitudes update — e.g. v₁·v₂ = 9.00, θ = 37.9°, |v₁| = 3.61, |v₂| = 3.16.]

Core Concepts

1. Scalars

A scalar is a single number. In ML contexts:

  • Learning rate (α = 0.01)
  • Regularization parameter (λ = 0.1)
  • Individual predictions

2. Vectors

A vector is an ordered array of numbers:

# Column vector (most common in ML)
x = [x₁]
    [x₂]
    [x₃]

# Row vector
x = [x₁, x₂, x₃]

Properties:

  • Dimension: Number of elements
  • Magnitude: ||x|| = √(x₁² + x₂² + ... + xₙ²)
  • Direction: Orientation in space

Operations:

  • Addition: Element-wise addition
  • Scalar multiplication: Multiply each element
  • Dot product: x·y = Σ(xᵢ × yᵢ)
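
These operations map directly to NumPy; a minimal sketch, with vectors chosen to reproduce the values shown in the visualization above:

import numpy as np

v1 = np.array([2.0, 3.0])
v2 = np.array([3.0, 1.0])

v_sum = v1 + v2            # element-wise addition -> [5. 4.]
v_scaled = 2 * v1          # scalar multiplication -> [4. 6.]
dot = np.dot(v1, v2)       # 2·3 + 3·1 = 9.0
mag = np.linalg.norm(v1)   # √(2² + 3²) ≈ 3.61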

3. Matrices

A matrix is a 2D array of numbers:

A = [a₁₁ a₁₂ a₁₃]
    [a₂₁ a₂₂ a₂₃]
    [a₃₁ a₃₂ a₃₃]

Properties:

  • Shape: (rows, columns)
  • Rank: Number of linearly independent rows/columns
  • Determinant: Scalar that describes transformation scaling

Operations:

  • Addition: Element-wise (same shape required)
  • Multiplication: Non-commutative (in general, AB ≠ BA)
  • Transpose: Flip rows and columns
  • Inverse: A⁻¹ such that AA⁻¹ = I
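
In NumPy, each of these is a one-liner; a quick sketch on small example matrices:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

A + B                     # element-wise addition (same shape required)
A @ B                     # matrix product; in general A @ B != B @ A
A.T                       # transpose
np.linalg.det(A)          # determinant -> -2.0
np.linalg.inv(A)          # inverse (exists only when det != 0)
np.linalg.matrix_rank(A)  # rank -> 2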

4. Tensors

Generalization to n-dimensional arrays:

  • Scalar: 0D tensor
  • Vector: 1D tensor
  • Matrix: 2D tensor
  • 3D+ tensors: Used in deep learning (e.g., an image batch of shape batch × height × width × channels is a 4D tensor)
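
In NumPy terms, a tensor's order is its number of dimensions (ndim); the shapes below are arbitrary:

import numpy as np

scalar = np.array(3.0)             # 0D tensor, shape ()
vector = np.array([1.0, 2.0])      # 1D tensor, shape (2,)
matrix = np.zeros((3, 3))          # 2D tensor, shape (3, 3)
batch = np.zeros((32, 28, 28, 3))  # 4D tensor: batch × height × width × channels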

Key Operations for ML

Matrix Multiplication

Essential for neural network forward pass:

# Weight matrix × input vector
y = Wx + b

# Where:
# W: weight matrix (m × n)
# x: input vector (n × 1)
# b: bias vector (m × 1)
# y: output vector (m × 1)
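
A shape-checked NumPy sketch of this affine map (the sizes m and n are arbitrary):

import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))   # weight matrix (m × n)
x = rng.normal(size=(n, 1))   # input vector (n × 1)
b = rng.normal(size=(m, 1))   # bias vector (m × 1)
y = W @ x + b                 # output vector (m × 1)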

Dot Product

Measures similarity between vectors:

similarity = x · y = ||x|| ||y|| cos(θ)

# Applications:
# - Cosine similarity
# - Attention mechanisms
# - Feature matching
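
Rearranging this identity gives cosine similarity; a small sketch reusing the vectors from the visualization:

import numpy as np

def cosine_similarity(x, y):
    # x·y / (||x|| ||y||) = cos(θ)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

v1, v2 = np.array([2.0, 3.0]), np.array([3.0, 1.0])
cos_theta = cosine_similarity(v1, v2)     # ≈ 0.789
theta = np.degrees(np.arccos(cos_theta))  # ≈ 37.9°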

Eigendecomposition

For a symmetric matrix A:

A = QΛQ^T

Where:

  • Q: Orthogonal matrix whose columns are eigenvectors
  • Λ: Diagonal matrix of eigenvalues

Applications:

  • PCA (Principal Component Analysis)
  • Spectral clustering
  • Network analysis
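
A quick NumPy check of the identity on a small symmetric matrix (np.linalg.eigh is the variant of eig for symmetric matrices; the example matrix is arbitrary):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric
eigenvalues, Q = np.linalg.eigh(A)    # columns of Q are eigenvectors
Lam = np.diag(eigenvalues)
print(np.allclose(A, Q @ Lam @ Q.T))  # True: A = QΛQ^T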

Linear Transformations

Common Transformations

  1. Scaling: Stretch/shrink along axes
  2. Rotation: Rotate around origin
  3. Reflection: Mirror across line
  4. Shearing: Slant parallel to axis
  5. Projection: Map to lower dimension

Transformation Matrix Examples

# Scaling by factor of 2
S = [[2, 0],
     [0, 2]]

# Rotation by θ
R = [[cos(θ), -sin(θ)],
     [sin(θ),  cos(θ)]]

# Reflection across x-axis
F = [[1, 0],
     [0, -1]]
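
Applying these matrices to a test vector makes the geometry concrete; a small NumPy sketch:

import numpy as np

theta = np.pi / 2                  # 90° rotation
S = np.array([[2, 0], [0, 2]])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
F = np.array([[1, 0], [0, -1]])

v = np.array([1.0, 2.0])
print(S @ v)   # [2. 4.]  (scaled)
print(R @ v)   # [-2. 1.] (rotated 90°, up to floating-point error)
print(F @ v)   # [ 1. -2.] (reflected across the x-axis)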

Vector Spaces

Basis Vectors

A set of linearly independent vectors that span the space:

# Standard basis in R²
e₁ = [1, 0]
e₂ = [0, 1]

# Any vector can be expressed as:
v = a·e₁ + b·e₂
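
For a non-standard basis, the coefficients come from solving a linear system; a sketch with an arbitrarily chosen basis for R²:

import numpy as np

# Columns of B form a basis for R² (arbitrary choice)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])

coords = np.linalg.solve(B, v)  # [1. 2.] -> v = 1·b₁ + 2·b₂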

Subspaces

Important subspaces in ML:

  • Column space: Range of possible outputs
  • Null space: Inputs that map to zero
  • Row space: Space of possible weights

Norms and Distances

Common Norms

# L1 norm (Manhattan distance)
||x||₁ = Σ|xᵢ|

# L2 norm (Euclidean distance)
||x||₂ = √(Σxᵢ²)

# L∞ norm (Maximum norm)
||x||∞ = max|xᵢ|
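
np.linalg.norm computes all three via its ord parameter:

import numpy as np

x = np.array([3.0, -4.0])
np.linalg.norm(x, ord=1)       # L1: |3| + |-4| = 7.0
np.linalg.norm(x)              # L2 (default): √(9 + 16) = 5.0
np.linalg.norm(x, ord=np.inf)  # L∞: max(|3|, |-4|) = 4.0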

Applications in ML

  • L1 regularization: Promotes sparsity (Lasso)
  • L2 regularization: Prevents large weights (Ridge)
  • Distance metrics: k-NN, clustering

Matrix Decompositions

Singular Value Decomposition (SVD)

A = UΣV^T

Where U and V are orthogonal and Σ is a diagonal matrix of non-negative singular values.

Applications:

  • Dimensionality reduction
  • Recommender systems
  • Image compression
  • Natural language processing
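
Truncating the SVD gives the best rank-k approximation of a matrix, which is the mechanism behind these applications; a sketch on random data:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # keep the k largest singular values
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # rank-k approximation, same shape as A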

LU Decomposition

A = LU

Where L is lower triangular, U is upper triangular.

Applications:

  • Solving linear systems
  • Computing determinants
  • Matrix inversion
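
In practice, libraries compute a pivoted factorization A = PLU for numerical stability; a sketch with SciPy:

import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
P, L, U = lu(A)                   # A = P @ L @ U (P permutes rows for stability)
print(np.allclose(A, P @ L @ U))  # True

b = np.array([1.0, 2.0])
x = lu_solve(lu_factor(A), b)     # solve Ax = b via the factorization
print(np.allclose(A @ x, b))      # True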

QR Decomposition

A = QR

Where Q is orthogonal, R is upper triangular.

Applications:

  • Least squares problems
  • Eigenvalue algorithms
  • Gram-Schmidt process
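
QR reduces least squares to a triangular solve; a sketch on a random overdetermined system:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))     # 10 equations, 3 unknowns
b = rng.normal(size=10)

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)  # minimizes ||Ax - b||₂
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True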

Applications in Machine Learning

1. Neural Networks

# Forward propagation
z¹ = W¹x + b¹
a¹ = σ(z¹)
z² = W²a¹ + b²
y = σ(z²)
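
A runnable NumPy version of the same two-layer pass (the layer sizes and sigmoid activation are illustrative choices):

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # input
W1, b1 = rng.normal(size=(5, 4)), np.zeros((5, 1))
W2, b2 = rng.normal(size=(2, 5)), np.zeros((2, 1))

a1 = sigma(W1 @ x + b1)              # hidden layer
y = sigma(W2 @ a1 + b2)              # output layer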

2. Principal Component Analysis (PCA)

  1. Center the data: X ← X - μ
  2. Compute the covariance of the centered data: C = (1/n)X^T X
  3. Find the eigenvectors of C; stack the top k as columns of W_k
  4. Project: X_reduced = XW_k
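
The four steps in NumPy, on random data with k = 2 for illustration (np.linalg.eigh returns eigenvalues in ascending order, so the top components are the last columns):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))         # 100 samples, 5 features
k = 2

Xc = X - X.mean(axis=0)               # 1. center
C = (Xc.T @ Xc) / len(Xc)             # 2. covariance
eigvals, eigvecs = np.linalg.eigh(C)  # 3. eigenvectors of C
W_k = eigvecs[:, -k:]                 #    top-k principal components
X_reduced = Xc @ W_k                  # 4. project -> shape (100, 2)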

3. Gradient Descent

# Parameter update
θ = θ - α∇J(θ)

# Where:
# ∇J(θ) is the gradient (vector of partial derivatives)
# α is the learning rate (scalar)
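
A sketch of this update applied to least-squares linear regression (the data, learning rate, and iteration count are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

theta = np.zeros(3)
alpha = 0.1                                # learning rate
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)  # ∇J(θ) for J(θ) = ||Xθ - y||²/(2n)
    theta = theta - alpha * grad           # parameter update
print(theta)                               # ≈ [1.0, -2.0, 0.5]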

4. Attention Mechanisms

# Scaled dot-product attention
Attention(Q, K, V) = softmax(QK^T / √d_k)V
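
A minimal single-head NumPy sketch of this formula (the dimensions are arbitrary; softmax is applied row-wise over the keys):

import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # QK^T / √d_k
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))  # 6 keys
V = rng.normal(size=(6, 8))  # 6 values
out = attention(Q, K, V)     # shape (4, 8)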

Computational Considerations

Time Complexity

  • Vector addition: O(n)
  • Dot product: O(n)
  • Matrix multiplication: O(n³) naive, O(n^2.807) Strassen
  • Matrix inversion: O(n³)
  • SVD: O(min(m²n, mn²))

Numerical Stability

  • Condition number: Measure of sensitivity to input changes
  • Ill-conditioned matrices: Small changes cause large effects
  • Regularization: Add small values to diagonal for stability
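
np.linalg.cond makes the first two points measurable, and the third is a one-line fix; the example matrix is contrived to be nearly singular:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])  # nearly singular
print(np.linalg.cond(A))       # ≈ 4e4 -> ill-conditioned

A_reg = A + 1e-3 * np.eye(2)   # add a small value to the diagonal
print(np.linalg.cond(A_reg))   # ≈ 2e3 -> much better conditioned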

Python Implementation

import numpy as np

# Vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

# Dot product
dot = np.dot(v1, v2)  # 32

# Matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = A @ B  # or np.matmul(A, B)

# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(A)

# SVD
U, S, Vt = np.linalg.svd(A)

# Solve linear system Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)

Common Pitfalls

  1. Broadcasting errors: Shape mismatches in operations
  2. Singular matrices: No inverse exists
  3. Numerical precision: Floating-point errors accumulate
  4. Memory issues: Large matrices exhaust RAM
  5. Non-conformable dimensions: Invalid multiplication

Summary

Linear algebra provides the mathematical foundation for:

  • Data representation and manipulation
  • Model operations (forward/backward pass)
  • Optimization algorithms
  • Dimensionality reduction techniques
  • Understanding model behavior

Master these concepts to build intuition about how machine learning algorithms work at a fundamental level.

If you found this explanation helpful, consider sharing it with others.
