Vectors & Matrices

6 min

Understand vectors and matrices - the fundamental data structures in machine learning.


Vectors & Matrices in ML

Vectors and matrices are the building blocks of machine learning. Every data point, weight, and activation in your model is represented using these structures.

[Interactive demo: adding two vectors, e.g. [2, 3] + [1, 2] = [3, 5]]

Vectors

A vector is an ordered list of numbers. In ML, vectors represent:

  • Features: One data point (e.g., pixel values, word embeddings)
  • Weights: Parameters in a linear layer
  • Gradients: Derivatives for optimization

Vector Operations

import numpy as np

# Creating vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Addition
z = x + y  # [5, 7, 9]

# Scalar multiplication
w = 2 * x  # [2, 4, 6]

# Dot product
dot = np.dot(x, y)  # 32

# Norm (magnitude)
norm = np.linalg.norm(x)  # 3.74

Matrices

A matrix is a 2D array of numbers. In ML, matrices represent:

  • Datasets: Rows are samples, columns are features
  • Weight matrices: Transform inputs in neural networks
  • Attention scores: Relationships between tokens

Matrix Operations

# Creating matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = A @ B  # [[19, 22], [43, 50]]

# Element-wise multiplication
D = A * B  # [[5, 12], [21, 32]]

# Transpose
A_T = A.T  # [[1, 3], [2, 4]]

# Inverse (if it exists)
A_inv = np.linalg.inv(A)

Tensors

Tensors are higher-dimensional generalizations of vectors and matrices:

  • 3D Tensor: Batch of matrices (e.g., batch of images)
  • 4D Tensor: Batch of volumes (e.g., batch of videos)
# Common tensor shapes in ML
batch_size = 32
seq_length = 100
hidden_dim = 768

# NLP: [batch, sequence, features]
text_tensor = np.zeros((batch_size, seq_length, hidden_dim))

# Vision: [batch, channels, height, width]
image_tensor = np.zeros((batch_size, 3, 224, 224))

Broadcasting

NumPy and PyTorch automatically expand dimensions so that arrays with compatible shapes can be combined:

# Matrix (2, 3) + Vector (3,) → Matrix (2, 3)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])

result = matrix + vector
# [[11, 22, 33],
#  [14, 25, 36]]

Matrix Shapes in Neural Networks

Linear Layer

# Input:  (batch_size, input_dim)
# Weight: (input_dim, output_dim)
# Bias:   (output_dim,)
# Output: (batch_size, output_dim)
def linear_forward(X, W, b):
    return X @ W + b

Attention Mechanism

# Query, Key, Value: (batch, seq_len, dim)
# Scores: (batch, seq_len, seq_len)
def attention(Q, K, V):
    dim = Q.shape[-1]
    # Transpose only the last two axes so each batch item is handled separately
    scores = Q @ np.swapaxes(K, -2, -1) / np.sqrt(dim)
    # Softmax over the last axis, stabilized by subtracting the row max
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

Why This Matters

  1. Efficient Computation: Matrix operations are highly optimized on GPUs
  2. Vectorization: Process entire batches at once instead of looping over samples in Python (see the sketch after this list)
  3. Linear Transformations: Foundation of neural network layers
  4. Geometric Intuition: Vectors as points/directions in space
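As a minimal sketch of the vectorization point, assuming NumPy and made-up dimensions, the loop below and the single matrix product compute the same batch of linear-layer outputs; the vectorized form hands the whole batch to optimized routines in one call.

import numpy as np

# Hypothetical batch: 32 samples, 10 input features, 5 outputs
X = np.random.rand(32, 10)   # (batch_size, input_dim)
W = np.random.rand(10, 5)    # (input_dim, output_dim)
b = np.random.rand(5)        # (output_dim,)

# Loop version: one sample at a time in Python
out_loop = np.stack([x @ W + b for x in X])

# Vectorized version: the whole batch in one matrix multiplication
out_vec = X @ W + b

print(np.allclose(out_loop, out_vec))  # True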

Common Pitfalls

  • Shape Mismatch: Always check dimensions before operations (see the sketch after this list)
  • Row vs Column: Be consistent with vector orientation
  • Memory Layout: Row-major (C) vs column-major (Fortran)
  • Numerical Stability: Watch for overflow/underflow
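As a minimal sketch of the shape-mismatch and numerical-stability pitfalls, assuming NumPy and small made-up arrays, the snippet below checks inner dimensions before a matrix multiply and stabilizes an exponential by subtracting the maximum first.

import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(3, 2)

# Shape mismatch: verify inner dimensions before multiplying
assert A.shape[1] == B.shape[0], f"cannot multiply {A.shape} by {B.shape}"
C = A @ B  # (4, 2)

# Numerical stability: np.exp overflows for large scores
scores = np.array([1000.0, 1001.0, 1002.0])
# np.exp(scores) would give [inf, inf, inf]

# Subtracting the max leaves the softmax result unchanged but avoids overflow
stable = np.exp(scores - scores.max())
print(stable / stable.sum())  # [0.090 0.245 0.665]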

Next Steps

If you found this explanation helpful, consider sharing it with others.
