Vectors & Matrices
Understand vectors and matrices - the fundamental data structures in machine learning.
Vectors & Matrices in ML
Vectors and matrices are the building blocks of machine learning. Every data point, weight, and activation in your model is represented using these structures.
[Interactive demo: adding Vector A and Vector B to produce a result vector]
Vectors
A vector is an ordered list of numbers. In ML, vectors represent:
- Features: One data point (e.g., pixel values, word embeddings); a small example follows this list
- Weights: Parameters in a linear layer
- Gradients: Derivatives for optimization
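For instance, a single data point is just a 1D array of feature values. A minimal sketch, with made-up numbers chosen purely for illustration:

import numpy as np

# A single house described as a feature vector (illustrative values):
# [square meters, bedrooms, age in years]
features = np.array([120.0, 3.0, 15.0])

# A weight vector of the same length scores the data point with one
# dot product (equivalent to a one-output linear layer)
weights = np.array([0.5, 10.0, -0.2])
score = np.dot(features, weights)   # 0.5*120 + 10*3 - 0.2*15 = 87.0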
Vector Operations
import numpy as np

# Creating vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Addition
z = x + y                 # [5, 7, 9]

# Scalar multiplication
w = 2 * x                 # [2, 4, 6]

# Dot product
dot = np.dot(x, y)        # 32

# Norm (magnitude)
norm = np.linalg.norm(x)  # 3.74
Matrices
A matrix is a 2D array of numbers. In ML, matrices represent:
- Datasets: Rows are samples, columns are features (illustrated in the example after this list)
- Weight matrices: Transform inputs in neural networks
- Attention scores: Relationships between tokens
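A tiny, made-up dataset makes the rows-are-samples convention concrete (the values below are illustrative only):

import numpy as np

# 3 samples (rows) x 2 features (columns)
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.8]])

print(X.shape)    # (3, 2)
print(X[0])       # first sample:                      [5.1 3.5]
print(X[:, 1])    # second feature across all samples: [3.5 3.  2.8]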
Matrix Operations
# Creating matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = A @ B    # [[19, 22], [43, 50]]

# Element-wise multiplication
D = A * B    # [[5, 12], [21, 32]]

# Transpose
A_T = A.T    # [[1, 3], [2, 4]]

# Inverse (if it exists)
A_inv = np.linalg.inv(A)
Tensors
Tensors generalize vectors (1D) and matrices (2D) to higher dimensions:
- 3D Tensor: Batch of matrices (e.g., a batch of token-embedding sequences)
- 4D Tensor: Batch of 3D tensors (e.g., a batch of RGB images as [batch, channels, height, width])
# Common tensor shapes in ML
batch_size = 32
seq_length = 100
hidden_dim = 768

# NLP: [batch, sequence, features]
text_tensor = np.zeros((batch_size, seq_length, hidden_dim))

# Vision: [batch, channels, height, width]
image_tensor = np.zeros((batch_size, 3, 224, 224))
Broadcasting
NumPy and PyTorch automatically expand arrays with compatible shapes so they can be combined without explicit loops or copies:
# Matrix (2, 3) + Vector (3,) → Matrix (2, 3)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])

result = matrix + vector
# [[11, 22, 33],
#  [14, 25, 36]]
Matrix Shapes in Neural Networks
Linear Layer
# Input:  (batch_size, input_dim)
# Weight: (input_dim, output_dim)
# Bias:   (output_dim,)
# Output: (batch_size, output_dim)
def linear_forward(X, W, b):
    return X @ W + b
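As a quick shape check using the linear_forward function above (the dimensions here are arbitrary, chosen only to illustrate the shape flow):

import numpy as np

X = np.random.randn(32, 128)   # batch of 32 inputs with 128 features each
W = np.random.randn(128, 64)   # projects 128 features down to 64
b = np.zeros(64)               # broadcasts across the batch dimension

out = linear_forward(X, W, b)
print(out.shape)               # (32, 64)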
Attention Mechanism
# Query, Key, Value: (batch, seq_len, dim)
# Scores: (batch, seq_len, seq_len)
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    dim = Q.shape[-1]
    # Swap only the last two axes of K so the batched matmul works
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ V
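A quick shape check of the attention function above with random tensors (sizes are arbitrary):

import numpy as np

Q = np.random.randn(2, 10, 64)   # (batch=2, seq_len=10, dim=64)
K = np.random.randn(2, 10, 64)
V = np.random.randn(2, 10, 64)

out = attention(Q, K, V)
print(out.shape)                 # (2, 10, 64): one mixed value vector per token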
Why This Matters
- Efficient Computation: Matrix operations are highly optimized on GPUs
- Vectorization: Process entire batches at once (see the sketch after this list)
- Linear Transformations: Foundation of neural network layers
- Geometric Intuition: Vectors as points/directions in space
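To make the vectorization point concrete, here is a rough sketch comparing a per-sample Python loop against a single batched matrix multiply; exact timings will vary by machine, but the batched version is typically much faster:

import time
import numpy as np

X = np.random.randn(1000, 512)
W = np.random.randn(512, 512)

# One sample at a time
start = time.perf_counter()
out_loop = np.stack([x @ W for x in X])
loop_time = time.perf_counter() - start

# One batched matrix multiplication
start = time.perf_counter()
out_batch = X @ W
batch_time = time.perf_counter() - start

print(np.allclose(out_loop, out_batch))                     # True: same result
print(f"loop: {loop_time:.4f}s, batched: {batch_time:.4f}s")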
Common Pitfalls
- Shape Mismatch: Always check dimensions before operations (see the sketch after this list)
- Row vs Column: Be consistent with vector orientation
- Memory Layout: Row-major (C) vs column-major (Fortran)
- Numerical Stability: Watch for overflow/underflow
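A minimal sketch of a defensive shape check; the helper name checked_matmul is made up for illustration:

import numpy as np

def checked_matmul(A, B):
    # Fail early with a readable message instead of a deep stack trace later
    assert A.shape[-1] == B.shape[0], f"Shape mismatch: {A.shape} @ {B.shape}"
    return A @ B

A = np.random.randn(4, 3)
B = np.random.randn(3, 5)
print(checked_matmul(A, B).shape)  # (4, 5)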
Next Steps
- Explore Matrix Operations
- Learn about Eigenvalues & Eigenvectors
- Understand Gradients & Derivatives