Vectors & Matrices

6 min

Understand vectors and matrices - the fundamental data structures in machine learning.


Vectors & Matrices in ML

Vectors and matrices are the building blocks of machine learning. Every data point, weight, and activation in your model is represented using these structures.

[Interactive demo: adding two vectors, e.g. [2, 3] + [1, 2] = [3, 5]]

Vectors

A vector is an ordered list of numbers. In ML, vectors represent:

  • Features: One data point (e.g., pixel values, word embeddings)
  • Weights: Parameters in a linear layer
  • Gradients: Derivatives for optimization

Vector Operations

import numpy as np

# Creating vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Addition
z = x + y  # [5, 7, 9]

# Scalar multiplication
w = 2 * x  # [2, 4, 6]

# Dot product
dot = np.dot(x, y)  # 32

# Norm (magnitude)
norm = np.linalg.norm(x)  # 3.74

Matrices

A matrix is a 2D array of numbers. In ML, matrices represent:

  • Datasets: Rows are samples, columns are features
  • Weight matrices: Transform inputs in neural networks
  • Attention scores: Relationships between tokens

Matrix Operations

# Creating matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = A @ B  # [[19, 22], [43, 50]]

# Element-wise multiplication
D = A * B  # [[5, 12], [21, 32]]

# Transpose
A_T = A.T  # [[1, 3], [2, 4]]

# Inverse (if it exists)
A_inv = np.linalg.inv(A)

Tensors

Tensors are higher-dimensional generalizations of vectors and matrices:

  • 3D Tensor: Batch of matrices (e.g., batch of images)
  • 4D Tensor: Batch of volumes (e.g., batch of videos)
# Common tensor shapes in ML
batch_size = 32
seq_length = 100
hidden_dim = 768

# NLP: [batch, sequence, features]
text_tensor = np.zeros((batch_size, seq_length, hidden_dim))

# Vision: [batch, channels, height, width]
image_tensor = np.zeros((batch_size, 3, 224, 224))

Broadcasting

NumPy and PyTorch automatically expand dimensions so that arrays with compatible shapes can be combined:

# Matrix (2, 3) + Vector (3,) → Matrix (2, 3)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])

result = matrix + vector
# [[11, 22, 33],
#  [14, 25, 36]]

Matrix Shapes in Neural Networks

Linear Layer

# Input:  (batch_size, input_dim)
# Weight: (input_dim, output_dim)
# Bias:   (output_dim,)
# Output: (batch_size, output_dim)
def linear_forward(X, W, b):
    return X @ W + b

Attention Mechanism

# Query, Key, Value: (batch, seq_len, dim)
# Scores: (batch, seq_len, seq_len)
def attention(Q, K, V):
    dim = Q.shape[-1]
    # Transpose only the last two axes so each batch item is handled separately
    scores = Q @ np.swapaxes(K, -2, -1) / np.sqrt(dim)
    # Softmax over the last axis, stabilized by subtracting the row max
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

Why This Matters

  1. Efficient Computation: Matrix operations are highly optimized on GPUs
  2. Vectorization: Process entire batches at once instead of looping over samples in Python (see the sketch after this list)
  3. Linear Transformations: Foundation of neural network layers
  4. Geometric Intuition: Vectors as points/directions in space
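As a minimal sketch of the vectorization point, assuming NumPy and made-up dimensions, the loop below and the single matrix product compute the same batch of linear-layer outputs; the vectorized form hands the whole batch to optimized routines in one call.

import numpy as np

# Hypothetical batch: 32 samples, 10 input features, 5 outputs
X = np.random.rand(32, 10)   # (batch_size, input_dim)
W = np.random.rand(10, 5)    # (input_dim, output_dim)
b = np.random.rand(5)        # (output_dim,)

# Loop version: one sample at a time in Python
out_loop = np.stack([x @ W + b for x in X])

# Vectorized version: the whole batch in one matrix multiplication
out_vec = X @ W + b

print(np.allclose(out_loop, out_vec))  # True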

Common Pitfalls

  • Shape Mismatch: Always check dimensions before operations (see the sketch after this list)
  • Row vs Column: Be consistent with vector orientation
  • Memory Layout: Row-major (C) vs column-major (Fortran)
  • Numerical Stability: Watch for overflow/underflow
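As a minimal sketch of the shape-mismatch and numerical-stability pitfalls, assuming NumPy and small made-up arrays, the snippet below checks inner dimensions before a matrix multiply and stabilizes an exponential by subtracting the maximum first.

import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(3, 2)

# Shape mismatch: verify inner dimensions before multiplying
assert A.shape[1] == B.shape[0], f"cannot multiply {A.shape} by {B.shape}"
C = A @ B  # (4, 2)

# Numerical stability: np.exp overflows for large scores
scores = np.array([1000.0, 1001.0, 1002.0])
# np.exp(scores) would give [inf, inf, inf]

# Subtracting the max leaves the softmax result unchanged but avoids overflow
stable = np.exp(scores - scores.max())
print(stable / stable.sum())  # [0.090 0.245 0.665]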

Next Steps

If you found this explanation helpful, consider sharing it with others.
