Graph Convolutional Networks (GCN)
Neural networks for graph-structured data using spectral convolutions
Architecture: Each GCN layer aggregates features from neighboring nodes and transforms them through learned weights. The graph structure guides information flow.
What are Graph Convolutional Networks?
Graph Convolutional Networks (GCNs) are neural networks designed to work directly on graph-structured data. They generalize the concept of convolution from regular grids (like images) to irregular graph structures, enabling deep learning on social networks, molecular structures, knowledge graphs, and more.
Core Concepts
1. Graph Structure
A graph G = (V, E) consists of:
- Vertices (V): Nodes representing entities
- Edges (E): Connections between nodes
- Features (X): Node attributes/features
- Adjacency Matrix (A): Encodes graph structure
```python
import torch

# Graph representation
G = {
    'nodes': [0, 1, 2, 3, 4],
    'edges': [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)],
    'features': torch.randn(5, 16),        # 5 nodes, 16 features each
    'adjacency': sparse_adjacency_matrix,  # placeholder for a (sparse) adjacency matrix
}
```
2. Spectral Convolution
GCNs perform convolution in the spectral domain using the graph Laplacian:
Graph Laplacian: L = D - A
Normalized Laplacian: L_norm = I - D^(-1/2) A D^(-1/2)

Where:
- D: Degree matrix (diagonal)
- A: Adjacency matrix
- I: Identity matrix
3. Layer-wise Propagation Rule
The key GCN equation:
H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l))

Where:
- Ã = A + I (adjacency with self-loops)
- D̃: Degree matrix of Ã
- H^(l): Node features at layer l
- W^(l): Trainable weight matrix
- σ: Activation function
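For concreteness, here is a minimal single-layer implementation of this rule for a small, dense adjacency matrix. The class name GCNLayer and its interface are illustrative rather than a specific library API, and the activation σ is left to the caller:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H' = D̃^(-1/2) Ã D̃^(-1/2) H W (activation applied by the caller)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # Ã = A + I: add self-loops so each node keeps its own features
        a_tilde = adj + torch.eye(adj.size(0), device=adj.device)
        # D̃^(-1/2): self-loops guarantee every degree is positive
        d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
        # Symmetrically normalized adjacency
        a_hat = d_inv_sqrt.unsqueeze(1) * a_tilde * d_inv_sqrt.unsqueeze(0)
        # Aggregate neighbor features, then apply the learned transform
        return a_hat @ self.linear(x)
```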
Message Passing Framework
Neighborhood Aggregation
GCNs follow a message-passing paradigm:
- Message: Information from neighbor nodes
- Aggregate: Combine neighbor messages
- Update: Transform aggregated information
```python
import torch.nn.functional as F

def gcn_layer(features, adj_matrix, weight):
    # Message: collect features from neighboring nodes
    messages = adj_matrix @ features
    # Aggregate: normalized sum (mean over neighbors);
    # clamp avoids division by zero for isolated nodes
    degree = adj_matrix.sum(dim=1, keepdim=True).clamp(min=1)
    aggregated = messages / degree
    # Update: linear transformation + activation
    return F.relu(aggregated @ weight)
```
Aggregation Functions
Different aggregation strategies:
```
# Degree-normalized (mean-style) aggregation, as in GCN
h_i = mean([h_j for j in neighbors(i)])

# Sum aggregation, as in GIN
h_i = sum([h_j for j in neighbors(i)])

# Max aggregation, as in GraphSAGE's pooling aggregator
h_i = max([h_j for j in neighbors(i)])

# Attention-weighted aggregation, as in GAT
h_i = sum([α_ij * h_j for j in neighbors(i)])
```
Spectral Graph Theory
Graph Fourier Transform
The graph Fourier transform uses eigenvectors of the Laplacian:
Graph Fourier Transform: x̂ = U^T x
Inverse Transform: x = U x̂

Where U contains the eigenvectors of L
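A small worked example, assuming a toy 4-node path graph: build the normalized Laplacian, eigendecompose it, and move a node signal into and out of the spectral domain.

```python
import torch

# Toy undirected graph: a 4-node path 0-1-2-3
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])

deg = adj.sum(dim=1)
d_inv_sqrt = torch.diag(deg.pow(-0.5))
# Normalized Laplacian: L_norm = I - D^(-1/2) A D^(-1/2)
lap = torch.eye(4) - d_inv_sqrt @ adj @ d_inv_sqrt

# Eigendecomposition: columns of U form the graph Fourier basis
eigvals, U = torch.linalg.eigh(lap)

x = torch.randn(4)      # a graph signal (one value per node)
x_hat = U.T @ x         # graph Fourier transform
x_back = U @ x_hat      # inverse transform recovers x
```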
Spectral Filtering
Convolution as multiplication in spectral domain:
g * x = U((U^T g) ⊙ (U^T x))

Simplified (ChebNet):
g * x ≈ Σ_k θ_k T_k(L̃) x
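The ChebNet approximation can be sketched with the Chebyshev recurrence T_0(L̃)x = x, T_1(L̃)x = L̃x, T_k(L̃)x = 2L̃T_{k-1}(L̃)x − T_{k-2}(L̃)x. The function below is a simplified illustration with scalar coefficients θ_k (in ChebNet these are learned weight matrices) and assumes at least two coefficients are given:

```python
import torch

def cheb_filter(x, lap_norm, thetas):
    """Approximate spectral filtering g * x ≈ Σ_k θ_k T_k(L̃) x.

    lap_norm : normalized Laplacian (n x n)
    thetas   : list of K >= 2 scalar coefficients (learned in ChebNet)
    """
    # Rescale eigenvalues from [0, 2] to [-1, 1]: the exact form is
    # L̃ = 2 L_norm / λ_max - I; since λ_max ≤ 2 for L_norm, L̃ ≈ L_norm - I
    n = lap_norm.size(0)
    l_tilde = lap_norm - torch.eye(n)

    t_prev, t_curr = x, l_tilde @ x              # T_0(L̃)x, T_1(L̃)x
    out = thetas[0] * t_prev + thetas[1] * t_curr
    for theta in thetas[2:]:
        # Chebyshev recurrence: T_k = 2 L̃ T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2 * l_tilde @ t_curr - t_prev
        out = out + theta * t_curr
    return out
```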
Eigenvalue Properties
- λ = 0: Multiplicity equals the number of connected components
- Small λ: Low-frequency, smooth signals
- Large λ: High-frequency, varying signals
Architecture Design
Typical GCN Architecture
```python
import torch.nn as nn
import torch.nn.functional as F

class GCN(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.5):
        super().__init__()
        self.layers = nn.ModuleList()
        # Build layers: input -> hidden ... -> output
        dims = [input_dim] + hidden_dims + [output_dim]
        for i in range(len(dims) - 1):
            self.layers.append(GCNLayer(dims[i], dims[i + 1]))  # e.g., the GCNLayer sketched earlier
        self.dropout = dropout

    def forward(self, x, adj):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, adj))
            x = F.dropout(x, self.dropout, training=self.training)
        # Last layer without activation
        x = self.layers[-1](x, adj)
        return F.log_softmax(x, dim=1)
```
Depth Considerations
Receptive field after k layers: the k-hop neighborhood

Issues with deep GCNs:
- Over-smoothing: Features become indistinguishable
- Gradient vanishing: Training becomes difficult
- Computational cost: Exponential neighborhood growth

Solutions:
- Residual connections
- Jumping knowledge networks (sketched below)
- DropEdge regularization
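A rough sketch of the jumping-knowledge idea, reusing the GCNLayer sketch from above and combining layer outputs by concatenation (other variants use max-pooling or an LSTM); class and parameter names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JKNet(nn.Module):
    """GCN with jumping knowledge: concatenate every layer's output so
    shallow (local) and deep (global) features both reach the classifier."""
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=4):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            [GCNLayer(dims[i], dims[i + 1]) for i in range(num_layers)])
        self.classifier = nn.Linear(hidden_dim * num_layers, out_dim)

    def forward(self, x, adj):
        layer_outputs = []
        for layer in self.layers:
            x = F.relu(layer(x, adj))
            layer_outputs.append(x)
        # "Jump" all intermediate representations to the final classifier
        return self.classifier(torch.cat(layer_outputs, dim=1))
```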
Training Strategies
Semi-supervised Learning
```python
import torch
import torch.nn.functional as F

# Typical train/val/test split over nodes
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:num_train] = True
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask[num_train:num_train + num_val] = True
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask[num_train + num_val:] = True

# Training with masks: the loss only uses labeled (training) nodes
output = model(features, adj)
loss = F.nll_loss(output[train_mask], labels[train_mask])
```
Regularization Techniques
```python
import torch
import torch.nn.functional as F

# Dropout on node features
x = F.dropout(x, p=0.5, training=self.training)

# DropEdge: randomly remove a fraction of edges each epoch
def drop_edge(adj, drop_rate=0.5):
    mask = torch.rand_like(adj) > drop_rate
    return adj * mask

# L2 regularization over all parameters
l2_reg = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + weight_decay * l2_reg
```
Variants and Extensions
GraphSAGE
Samples and aggregates from local neighborhoods:
h_i = σ(W · [h_i || AGG({h_j : j ∈ N(i)})])
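A minimal dense-adjacency sketch of this update with the mean aggregator; real GraphSAGE additionally samples a fixed number of neighbors, which is omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGELayer(nn.Module):
    """GraphSAGE-style layer (mean aggregator): concatenate a node's own
    features with the mean of its neighbors, then apply a shared linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # Mean of neighbor features (row-normalize the adjacency matrix)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = (adj @ x) / deg
        # h_i' = σ(W · [h_i || AGG(neighbors)])
        return F.relu(self.linear(torch.cat([x, neigh_mean], dim=1)))
```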
Graph Attention Networks (GAT)
Uses attention mechanism for aggregation:
α_ij = softmax(LeakyReLU(a^T [W h_i || W h_j]))
h_i' = σ(Σ_j α_ij W h_j)
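A single-head, dense-adjacency sketch of these two equations; it assumes adj already contains self-loops so every row has at least one neighbor to attend to (multi-head attention and edge dropout are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer (dense adjacency, for illustration)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.W(x)                                   # (n, out_dim)
        n = h.size(0)
        # Pairwise attention logits e_ij = a^T [W h_i || W h_j]
        h_i = h.unsqueeze(1).expand(n, n, -1)
        h_j = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1)).squeeze(-1))
        # Mask non-edges so the softmax only runs over actual neighbors
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=1)                 # α_ij
        return F.elu(alpha @ h)                         # h_i' = σ(Σ_j α_ij W h_j)
```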
Graph Isomorphism Network (GIN)
As expressive as the Weisfeiler-Lehman (1-WL) graph isomorphism test, the upper bound for message-passing GNNs:
h_i = MLP((1 + ε) · h_i + Σ_j h_j)
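A minimal sketch of this update with a learnable ε and a two-layer MLP; layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """GIN layer: sum-aggregate neighbors, weight the node's own features
    by (1 + ε), and push the result through an MLP."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))   # learnable ε
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, x, adj):
        # (1 + ε) · h_i + Σ_{j ∈ N(i)} h_j
        return self.mlp((1 + self.eps) * x + adj @ x)
```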
Applications
1. Node Classification
- Social network analysis
- Protein function prediction
- Document categorization in citation networks
2. Graph Classification
- Molecular property prediction
- Scene graph classification
- Program analysis
3. Link Prediction
- Recommendation systems
- Knowledge graph completion
- Drug-drug interaction prediction
4. Graph Generation
- Molecule design
- Network synthesis
- Scene generation
Implementation Best Practices
Data Preprocessing
```python
import torch

# Feature normalization (z-score per feature dimension)
features = (features - features.mean(0)) / features.std(0)

# Adjacency normalization
def normalize_adj(adj):
    # Add self-loops
    adj = adj + torch.eye(adj.size(0))
    # Symmetric normalization: D̃^(-1/2) Ã D̃^(-1/2)
    degree = adj.sum(1)
    d_inv_sqrt = degree.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0
    d_mat = torch.diag(d_inv_sqrt)
    return d_mat @ adj @ d_mat
```
Efficient Implementation
```python
import torch

# Use sparse matrices for large graphs
adj = torch.sparse_coo_tensor(edge_index, edge_weight, (n, n))

# Batch processing for multiple graphs
from torch_geometric.loader import DataLoader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Mini-batch training with neighbor sampling
from torch_geometric.loader import NeighborLoader
loader = NeighborLoader(
    data,
    num_neighbors=[25, 10],  # neighbors sampled per layer (2 layers)
    batch_size=128,
    shuffle=True,
)
```
Hyperparameter Tuning
```python
# Common hyperparameters
config = {
    'hidden_dims': [16, 16],   # Hidden layer sizes
    'dropout': 0.5,            # Dropout rate
    'lr': 0.01,                # Learning rate
    'weight_decay': 5e-4,      # L2 regularization
    'epochs': 200,             # Training epochs
    'early_stopping': 10,      # Patience (epochs) for early stopping
}
```
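A minimal training loop wiring this config into the semi-supervised setup above; model, features, adj, labels, and the masks are assumed to exist as in the earlier snippets:

```python
import copy
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(),
                             lr=config['lr'],
                             weight_decay=config['weight_decay'])

best_val_loss, patience, best_state = float('inf'), 0, None
for epoch in range(config['epochs']):
    model.train()
    optimizer.zero_grad()
    out = model(features, adj)
    loss = F.nll_loss(out[train_mask], labels[train_mask])
    loss.backward()
    optimizer.step()

    # Early stopping on validation loss
    model.eval()
    with torch.no_grad():
        val_loss = F.nll_loss(model(features, adj)[val_mask], labels[val_mask])
    if val_loss < best_val_loss:
        best_val_loss, patience = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        patience += 1
        if patience >= config['early_stopping']:
            break

model.load_state_dict(best_state)
```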
Performance Considerations
Computational Complexity
Time complexity per layer: O(|E| · F) for the sparse aggregation, plus O(|V| · F · F') for the feature transform
Space complexity: O(|V| · F)

Where:
- |E|: Number of edges
- |V|: Number of vertices
- F, F': Input and output feature dimensions
Scalability Solutions
- Sampling-based methods: GraphSAINT, FastGCN
- Cluster-based methods: ClusterGCN
- Simplified models: SGC (sketched below), SIGN
- Hardware acceleration: GPU, TPU optimization
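As an illustration of the simplified models, SGC drops the nonlinearities between layers, so K propagation steps collapse into a precomputed matrix power followed by a single linear classifier. The sketch below reuses normalize_adj from the preprocessing section; features, adj, and num_classes are assumed to exist:

```python
import torch
import torch.nn as nn

def sgc_precompute(features, adj_norm, k=2):
    """Propagate features k hops with no nonlinearity: X' = Â^k X."""
    for _ in range(k):
        features = adj_norm @ features
    return features

# Training then reduces to logistic regression on the propagated features
x_prop = sgc_precompute(features, normalize_adj(adj), k=2)
classifier = nn.Linear(x_prop.size(1), num_classes)
```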
Common Challenges
Over-smoothing
```python
import torch.nn as nn

# Problem: with many layers, node features converge to the same value
# Solution: add residual (skip) connections around each GCN layer
class ResGCN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gcn = GCNLayer(dim, dim)  # any graph convolution with equal in/out dims

    def forward(self, x, adj):
        identity = x
        out = self.gcn(x, adj)
        return out + identity
```
Heterophily
```python
import torch.nn as nn

# Problem: connected nodes often have different labels (heterophily)
# Solution: signed message passing or higher-order neighborhoods
class SignedGCN(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.conv_pos = GCNLayer(in_dim, out_dim)  # convolution over positive edges
        self.conv_neg = GCNLayer(in_dim, out_dim)  # convolution over negative edges

    def forward(self, x, adj_pos, adj_neg):
        h_pos = self.conv_pos(x, adj_pos)
        h_neg = -self.conv_neg(x, adj_neg)
        return h_pos + h_neg
```
Evaluation Metrics
Node Classification
- Accuracy, F1-score, AUC-ROC
- Per-class metrics for imbalanced datasets
Graph Classification
- Accuracy, precision, recall
- Matthews correlation coefficient
Link Prediction
- AUC, Average Precision
- Hits@K, Mean Reciprocal Rank
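A sketch of how these ranking metrics are typically computed: each true edge is scored against a set of corrupted (negative) edges and ranked among them; tensor names and shapes are illustrative.

```python
import torch

def rank_metrics(pos_scores, neg_scores, k=10):
    """Hits@K and MRR for link prediction.

    pos_scores : (num_pos,) score of each true edge
    neg_scores : (num_pos, num_neg) scores of corrupted edges per true edge
    """
    # Rank of each positive edge among its negatives (1 = best)
    ranks = (neg_scores >= pos_scores.unsqueeze(1)).sum(dim=1) + 1
    hits_at_k = (ranks <= k).float().mean()
    mrr = (1.0 / ranks.float()).mean()
    return hits_at_k.item(), mrr.item()
```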
Future Directions
- Dynamic Graphs: Temporal graph networks
- Heterogeneous Graphs: Multi-relational GCNs
- Explainability: GNNExplainer, PGExplainer
- Robustness: Adversarial training, certified defenses
- Pre-training: Self-supervised graph learning
Conclusion
Graph Convolutional Networks bridge the gap between deep learning and graph-structured data, enabling powerful representations for complex relational systems. Understanding both the spectral theory and message-passing perspectives provides insight into their effectiveness and guides architectural choices for specific applications.