Graph Convolutional Networks (GCN)

Neural networks for graph-structured data using spectral convolutions

Architecture: Each GCN layer aggregates features from neighboring nodes and transforms them through learned weights. The graph structure guides information flow.

What are Graph Convolutional Networks?

Graph Convolutional Networks (GCNs) are neural networks designed to work directly on graph-structured data. They generalize the concept of convolution from regular grids (like images) to irregular graph structures, enabling deep learning on social networks, molecular structures, knowledge graphs, and more.

Core Concepts

1. Graph Structure

A graph G = (V, E) consists of:

  • Vertices (V): Nodes representing entities
  • Edges (E): Connections between nodes
  • Features (X): Node attributes/features
  • Adjacency Matrix (A): Encodes graph structure
# Graph representation
G = {
    'nodes': [0, 1, 2, 3, 4],
    'edges': [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)],
    'features': torch.randn(5, 16),      # 5 nodes, 16 features each
    'adjacency': sparse_adjacency_matrix
}

2. Spectral Convolution

GCNs perform convolution in the spectral domain using the graph Laplacian:

Graph Laplacian:       L = D - A
Normalized Laplacian:  L_norm = I - D^(-1/2) A D^(-1/2)

Where:
  - D: Degree matrix (diagonal)
  - A: Adjacency matrix
  - I: Identity matrix
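
As a concrete illustration, both matrices can be built in a few lines of plain PyTorch; the 5-node adjacency matrix below is an arbitrary example chosen only for this sketch:

import torch

# Adjacency matrix of a small undirected graph (5 nodes)
A = torch.tensor([[0., 1., 1., 0., 0.],
                  [1., 0., 1., 0., 0.],
                  [1., 1., 0., 1., 0.],
                  [0., 0., 1., 0., 1.],
                  [0., 0., 0., 1., 0.]])

D = torch.diag(A.sum(dim=1))                 # degree matrix
L = D - A                                    # combinatorial Laplacian
d_inv_sqrt = torch.diag(A.sum(dim=1).pow(-0.5))
L_norm = torch.eye(A.size(0)) - d_inv_sqrt @ A @ d_inv_sqrt   # normalized Laplacian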

3. Layer-wise Propagation Rule

The key GCN equation:

H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l))

Where:
  - Ã = A + I (adjacency with self-loops)
  - D̃: Degree matrix of Ã
  - H^(l): Node features at layer l
  - W^(l): Trainable weight matrix
  - σ: Activation function
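
A minimal dense implementation of this rule might look like the sketch below. The name GCNLayer mirrors the one used in the architecture example later on this page, but the body here is an illustrative assumption, not a reference implementation; the activation σ is left to the caller, as in that later example.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # One GCN layer: H' = D̃^(-1/2) Ã D̃^(-1/2) H W (activation applied by caller)
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # Ã = A + I: add self-loops
        adj_hat = adj + torch.eye(adj.size(0), device=adj.device)
        # D̃^(-1/2) Ã D̃^(-1/2): symmetric normalization
        d_inv_sqrt = adj_hat.sum(dim=1).pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(1) * adj_hat * d_inv_sqrt.unsqueeze(0)
        # Propagate normalized neighborhood features, then transform
        return norm_adj @ self.linear(x)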

Message Passing Framework

Neighborhood Aggregation

GCNs follow a message-passing paradigm:

  1. Message: Information from neighbor nodes
  2. Aggregate: Combine neighbor messages
  3. Update: Transform aggregated information
import torch
import torch.nn.functional as F

def gcn_layer(features, adj_matrix, weight):
    # Message: gather neighbor features
    messages = adj_matrix @ features
    # Aggregate: normalized (mean) sum over neighbors; clamp avoids division by zero
    degree = adj_matrix.sum(dim=1, keepdim=True).clamp(min=1)
    aggregated = messages / degree
    # Update: linear transformation + activation
    output = F.relu(aggregated @ weight)
    return output

Aggregation Functions

Different aggregation strategies:

# Mean aggregation (GCN, GraphSAGE-mean)
h_i = mean([h_j for j in neighbors(i)])

# Sum aggregation (GIN)
h_i = sum([h_j for j in neighbors(i)])

# Max aggregation (GraphSAGE pooling)
h_i = max([h_j for j in neighbors(i)])

# Attention aggregation (GAT)
h_i = sum([α_ij * h_j for j in neighbors(i)])

Spectral Graph Theory

Graph Fourier Transform

The graph Fourier transform uses eigenvectors of the Laplacian:

Graph Fourier Transform:  x̂ = U^T x
Inverse Transform:        x = U x̂

Where U contains the eigenvectors of L

Spectral Filtering

Convolution as multiplication in spectral domain:

g * x = U((U^T g) ⊙ (U^T x))

Simplified (ChebNet):  g * x ≈ Σ_k θ_k T_k(L̃) x
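
To make the spectral view concrete, the sketch below filters a node signal by eigendecomposing the (symmetric) normalized Laplacian and keeping only low-frequency components; the function name and the hard cutoff at λ < 1.0 are illustrative choices, not a standard API:

import torch

def low_pass_filter(x, L_norm, cutoff=1.0):
    # Eigendecomposition of the symmetric normalized Laplacian
    eigvals, U = torch.linalg.eigh(L_norm)
    # Graph Fourier transform: project the signal onto the eigenbasis
    x_hat = U.T @ x
    # Spectral filter: keep only low-frequency components (λ < cutoff)
    g = (eigvals < cutoff).float().unsqueeze(1)
    # Inverse transform back to the vertex domain
    return U @ (g * x_hat)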

Eigenvalue Properties

  • λ = 0: Constant eigenvector (connected components)
  • Small λ: Low-frequency, smooth signals
  • Large λ: High-frequency, varying signals

Architecture Design

Typical GCN Architecture

import torch.nn as nn
import torch.nn.functional as F

class GCN(nn.Module):
    def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.5):
        super().__init__()
        self.layers = nn.ModuleList()
        # Build layers: input -> hidden(s) -> output
        dims = [input_dim] + hidden_dims + [output_dim]
        for i in range(len(dims) - 1):
            self.layers.append(GCNLayer(dims[i], dims[i + 1]))
        self.dropout = dropout

    def forward(self, x, adj):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, adj))
            x = F.dropout(x, self.dropout, training=self.training)
        # Last layer without activation; log-softmax for classification
        x = self.layers[-1](x, adj)
        return F.log_softmax(x, dim=1)

Depth Considerations

Receptive field after k layers: the k-hop neighborhood

Issues with deep GCNs:
  - Over-smoothing: Features become indistinguishable
  - Gradient vanishing: Training becomes difficult
  - Computational cost: Exponential neighborhood growth

Solutions:
  - Residual connections
  - Jumping knowledge networks (see the sketch below)
  - DropEdge regularization
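
Residual connections and DropEdge are shown in later sections; here is a rough sketch of the jumping-knowledge idea (concatenation variant), assuming every layer produces hidden_dim features and reusing the GCNLayer sketch from above:

import torch
import torch.nn as nn
import torch.nn.functional as F

class JKGCN(nn.Module):
    # Jumping knowledge: concatenate every layer's output so shallow (local)
    # and deep (global) views both reach the classifier
    def __init__(self, layers, hidden_dim, num_classes):
        super().__init__()
        self.layers = nn.ModuleList(layers)               # e.g. a list of GCNLayer
        self.classify = nn.Linear(len(layers) * hidden_dim, num_classes)

    def forward(self, x, adj):
        outs = []
        for layer in self.layers:
            x = F.relu(layer(x, adj))
            outs.append(x)
        # Concatenate all intermediate representations before classification
        return self.classify(torch.cat(outs, dim=1))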

Training Strategies

Semi-supervised Learning

# Typical train/val/test split over nodes
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:num_train] = True
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask[num_train:num_train + num_val] = True
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask[num_train + num_val:] = True

# Training with masks: the loss uses only labeled training nodes
output = model(features, adj)
loss = F.nll_loss(output[train_mask], labels[train_mask])

Regularization Techniques

# Dropout on node features
x = F.dropout(x, p=0.5, training=self.training)

# DropEdge: randomly remove edges (dense adjacency shown here)
def drop_edge(adj, drop_rate=0.5):
    mask = torch.rand_like(adj) > drop_rate
    return adj * mask

# L2 regularization
l2_reg = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + weight_decay * l2_reg

Variants and Extensions

GraphSAGE

Samples and aggregates from local neighborhoods:

h_i = σ(W · [h_i || AGG({h_j : j ∈ N(i)})])
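
A dense, mean-aggregator version of this update could be sketched as follows; SAGELayer is an illustrative name, and production GraphSAGE would use neighbor sampling and sparse operations:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGELayer(nn.Module):
    # h_i' = ReLU(W [h_i || mean_{j in N(i)} h_j])
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # Mean aggregation over neighbors (row-normalized adjacency)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj @ x) / deg
        # Concatenate self features with aggregated neighbor features
        return F.relu(self.lin(torch.cat([x, agg], dim=1)))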

Graph Attention Networks (GAT)

Uses an attention mechanism for aggregation:

α_ij = softmax(LeakyReLU(a^T [W h_i || W h_j]))

h_i' = σ(Σ_j α_ij W h_j)
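
A single-head, dense-adjacency sketch of these two equations; this is illustrative only, since real GAT implementations use sparse edge lists and multiple attention heads:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    # Single-head graph attention layer over a dense adjacency matrix
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # Assumes every node has at least one edge (e.g. self-loops in adj)
        h = self.W(x)                                      # (N, out_dim)
        n = h.size(0)
        # Pairwise attention logits e_ij = LeakyReLU(a^T [W h_i || W h_j])
        h_i = h.unsqueeze(1).expand(n, n, -1)
        h_j = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1))).squeeze(-1)
        # Mask non-edges, then softmax over each node's neighbors
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=1)
        return F.elu(alpha @ h)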

Graph Isomorphism Network (GIN)

Maximally expressive under the Weisfeiler-Lehman (WL) test:

h_i = MLP((1 + ε) · h_i + Σ_j h_j)
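
A compact sketch of this update with a learnable ε; the two-layer MLP and the class name are assumptions for illustration:

import torch
import torch.nn as nn

class GINLayer(nn.Module):
    # h_i' = MLP((1 + ε) · h_i + Σ_{j in N(i)} h_j)
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))   # learnable epsilon
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, x, adj):
        # Sum aggregation over neighbors, plus (1 + ε)-weighted self features
        return self.mlp((1 + self.eps) * x + adj @ x)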

Applications

1. Node Classification

  • Social network analysis
  • Protein function prediction
  • Document categorization in citation networks

2. Graph Classification

  • Molecular property prediction
  • Scene graph classification
  • Program analysis

3. Link Prediction

  • Recommendation systems
  • Knowledge graph completion
  • Drug-drug interaction prediction

4. Graph Generation

  • Molecule design
  • Network synthesis
  • Scene generation

Implementation Best Practices

Data Preprocessing

# Feature normalization
features = (features - features.mean(0)) / features.std(0)

# Adjacency normalization
def normalize_adj(adj):
    # Add self-loops
    adj = adj + torch.eye(adj.size(0))
    # Symmetric normalization: D^(-1/2) (A + I) D^(-1/2)
    degree = adj.sum(1)
    d_inv_sqrt = degree.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0
    d_mat = torch.diag(d_inv_sqrt)
    return d_mat @ adj @ d_mat

Efficient Implementation

# Use sparse matrices for large graphs
adj = torch.sparse_coo_tensor(edge_index, edge_weight, (n, n))

# Batch processing for multiple graphs
from torch_geometric.loader import DataLoader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Mini-batch training with neighbor sampling
from torch_geometric.loader import NeighborLoader
loader = NeighborLoader(
    data,
    num_neighbors=[25, 10],   # Sample sizes for 2 layers
    batch_size=128,
    shuffle=True,
)

Hyperparameter Tuning

# Common hyperparameters
config = {
    'hidden_dims': [16, 16],   # Hidden layer sizes
    'dropout': 0.5,            # Dropout rate
    'lr': 0.01,                # Learning rate
    'weight_decay': 5e-4,      # L2 regularization
    'epochs': 200,             # Training epochs
    'early_stopping': 10,      # Patience for early stopping
}
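
A typical training loop that consumes this config might look like the sketch below; it reuses model, features, adj, labels and the masks from the semi-supervised snippet above, and applies early stopping on validation loss:

import copy
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(),
                             lr=config['lr'], weight_decay=config['weight_decay'])

best_val, patience, best_state = float('inf'), 0, None
for epoch in range(config['epochs']):
    model.train()
    optimizer.zero_grad()
    out = model(features, adj)
    loss = F.nll_loss(out[train_mask], labels[train_mask])
    loss.backward()
    optimizer.step()

    # Early stopping on validation loss
    model.eval()
    with torch.no_grad():
        val_loss = F.nll_loss(model(features, adj)[val_mask], labels[val_mask]).item()
    if val_loss < best_val:
        best_val, patience = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        patience += 1
        if patience >= config['early_stopping']:
            break

# Restore the best checkpoint before final evaluation
model.load_state_dict(best_state)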

Performance Considerations

Computational Complexity

Time complexity per layer:  O(|E| · F)
Space complexity:           O(|V| · F)

Where:
  - |E|: Number of edges
  - |V|: Number of vertices
  - F: Feature dimension

Scalability Solutions

  1. Sampling-based methods: GraphSAINT, FastGCN
  2. Cluster-based methods: ClusterGCN
  3. Simplified models: SGC, SIGN (see the SGC sketch after this list)
  4. Hardware acceleration: GPU, TPU optimization
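
For example, SGC drops the nonlinearities between layers, so the k-hop propagation can be precomputed once outside the training loop. A rough sketch, assuming a pre-normalized adjacency matrix norm_adj (as built in the preprocessing section) and an assumed num_classes:

import torch
import torch.nn as nn

def sgc_precompute(features, norm_adj, k=2):
    # Propagate features k hops once, outside the training loop
    for _ in range(k):
        features = norm_adj @ features
    return features

# Training then reduces to logistic regression on the precomputed features
x_k = sgc_precompute(features, norm_adj, k=2)
classifier = nn.Linear(x_k.size(1), num_classes)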

Common Challenges

Over-smoothing

# Problem: Features converge to the same value as depth grows
# Solution: Add residual connections
class ResGCN(nn.Module):
    def forward(self, x, adj):
        identity = x
        out = self.gcn(x, adj)
        return out + identity

Heterophily

# Problem: Connected nodes have different labels
# Solution: Use signed message passing or higher-order neighborhoods
class SignedGCN(nn.Module):
    def forward(self, x, adj_pos, adj_neg):
        h_pos = self.conv_pos(x, adj_pos)
        h_neg = -self.conv_neg(x, adj_neg)
        return h_pos + h_neg

Evaluation Metrics

Node Classification

  • Accuracy, F1-score, AUC-ROC
  • Per-class metrics for imbalanced datasets

Graph Classification

  • Accuracy, precision, recall
  • Matthews correlation coefficient
  • AUC, Average Precision

Link Prediction

  • Hits@K, Mean Reciprocal Rank (computed as in the sketch below)
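
A minimal way to compute these ranking metrics for one positive edge scored against sampled negatives; variable names here are illustrative:

import torch

def mrr_and_hits(pos_score, neg_scores, k=10):
    # pos_score:  scalar score of the true edge
    # neg_scores: (num_neg,) scores of corrupted edges
    rank = 1 + (neg_scores >= pos_score).sum().item()
    return 1.0 / rank, float(rank <= k)   # reciprocal rank, Hits@K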

Future Directions

  1. Dynamic Graphs: Temporal graph networks
  2. Heterogeneous Graphs: Multi-relational GCNs
  3. Explainability: GNNExplainer, PGExplainer
  4. Robustness: Adversarial training, certified defenses
  5. Pre-training: Self-supervised graph learning

Conclusion

Graph Convolutional Networks bridge the gap between deep learning and graph-structured data, enabling powerful representations for complex relational systems. Understanding both the spectral theory and message-passing perspectives provides insight into their effectiveness and guides architectural choices for specific applications.
