Graph Embeddings

Learning low-dimensional vector representations of graphs through random walks, DeepWalk, Node2Vec, and skip-gram models

[Interactive visualization: Node2Vec random walks with return parameter p and in-out parameter q (defaults p = 1, q = 1). Low p encourages revisiting, low q encourages outward exploration (DFS-like), high q encourages local search (BFS-like).]

What are Graph Embeddings?

Graph embeddings transform nodes, edges, or entire graphs into dense vector representations that preserve structural and semantic properties. These embeddings enable machine learning on graph data by converting discrete graph structures into continuous vector spaces.
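
A toy sketch of the idea (hypothetical values; random vectors stand in for trained embeddings):

```python
import numpy as np

# An embedding is conceptually a lookup table: one dense vector per node.
rng = np.random.default_rng(0)
embedding = {node: rng.normal(size=64) for node in ["a", "b", "c"]}

# Downstream models consume these vectors instead of the discrete graph,
# e.g. stacked into a feature matrix for a classifier.
X = np.stack([embedding[n] for n in ["a", "b", "c"]])  # shape (3, 64)
```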

Random Walk Strategies

DeepWalk

  • Uniform Random Walks: Equal probability to all neighbors
  • Captures Neighborhood Co-occurrence: Nodes that co-occur on short walks get similar embeddings
  • Simple and Scalable: Works well for large graphs (a minimal walk sampler follows this list)

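A minimal sketch of a uniform walk sampler, assuming networkx (its built-in karate club graph stands in for real data):

```python
import random

import networkx as nx

def uniform_random_walk(G, start, walk_length):
    """DeepWalk-style walk: step to a uniformly random neighbor."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:  # isolated node or directed dead end
            break
        walk.append(random.choice(neighbors))
    return walk

def generate_walks(G, num_walks, walk_length):
    """Make num_walks passes over all nodes, shuffling start order each pass."""
    walks = []
    nodes = list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for node in nodes:
            walks.append(uniform_random_walk(G, node, walk_length))
    return walks

G = nx.karate_club_graph()
walks = generate_walks(G, num_walks=10, walk_length=80)
```
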
Node2Vec

  • Biased Random Walks: Controlled by parameters p and q (see the sampling sketch after this list)
  • Return Parameter (p): Controls likelihood of returning to previous node
    • Low p: Encourages revisiting, local exploration
    • High p: Discourages backtracking
  • In-Out Parameter (q): Controls inward vs. outward exploration
    • Low q: DFS-like behavior, explores graph globally
    • High q: BFS-like behavior, stays local
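
A direct sketch of the biased next-step rule, assuming an unweighted, undirected networkx graph (the reference implementation precomputes alias tables for O(1) sampling; this version trades speed for clarity):

```python
import random

def node2vec_step(G, prev, curr, p, q):
    """Sample the next node given the last traversed edge (prev -> curr).

    Unnormalized node2vec weights over curr's neighbors x:
    1/p if x == prev, 1 if x also neighbors prev, 1/q otherwise.
    """
    neighbors = list(G.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)   # return to the previous node
        elif G.has_edge(prev, x):
            weights.append(1.0)       # distance 1 from prev: stays local (BFS-like)
        else:
            weights.append(1.0 / q)   # distance 2 from prev: moves outward (DFS-like)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(G, start, walk_length, p=1.0, q=1.0):
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(random.choice(neighbors))  # first step has no prev: uniform
        else:
            walk.append(node2vec_step(G, walk[-2], walk[-1], p, q))
    return walk

# Example: a BFS-leaning walk on the karate club graph from above.
walk = node2vec_walk(G, start=0, walk_length=80, p=0.25, q=4)
```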

Skip-Gram Training

The skip-gram model learns embeddings by predicting context nodes given a center node:

  1. Generate Random Walks: Sample sequences of nodes
  2. Create Training Pairs: (center node, context node) within window
  3. Optimize Embedding: Maximize probability of observing context given center
  4. Loss Function: Negative log-likelihood with negative sampling
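
All four steps can be sketched with gensim's Word2Vec, assuming `walks` comes from one of the samplers above (hyperparameter values mirror the best practices listed later):

```python
from gensim.models import Word2Vec

# gensim expects sequences of string tokens, so node IDs are stringified.
corpus = [[str(node) for node in walk] for walk in walks]   # step 1

model = Word2Vec(
    corpus,
    vector_size=128,  # embedding dimension
    window=10,        # context window over each walk (step 2)
    min_count=0,      # keep every node, even rarely visited ones
    sg=1,             # skip-gram objective (step 3)
    negative=5,       # negative sampling (step 4)
    workers=4,
    epochs=5,
)

vec = model.wv["0"]                         # embedding for node 0
print(model.wv.most_similar("0", topn=5))   # nearest nodes in embedding space
```

Here sg=1 selects the skip-gram objective rather than CBOW, and negative=5 enables negative sampling, matching steps 3 and 4.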

Key Properties Preserved

Homophily

Nodes that are close in the graph should have similar embeddings.

Structural Equivalence

Nodes with similar structural roles should have similar embeddings.

Community Structure

Nodes in the same community cluster together in embedding space.
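
One way to check this property is to cluster the embeddings and compare against known communities. A sketch assuming the trained `model` from the skip-gram example plus scikit-learn (n_clusters=2 is an assumption that matches the karate club graph):

```python
import numpy as np
from sklearn.cluster import KMeans

nodes = list(model.wv.index_to_key)
X = np.stack([model.wv[n] for n in nodes])

# If community structure is preserved, k-means clusters in embedding
# space should align with the graph's communities.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
communities = dict(zip(nodes, labels))  # node -> cluster id
```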

Applications

  1. Node Classification: Predict node labels using embeddings as features
  2. Link Prediction: Compute similarity between node embeddings (see the example after this list)
  3. Community Detection: Cluster nodes in embedding space
  4. Graph Visualization: Use 2D/3D projections of embeddings
  5. Recommendation Systems: Find similar items via embedding similarity
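
For link prediction (item 2), a minimal sketch that scores candidate edges by cosine similarity, assuming `G` and `model` from the earlier examples (the node2vec paper instead builds edge features, e.g. Hadamard products, for a classifier):

```python
import numpy as np

def edge_score(model, u, v):
    """Score a candidate edge (u, v) by cosine similarity of its endpoints."""
    a, b = model.wv[str(u)], model.wv[str(v)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank node pairs that are not yet edges; top scores are predicted links.
candidates = [(u, v) for u in G.nodes() for v in G.nodes()
              if u < v and not G.has_edge(u, v)]
ranked = sorted(candidates, key=lambda e: edge_score(model, *e), reverse=True)
print(ranked[:5])
```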

Comparison of Methods

| Method | Walk Strategy | Training | Pros | Cons |
|-----------|---------------|-----------|---------------------|------------------|
| DeepWalk | Uniform | Skip-gram | Simple, scalable | No edge weights |
| Node2Vec | Biased (p, q) | Skip-gram | Flexible | More parameters |
| LINE | 1st/2nd order | Direct | Preserves proximity | Memory intensive |
| GraphSAGE | Sampling | Aggregation | Inductive | Complex |

Best Practices

  1. Walk Parameters: Start with walk_length=80, num_walks=10
  2. Window Size: Use window_size=10 for skip-gram
  3. Embedding Dimension: Typically 64-256 dimensions
  4. Node2Vec Tuning:
    • Grid search p, q ∈ {0.25, 0.5, 1, 2, 4} (a tuning loop sketch follows this list)
    • p=1, q=1 gives DeepWalk behavior
  5. Evaluation: Use downstream task performance
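
A hypothetical tuning loop for item 4, reusing `G` and `node2vec_walk` from the sketches above; `train_skipgram` and `evaluate_downstream` are placeholders for the gensim sketch and whatever downstream metric you select on, not real library functions:

```python
from itertools import product

grid = [0.25, 0.5, 1, 2, 4]
best = None
for p, q in product(grid, grid):
    # 10 walks of length 80 per node, per the best-practice defaults above.
    walks = [node2vec_walk(G, node, walk_length=80, p=p, q=q)
             for _ in range(10) for node in G.nodes()]
    model = train_skipgram(walks)       # placeholder: e.g. the gensim sketch
    score = evaluate_downstream(model)  # placeholder: task-specific metric
    if best is None or score > best[0]:
        best = (score, p, q)

print("best (score, p, q):", best)
```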
