Graph Embeddings
Learning low-dimensional vector representations of graphs with random-walk methods (DeepWalk, Node2Vec) and skip-gram models
What are Graph Embeddings?
Graph embeddings transform nodes, edges, or entire graphs into dense vector representations that preserve structural and semantic properties. These embeddings enable machine learning on graph data by converting discrete graph structures into continuous vector spaces.
Random Walk Strategies
DeepWalk
- Uniform Random Walks: Equal probability to all neighbors
- Captures Neighborhood Similarity: Nodes that co-occur on walks get similar embeddings
- Simple and Scalable: Works well for large graphs
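A uniform DeepWalk-style walk can be sketched in a few lines of plain Python. The adjacency dictionary below is a made-up toy graph (two triangles joined by the edge 2–3), used only for illustration:

```python
import random

def uniform_random_walk(adj, start, walk_length, rng=random):
    """DeepWalk-style walk: at each step, move to a neighbor
    chosen uniformly at random."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
random.seed(0)
# Standard corpus construction: several walks starting from every node.
walks = [uniform_random_walk(adj, v, walk_length=8)
         for v in adj for _ in range(10)]
```

The resulting list of walks is the "corpus" fed to skip-gram training, exactly as sentences are fed to word2vec.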
Node2Vec
- Biased Random Walks: Controlled by parameters p and q
- Return Parameter (p): Controls likelihood of returning to previous node
- Low p: High return weight (1/p), so the walk frequently backtracks to the previous node
- High p: Discourages backtracking, pushing the walk onward
- In-Out Parameter (q): Biases the walk toward moving outward or staying near the previous node
- Low q: DFS-like behavior, explores graph globally
- High q: BFS-like behavior, stays local
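The p/q bias above amounts to reweighting each candidate next step by its distance from the *previous* node, as in the Node2Vec paper: weight 1/p for returning to it, 1 for its neighbors, and 1/q for everything farther away. A minimal sketch (the toy graph is a made-up example):

```python
import random

def node2vec_walk(adj, start, walk_length, p, q, rng=random):
    """Biased walk: stepping from v to x, having arrived from t,
    has unnormalized weight
      1/p if x == t        (return to the previous node),
      1   if x in adj[t]   (stay at distance 1 from t),
      1/q otherwise        (move away from t)."""
    walk = [start]
    for _ in range(walk_length - 1):
        v = walk[-1]
        neighbors = adj[v]
        if not neighbors:
            break
        if len(walk) == 1:  # first step has no previous node: uniform
            walk.append(rng.choice(neighbors))
            continue
        t = walk[-2]
        weights = [(1.0 / p) if x == t else
                   1.0 if x in adj[t] else
                   (1.0 / q) for x in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy graph: two triangles joined by edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
random.seed(1)
walk = node2vec_walk(adj, start=0, walk_length=10, p=0.25, q=4.0)
```

With p = q = 1 every weight is 1 and this reduces to the uniform DeepWalk walk. A production implementation would precompute the transition probabilities per edge (alias sampling) rather than recomputing them each step.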
Skip-Gram Training
The skip-gram model learns embeddings by predicting context nodes given a center node:
- Generate Random Walks: Sample sequences of nodes
- Create Training Pairs: (center node, context node) within window
- Optimize Embedding: Maximize probability of observing context given center
- Loss Function: Negative log-likelihood with negative sampling
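The pair-generation and loss steps above can be sketched directly. The walks and embedding vectors below are arbitrary illustrative values, not trained output:

```python
import math

def skipgram_pairs(walks, window_size):
    """Slide a window over each walk and emit (center, context) pairs
    for every context node within window_size positions of the center."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window_size)
            hi = min(len(walk), i + window_size + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

def neg_sampling_loss(u, v_pos, v_negs):
    """Per-pair objective with negative sampling:
    L = -log sigma(u . v+) - sum over negatives of log sigma(-u . v-)."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return -math.log(sig(dot(u, v_pos))) - sum(
        math.log(sig(-dot(u, v_neg))) for v_neg in v_negs)

pairs = skipgram_pairs([[0, 1, 2, 3], [3, 2, 0]], window_size=1)
loss = neg_sampling_loss([0.5, 0.5], [0.5, 0.5], [[0.1, 0.1], [-0.2, 0.3]])
```

Training then minimizes this loss over all pairs by gradient descent on the embedding vectors; libraries such as gensim's word2vec implement the same objective efficiently.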
Key Properties Preserved
Homophily
Nodes that are close in the graph should have similar embeddings.
Structural Equivalence
Nodes with similar structural roles should have similar embeddings.
Community Structure
Nodes in the same community cluster together in embedding space.
Applications
- Node Classification: Predict node labels using embeddings as features
- Link Prediction: Compute similarity between node embeddings
- Community Detection: Cluster nodes in embedding space
- Graph Visualization: Use 2D/3D projections of embeddings
- Recommendation Systems: Find similar items via embedding similarity
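Several of these applications reduce to a similarity score between embedding vectors; cosine similarity is the usual choice for link prediction and recommendation. A minimal sketch with made-up 2-D embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Illustrative embeddings: nodes 0 and 1 point the same way,
# node 2 is orthogonal to both.
emb = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
score_01 = cosine(emb[0], emb[1])  # high: likely link
score_02 = cosine(emb[0], emb[2])  # low: unlikely link
```

Ranking all non-edges by this score gives a simple link-prediction baseline; other scoring functions (dot product, Hadamard product fed to a classifier) are also common.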
Comparison of Methods
| Method | Walk Strategy | Training | Pros | Cons |
|---|---|---|---|---|
| DeepWalk | Uniform | Skip-gram | Simple, scalable | No edge weights |
| Node2Vec | Biased (p, q) | Skip-gram | Flexible | More parameters |
| LINE | 1st/2nd order | Direct | Preserves proximity | Memory intensive |
| GraphSAGE | Sampling | Aggregation | Inductive | Complex |
Best Practices
- Walk Parameters: Start with walk_length=80 and num_walks=10 per node
- Window Size: Use window_size=10 for skip-gram
- Embedding Dimension: Typically 64-256 dimensions
- Node2Vec Tuning:
- Grid search p, q ∈ {0.25, 0.5, 1, 2, 4}
- p=1, q=1 gives DeepWalk behavior
- Evaluation: Use downstream task performance