HNSW: Hierarchical Navigable Small World

Interactive visualization of HNSW - the graph-based algorithm that powers modern vector search with logarithmic complexity.

HNSW is a state-of-the-art graph-based algorithm for approximate nearest neighbor search that achieves logarithmic search complexity with high recall rates, making it the backbone of many modern vector databases.

Interactive HNSW Explorer

Explore how HNSW builds multi-layer graphs and navigates them efficiently to find nearest neighbors:

[Interactive visualization: HNSW's multi-layer graph structure, with Layer 2 (sparse), Layer 1 (medium), and Layer 0 (dense), showing per-layer node count, edge count, and average degree.]

Why HNSW?

Traditional exact nearest neighbor search has O(N) complexity, making it impractical for large-scale applications. HNSW solves this with:

  • Logarithmic search time: O(log N)
  • High recall: 95-99% accuracy
  • Incremental updates: Add vectors without rebuilding
  • Robust performance: Works well across different data distributions
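
For contrast, exact search is just a linear scan. The short NumPy sketch below (sizes are illustrative) computes the distance from a query to every stored vector, which is exactly the per-query O(N) cost that HNSW's graph navigation avoids.

import numpy as np

def brute_force_knn(vectors, query, k):
    # O(N · d) per query: distance from the query to every stored vector
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

vectors = np.random.rand(100_000, 128).astype(np.float32)
query = np.random.rand(128).astype(np.float32)
print(brute_force_knn(vectors, query, k=10))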

Core Concepts

1. Multi-Layer Architecture

HNSW constructs a hierarchy of proximity graphs. Each node is assigned a maximum layer drawn from an exponentially decaying distribution:

P(layer ≥ l) = e^(-l / m_L)

Here m_L is the layer normalization factor (typically 1/ln(2)); in practice this means:

  • Higher layers are sparser (fewer nodes)
  • Lower layers are denser (more connections)
  • Entry points start from the top layer
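
To see what this distribution looks like in practice, the short sketch below samples layer assignments with the same floor(-ln(U) · m_L) rule used in the construction pseudocode further down, with the typical m_L = 1/ln(2).

import math
import random
from collections import Counter

m_L = 1 / math.log(2)  # typical layer assignment factor

def sample_layer():
    # Exponentially decaying layer assignment used during insertion
    u = 1.0 - random.random()          # uniform in (0, 1]
    return math.floor(-math.log(u) * m_L)

counts = Counter(sample_layer() for _ in range(100_000))
for layer in sorted(counts):
    print(f"layer {layer}: {counts[layer] / 100_000:.3f}")
# With m_L = 1/ln(2): ~50% of nodes stop at layer 0, ~25% reach layer 1, ~12.5% layer 2, ...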

2. Small World Property

The "small world" phenomenon ensures short paths between any two nodes:

d_avg ∝ log N

This is achieved through:

  • Local connections: Connect to nearby neighbors
  • Long-range connections: Occasional distant links
  • Navigable structure: Greedy routing works effectively
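
This behaviour is easy to check on a classic small-world model. The sketch below uses networkx (purely as an illustration, not part of HNSW itself) to mix a ring of local connections with a few random long-range rewirings and reports the average shortest path length.

import networkx as nx

for n in (500, 2_000):
    # Ring of local connections (k neighbors each) with 10% rewired to long-range links
    g = nx.watts_strogatz_graph(n=n, k=10, p=0.1, seed=42)
    print(n, round(nx.average_shortest_path_length(g), 2))
# The average hop count stays small and grows roughly like log N, not like N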

3. Construction Algorithm

def insert_node(new_node, M, M_max, ef_construction):
    # Assign the node's top layer from an exponentially decaying distribution
    layer = floor(-ln(uniform(0, 1)) * m_L)

    # Find nearest neighbors at each layer, descending to layer 0
    ep = entry_point
    for lc in range(layer, -1, -1):
        candidates = search_layer(new_node, ep, ef_construction, lc)

        # Select M neighbors (M_max for layer 0)
        m = M_max if lc == 0 else M
        neighbors = select_neighbors_heuristic(candidates, m)

        # Add bidirectional edges and prune overfull neighbor lists
        for neighbor in neighbors:
            add_edge(new_node, neighbor, lc)
            prune_connections(neighbor, M_max if lc == 0 else M)

        # Use the best candidates as entry points for the next (denser) layer
        ep = nearest(candidates)

4. Search Algorithm

The search proceeds layer by layer, refining candidates:

def search(query, ef, k):
    # Start from the global entry point at the top layer
    ep = entry_point

    # Greedy descent from the top layer down to layer 1 (beam width 1)
    for layer in range(top_layer, 0, -1):
        ep = search_layer(query, ep, 1, layer)

    # Widen the search to ef candidates at the dense bottom layer
    candidates = search_layer(query, ep, ef, 0)
    return top_k(candidates, k)

Key Parameters

Construction Parameters

Parameter       | Typical Value | Effect
----------------|---------------|----------------------------
M               | 16-32         | Connections per node
M_max           | M × 2         | Max connections at layer 0
ef_construction | 200           | Build-time search width
m_L             | 1/ln(2)       | Layer assignment factor

Search Parameters

Parameter | Typical Value | Effect
----------|---------------|-------------------------------
ef        | 50-500        | Search-time beam width
k         | 1-100         | Number of neighbors to return

Performance Characteristics

Complexity Analysis

Operation | Time Complexity | Space Complexity
----------|-----------------|------------------
Build     | O(N log N)      | O(M · N)
Search    | O(log N)        | O(ef)
Insert    | O(log N)        | O(M)
Delete    | Not supported*  | -

*Node deletion requires index rebuild

Recall vs Speed Tradeoff

ef value | Recall | QPS    | Latency
---------|--------|--------|--------
50       | 90%    | 10,000 | 0.1 ms
100      | 95%    | 5,000  | 0.2 ms
200      | 98%    | 2,500  | 0.4 ms
500      | 99.5%  | 1,000  | 1 ms
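
The exact numbers depend on hardware, dimensionality, and data distribution, so treat the table as indicative. Below is a minimal sketch (sizes and parameters are illustrative) of how you might reproduce the recall column with hnswlib, sweeping ef against a brute-force ground truth.

import numpy as np
import hnswlib

N, d, k = 20_000, 128, 10
data = np.random.rand(N, d).astype(np.float32)
queries = np.random.rand(200, d).astype(np.float32)

# Exact brute-force neighbors as ground truth
ground_truth = np.array([
    np.argsort(np.linalg.norm(data - q, axis=1))[:k] for q in queries
])

index = hnswlib.Index(space='l2', dim=d)
index.init_index(max_elements=N, M=16, ef_construction=200)
index.add_items(data)

for ef in (50, 100, 200, 500):
    index.set_ef(ef)
    labels, _ = index.knn_query(queries, k=k)
    recall = np.mean([len(set(l) & set(g)) / k
                      for l, g in zip(labels, ground_truth)])
    print(f"ef={ef:3d}  recall@{k}={recall:.3f}")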

Implementation in Practice

  1. Faiss (Facebook)

    import faiss

    # HNSW index with flat (uncompressed) vector storage
    index = faiss.IndexHNSWFlat(d, M)
    index.hnsw.efConstruction = 200
    index.add(vectors)
  2. HNSWlib

    import hnswlib

    # L2 index; call set_ef() later to control search-time quality
    index = hnswlib.Index(space='l2', dim=d)
    index.init_index(max_elements=N, M=16, ef_construction=200)
    index.add_items(vectors)
  3. Annoy (Spotify) - tree-based alternative, shown for comparison

    from annoy import AnnoyIndex

    # Build a forest of 10 random-projection trees over angular distance
    index = AnnoyIndex(d, 'angular')
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(n_trees=10)
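
Query calls differ slightly between the three libraries. Assuming the indexes built above and a float32 NumPy array of query embeddings (query_vectors / query_vector here are illustrative names), a search looks roughly like this:

# Faiss: distances and ids of the k nearest neighbors (index from snippet 1)
D, I = index.search(query_vectors, 10)

# HNSWlib: ids and distances, in that order (index from snippet 2)
labels, distances = index.knn_query(query_vectors, k=10)

# Annoy: queries one vector at a time (index from snippet 3)
neighbor_ids = index.get_nns_by_vector(query_vector, 10)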

Vector Databases Using HNSW

  • Weaviate: Default ANN algorithm
  • Qdrant: One of the supported indexes
  • Milvus: HNSW index option
  • Vespa: HNSW for dense vectors

Optimization Techniques

1. Hierarchical Pruning

Instead of keeping all M neighbors, use a heuristic to maintain diversity:

def prune_connections(node, candidates, M):
    # Always keep the single closest neighbor
    closest = nearest(candidates)
    pruned = [closest]
    candidates.remove(closest)

    # Greedily add neighbors that are far from the already-selected set
    while len(pruned) < M and candidates:
        next_node = max(candidates, key=lambda c: min_dist_to_set(c, pruned))
        pruned.append(next_node)
        candidates.remove(next_node)

    return pruned

2. Prefetching

Improve cache locality by prefetching node data:

void search_with_prefetch(Node* current) {
    // Prefetch neighbors' data into cache
    for (auto neighbor : current->neighbors) {
        __builtin_prefetch(neighbor, 0, 1);
    }
    // Process the current node while the prefetches are in flight
    process_node(current);
}

3. SIMD Optimization

Use SIMD instructions for distance calculations:

float euclidean_distance_avx(const float* a, const float* b, int d) {
    __m256 sum = _mm256_setzero_ps();
    for (int i = 0; i < d; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 diff = _mm256_sub_ps(va, vb);
        sum = _mm256_fmadd_ps(diff, diff, sum);   // sum += diff * diff
    }
    // Horizontal sum of the 8 lanes; returns the squared distance,
    // which preserves ranking without a sqrt
    return horizontal_sum_avx(sum);
}

When to Use HNSW

✅ Use HNSW When:

  • You need high recall (>95%)
  • Dataset size is moderate (millions of vectors)
  • Low query latency is critical
  • You need incremental updates
  • Memory is not severely constrained

❌ Avoid HNSW When:

  • Memory is very limited (consider IVF-PQ)
  • Dataset is small (<10K vectors, use exact search)
  • You need to delete vectors frequently
  • Extremely high dimensionality (>1000d)
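
These rules of thumb can be folded into a rough first-pass chooser; the thresholds below simply restate the lists above and are not hard limits.

def pick_index(n_vectors, memory_constrained, frequent_deletes, need_high_recall):
    # Rough heuristic following the guidance above; benchmark on your own workload
    if n_vectors < 10_000:
        return "exact (brute force)"
    if frequent_deletes:
        return "index with cheap deletes (e.g. IVF)"
    if memory_constrained:
        return "IVF-PQ"
    if need_high_recall:
        return "HNSW"
    return "HNSW or IVF, benchmark both"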

Comparison with Other Methods

Algorithm | Build Time | Search Time | Memory | Recall    | Use Case
----------|------------|-------------|--------|-----------|------------------
HNSW      | Slow       | Very Fast   | High   | Excellent | Real-time search
IVF       | Fast       | Fast        | Medium | Good      | Large-scale
LSH       | Very Fast  | Fast        | Low    | Moderate  | Streaming
Annoy     | Medium     | Fast        | Medium | Good      | Static data
ScaNN     | Slow       | Fast        | Medium | Very Good | Google-scale

Advanced Topics

1. Filtered Search

Combining vector search with metadata filtering:

def filtered_search(query, filter_func, k):
    candidates = []
    visited = set()

    # Over-retrieve so enough candidates survive the metadata filter
    while len(candidates) < k * 10:
        next_batch = search_layer(query, ef, 0, visited)
        if not next_batch:
            break  # graph exhausted
        filtered = [c for c in next_batch if filter_func(c.metadata)]
        candidates.extend(filtered)
        visited.update(next_batch)

    return top_k(candidates, k)

2. Hybrid Search

Combining HNSW with keyword search:

def hybrid_search(query_vector, query_text, k, alpha=0.5):
    # Vector search using HNSW
    vector_results = hnsw_index.search(query_vector, k=100)

    # Text search using BM25
    text_results = bm25_index.search(query_text, k=100)

    # Weighted combination of the two score lists
    combined = {}
    for id, score in vector_results:
        combined[id] = alpha * score
    for id, score in text_results:
        combined[id] = combined.get(id, 0) + (1 - alpha) * score

    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:k]

3. Quantization Integration

Combining HNSW with Product Quantization for memory efficiency:

class HNSW_PQ:
    def __init__(self, d, M, n_subvectors):
        self.hnsw = HNSW(d, M)
        self.pq = ProductQuantizer(d, n_subvectors)

    def add(self, vectors):
        # Store compressed codes instead of full vectors
        self.codes = self.pq.encode(vectors)
        # Build the HNSW graph over the PQ codes
        self.hnsw.add(self.codes)

    def search(self, query, k):
        # Compress the query into the same code space
        query_code = self.pq.encode(query)
        # Over-retrieve in the compressed space, then rerank exactly
        candidates = self.hnsw.search(query_code, k * 10)
        return rerank_with_exact_distance(query, candidates, k)

Best Practices

  1. Parameter Tuning

    • Start with M=16, ef_construction=200
    • Increase ef_construction for better quality
    • Increase M for better recall but more memory
  2. Distance Metrics

    • Use L2 for dense embeddings
    • Use cosine for normalized embeddings
    • Consider IP (inner product) for learned embeddings (a short metric-selection sketch follows this list)
  3. Monitoring

    • Track recall using ground truth
    • Monitor query latency percentiles
    • Watch memory usage growth
  4. Scaling

    • Shard by vector ID for horizontal scaling
    • Use replicas for query throughput
    • Consider quantization for large datasets
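
To make the distance-metric advice in point 2 concrete, here is a minimal sketch (parameters illustrative) using hnswlib, which exposes all three metrics via its space argument; with unit-normalized vectors, inner-product and cosine rankings coincide.

import numpy as np
import hnswlib

d = 128
vectors = np.random.rand(10_000, d).astype(np.float32)
# Normalize rows so that inner product equals cosine similarity
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

index = hnswlib.Index(space='cosine', dim=d)   # alternatives: 'l2', 'ip'
index.init_index(max_elements=len(vectors), M=16, ef_construction=200)
index.add_items(vectors)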
