IVF-PQ: Inverted File with Product Quantization

12 min

Learn how IVF-PQ combines clustering and compression to enable billion-scale vector search with minimal memory footprint.



IVF-PQ is a powerful technique that combines clustering-based indexing (IVF) with vector compression (PQ) to enable searching billions of vectors on a single machine with limited memory.

Interactive IVF-PQ Visualization

Explore how IVF-PQ partitions space into clusters and compresses vectors using product quantization:

[Interactive visualization: inverted file clustering with a configurable number of clusters, plus product quantization of the vectors within each cluster]

The Billion-Scale Challenge

Storing 1 billion 768-dimensional float32 vectors requires:

$$10^9 \times 768 \times 4 \text{ bytes} = 3.072 \text{ TB}$$

IVF-PQ can reduce this to just 30-60 GB while maintaining 90%+ recall!
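
As a sanity check, the arithmetic is easy to reproduce. The sketch below assumes M=32 one-byte PQ codes per vector and ignores centroid, ID, and inverted-list overhead:

```python
# Back-of-the-envelope memory estimate (illustrative assumptions:
# 1B vectors, 768 dimensions, float32 originals, 32-byte PQ codes)
n_vectors = 1_000_000_000
dim = 768

original_bytes = n_vectors * dim * 4   # float32 = 4 bytes per component
pq_bytes = n_vectors * 32              # M=32 subvectors x 1 byte each

print(f"Original:    {original_bytes / 1e12:.3f} TB")   # 3.072 TB
print(f"IVF-PQ:      {pq_bytes / 1e9:.0f} GB")           # 32 GB plus index overhead
print(f"Compression: {original_bytes / pq_bytes:.0f}x")  # 96x
```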

How IVF-PQ Works

Step 1: Inverted File Index (IVF)

Partition the vector space into clusters using k-means:

```python
def build_ivf_index(vectors, n_clusters):
    # Train k-means on a sample
    centroids = kmeans(vectors[:100000], n_clusters)

    # Assign vectors to nearest centroid
    inverted_lists = [[] for _ in range(n_clusters)]
    for i, vector in enumerate(vectors):
        cluster_id = nearest_centroid(vector, centroids)
        inverted_lists[cluster_id].append(i)

    return centroids, inverted_lists
```

Step 2: Product Quantization (PQ)

Compress vectors by splitting into subvectors and quantizing:

$$x \in \mathbb{R}^d \rightarrow [q_1, q_2, \ldots, q_m] \in \{0, 1, \ldots, 255\}^m$$

```python
def product_quantize(vector, codebooks):
    m = len(codebooks)            # Number of subvectors
    d_sub = len(vector) // m
    codes = []
    for i in range(m):
        subvector = vector[i*d_sub:(i+1)*d_sub]
        # Find nearest codeword
        code = nearest_codeword(subvector, codebooks[i])
        codes.append(code)
    return codes  # m bytes instead of d*4 bytes
```

Step 3: Search Process

  1. Find the nprobe candidate clusters whose centroids are closest to the query
  2. Scan the compressed (PQ-coded) vectors in the selected clusters
  3. Compute approximate distances using precomputed lookup tables
  4. Return the top-k results (see the sketch below)
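
Putting the three steps together, here is a minimal NumPy sketch of the query path. It assumes the `centroids`, `inverted_lists`, and `codebooks` built in Steps 1-2, plus a `codes` array holding the m-byte PQ code of each database vector; production libraries such as Faiss vectorize the inner scan.

```python
import numpy as np

def ivf_pq_search(query, centroids, inverted_lists, codebooks, codes, k=10, nprobe=4):
    # 1. Pick the nprobe clusters whose centroids are closest to the query
    cluster_dists = np.linalg.norm(centroids - query, axis=1)
    probed = np.argsort(cluster_dists)[:nprobe]

    # 2. Pre-compute lookup tables: squared distance from each query
    #    subvector to every codeword of the matching codebook
    m = len(codebooks)
    d_sub = len(query) // m
    tables = [
        np.sum((codebooks[j] - query[j * d_sub:(j + 1) * d_sub]) ** 2, axis=1)
        for j in range(m)
    ]

    # 3. Scan compressed vectors in the probed clusters via table lookups
    results = []
    for c in probed:
        for idx in inverted_lists[c]:
            dist = sum(tables[j][codes[idx][j]] for j in range(m))
            results.append((dist, idx))

    # 4. Return the top-k results
    results.sort()
    return results[:k]
```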

Mathematical Foundation

Clustering Objective

K-means minimizes within-cluster variance:

$$J = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij} \, \|x_i - \mu_j\|^2$$

Where:

  • $r_{ij}$ is 1 if vector $i$ belongs to cluster $j$
  • $\mu_j$ is the centroid of cluster $j$
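
For intuition, one Lloyd iteration alternates between fixing the assignments $r_{ij}$ (assignment step) and the centroids $\mu_j$ (update step), and each step can only decrease $J$. A small NumPy sketch of a single iteration (illustrative; IVF training runs several such iterations on a sample):

```python
import numpy as np

def lloyd_iteration(vectors, centroids):
    # Assignment step: r_ij = 1 for the nearest centroid, 0 otherwise
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)

    # Update step: each centroid moves to the mean of its assigned vectors
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = vectors[assignments == j]
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)

    return new_centroids, assignments
```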

Quantization Error

Product quantization minimizes reconstruction error:

$$E = \mathbb{E}\!\left[\|x - \hat{x}\|^2\right] = \sum_{j=1}^{m} \mathbb{E}\!\left[\|x^{(j)} - c^{(j)}_{q_j}\|^2\right]$$

Where:

  • $x^{(j)}$ is the $j$-th subvector
  • $c^{(j)}_{q_j}$ is the assigned codeword
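
This expectation can be estimated empirically by encoding a sample, reconstructing it from the chosen codewords, and averaging the squared error. A sketch reusing the `product_quantize` helper from above (it assumes each `codebooks[j]` is a (256, d/m) array):

```python
import numpy as np

def reconstruction_error(vectors, codebooks):
    m = len(codebooks)
    errors = []
    for x in vectors:
        codes = product_quantize(x, codebooks)
        # Reconstruct x_hat by concatenating the selected codewords
        x_hat = np.concatenate([codebooks[j][codes[j]] for j in range(m)])
        errors.append(np.sum((x - x_hat) ** 2))
    # Empirical estimate of E[||x - x_hat||^2]
    return float(np.mean(errors))
```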

Asymmetric Distance Computation

For efficiency, compute distances without decompressing:

$$d(q, x)^2 \approx \sum_{j=1}^{m} \|q^{(j)} - c^{(j)}_{\mathrm{code}_j}\|^2$$

Pre-compute distance tables for each subvector of the query.

Key Parameters

IVF Parameters

| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| nlist | √N to N/100 | Number of clusters |
| nprobe | 1-128 | Clusters searched per query |
| train_size | 30 × nlist | Training set size |
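
The rules of thumb in the table translate into a tiny helper (heuristics only, not hard requirements; `n_vectors` is the database size):

```python
import math

def suggest_ivf_params(n_vectors):
    # nlist between sqrt(N) and N/100; sqrt(N) is a common starting point
    nlist = int(math.sqrt(n_vectors))
    # At least ~30 training vectors per centroid
    train_size = 30 * nlist
    return {"nlist": nlist, "train_size": train_size}

print(suggest_ivf_params(10_000_000))  # {'nlist': 3162, 'train_size': 94860}
```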

PQ Parameters

| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| M | 8-64 | Number of subvectors |
| nbits | 8 | Bits per subquantizer |
| codebook_size | 2^nbits | Codewords per subspace |

Performance Analysis

Memory Savings

Original size vs compressed size:

| Vectors | Dimension | Original | IVF-PQ (M=32) | Compression |
|---------|-----------|----------|---------------|-------------|
| 1M | 128 | 512 MB | 32 MB | 16× |
| 10M | 512 | 20 GB | 320 MB | 64× |
| 100M | 768 | 300 GB | 3.2 GB | 94× |
| 1B | 768 | 3 TB | 32 GB | 96× |

Recall vs Speed Tradeoff

| nprobe | Recall | QPS | Latency |
|--------|--------|--------|---------|
| 1 | 70% | 50,000 | 0.02 ms |
| 4 | 85% | 20,000 | 0.05 ms |
| 16 | 93% | 5,000 | 0.2 ms |
| 64 | 97% | 1,250 | 0.8 ms |
| 256 | 99% | 300 | 3.3 ms |

Implementation with Faiss

Basic IVF-PQ Index

```python
import faiss
import numpy as np

def create_ivf_pq_index(vectors, nlist=1024, m=32):
    d = vectors.shape[1]

    # Create quantizer (coarse quantizer)
    quantizer = faiss.IndexFlatL2(d)

    # Create IVF-PQ index
    index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)

    # Train on subset
    train_size = max(nlist * 30, 100000)
    train_data = vectors[np.random.choice(len(vectors), train_size)]
    index.train(train_data)

    # Add vectors
    index.add(vectors)
    return index


def search_ivf_pq(index, queries, k=10, nprobe=16):
    # Set search parameters
    index.nprobe = nprobe

    # Search
    distances, indices = index.search(queries, k)
    return indices, distances


# Example usage
index = create_ivf_pq_index(database_vectors)
results, scores = search_ivf_pq(index, query_vectors)
```

Advanced Configuration

```python
# IVF-PQ with preprocessing
index = faiss.index_factory(
    d, "OPQ32,IVF1024,PQ32x8", faiss.METRIC_L2
)

# Components:
# - OPQ32:   Optimized Product Quantization rotation
# - IVF1024: 1024 clusters
# - PQ32x8:  32 subvectors, 8 bits each
```

Optimization Strategies

1. Optimized Product Quantization (OPQ)

Apply rotation to minimize quantization error:

```python
def train_opq(vectors, m):
    # Learn rotation matrix R
    R = learn_rotation_matrix(vectors, m)

    # Rotate vectors
    rotated_vectors = vectors @ R

    # Train PQ on rotated space
    pq = train_product_quantizer(rotated_vectors, m)
    return R, pq
```

2. Multi-Index Hashing

Use multiple hash tables for better recall:

```python
class MultiIndexIVF:
    def __init__(self, d, n_tables=4):
        self.tables = []
        for _ in range(n_tables):
            # Different random projections
            self.tables.append(create_ivf_index(d))

    def search(self, query, k):
        candidates = set()
        for table in self.tables:
            results = table.search(query, k * 2)
            candidates.update(results)

        # Rerank candidates
        return rerank(query, candidates, k)
```

3. GPU Acceleration

```python
# Move index to GPU
gpu_res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(gpu_res, 0, cpu_index)

# Batch search on GPU
batch_size = 1000
results = gpu_index.search(queries[:batch_size], k)
```

Comparison with Other Methods

IVF-PQ vs HNSW

| Aspect | IVF-PQ | HNSW |
|--------|--------|------|
| Memory | Very Low (compressed) | High (original vectors) |
| Build Time | Fast | Slow |
| Search Speed | Fast | Very Fast |
| Recall | Good (90-95%) | Excellent (95-99%) |
| Scalability | Billions | Millions |
| Updates | Batch preferred | Incremental |

When to Use Each

Use IVF-PQ when:

  • Dataset has billions of vectors
  • Memory is severely limited
  • Moderate recall (90%) is acceptable
  • Batch updates are feasible

Use HNSW when:

  • Need highest recall (>95%)
  • Dataset fits in memory
  • Real-time updates required
  • Latency is critical

Advanced Topics

1. Hierarchical IVF

Two-level clustering for very large datasets:

```python
class HierarchicalIVF:
    def __init__(self, d, n_coarse=100, n_fine=100):
        self.n_coarse = n_coarse
        self.n_fine = n_fine
        self.coarse_centroids = None
        self.fine_centroids = {}  # Per coarse cluster

    def build(self, vectors):
        # Level 1: Coarse clustering
        self.coarse_centroids = kmeans(vectors, self.n_coarse)

        # Level 2: Fine clustering within each coarse cluster
        for i in range(self.n_coarse):
            cluster_vectors = get_cluster_vectors(vectors, i)
            self.fine_centroids[i] = kmeans(cluster_vectors, self.n_fine)
```

2. Learned Quantization

Using neural networks for better compression:

```python
import torch
import torch.nn as nn

class LearnedPQ(nn.Module):
    def __init__(self, d, m, k):
        super().__init__()
        self.d, self.m = d, m
        self.encoders = nn.ModuleList([
            nn.Linear(d // m, k) for _ in range(m)
        ])
        self.decoders = nn.ModuleList([
            nn.Linear(k, d // m) for _ in range(m)
        ])

    def forward(self, x):
        d_sub = self.d // self.m

        # Encode subvectors
        codes = []
        for i, encoder in enumerate(self.encoders):
            subvec = x[:, i * d_sub:(i + 1) * d_sub]
            codes.append(encoder(subvec))

        # Decode for reconstruction
        reconstructed = [
            decoder(code) for code, decoder in zip(codes, self.decoders)
        ]
        return torch.cat(reconstructed, dim=1)
```

3. Filtering and Metadata

Combining vector search with attribute filtering:

```python
class FilteredIVFPQ:
    def search_with_filter(self, query, filter_func, k):
        # Increase nprobe to compensate for filtering
        self.index.nprobe = self.nprobe * 4

        # Over-retrieve (Faiss returns distances first, then indices)
        distances, candidates = self.index.search(query, k * 10)

        # Apply filter until k results are kept
        filtered_results = []
        for idx, dist in zip(candidates[0], distances[0]):
            if filter_func(self.metadata[idx]):
                filtered_results.append((idx, dist))
                if len(filtered_results) >= k:
                    break

        return filtered_results
```

Best Practices

1. Training Data Selection

  • Use a representative sample of the corpus (not an arbitrary slice)
  • Include edge cases and outliers
  • Use at least 30×nlist vectors for training (see the sketch below)
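
One simple way to satisfy these guidelines is an evenly strided sample over the whole corpus rather than its first rows, so every region of the data contributes to training. A sketch (assumes `vectors` is the full database as a NumPy array):

```python
import numpy as np

def select_training_set(vectors, nlist, min_per_centroid=30):
    n_train = max(min_per_centroid * nlist, 100_000)
    n_train = min(n_train, len(vectors))

    # Evenly strided indices cover the whole dataset, not just its head
    idx = np.linspace(0, len(vectors) - 1, n_train).astype(np.int64)
    return vectors[idx]
```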

2. Parameter Tuning

```python
def tune_ivf_pq(vectors, queries, ground_truth):
    best_params = {}
    best_score = 0

    for nlist in [256, 512, 1024, 2048]:
        for m in [8, 16, 32, 64]:
            for nprobe in [1, 4, 16, 64]:
                index = create_index(vectors, nlist, m)
                index.nprobe = nprobe

                recall = compute_recall(index, queries, ground_truth)
                qps = measure_qps(index, queries)

                # Balance recall and speed
                score = recall * np.log(qps)
                if score > best_score:
                    best_score = score
                    best_params = {'nlist': nlist, 'm': m, 'nprobe': nprobe}

    return best_params
```

3. Monitoring and Debugging

  • Track quantization error during training
  • Monitor cluster imbalance
  • Measure actual compression ratio
  • Validate recall on test set
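
Some of these checks can be scripted against a trained index. The sketch below assumes a Faiss `IndexIVFPQ` built as in the earlier examples; attribute names follow the Faiss Python bindings, so verify them against your installed version:

```python
import numpy as np

def index_health_report(index, d):
    # Cluster imbalance: sizes of the inverted lists
    list_sizes = np.array([index.invlists.list_size(i) for i in range(index.nlist)])
    print(f"cluster sizes (min/median/max): "
          f"{list_sizes.min()} / {int(np.median(list_sizes))} / {list_sizes.max()}")
    print(f"empty clusters: {(list_sizes == 0).sum()}")

    # Actual compression ratio: PQ code bytes vs. float32 originals
    ratio = (d * 4) / index.pq.code_size
    print(f"compression: {ratio:.0f}x ({index.pq.code_size} bytes per vector)")
```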

Real-World Applications

1. Image Search (Pinterest)

  • 3 billion images
  • 768d embeddings from CNN
  • IVF4096,PQ32 configuration
  • 95% recall at 100ms latency

2. Semantic Search (Spotify)

  • 100M+ songs
  • Audio embeddings
  • IVF2048,PQ64 with OPQ
  • Real-time recommendations

3. Document Retrieval (Elastic)

  • Billions of documents
  • BERT embeddings (768d)
  • Hierarchical IVF-PQ
  • Sub-second search
