Binary Embeddings
Ultra-compact 1-bit representations for massive-scale retrieval
Binary embeddings compress floating-point vectors into 1-bit representations, achieving 32× memory reduction while maintaining surprisingly good retrieval quality. This extreme quantization enables billion-scale vector search on commodity hardware.
Interactive Binary Quantization
[Interactive demo: choose a binarization method (e.g. sign binarization, b = sign(x)) and explore the binary representation visualization, similarity preservation, and memory efficiency panels.]
Implementation Examples
Binary Hashing
```python
import numpy as np

def binary_hash(embeddings):
    """Convert float embeddings to packed binary codes"""
    # Simple sign binarization
    binary = (embeddings >= 0).astype(np.uint8)
    # Pack bits efficiently (8 bits per byte)
    packed = np.packbits(binary, axis=1)
    return packed

def hamming_search(query_binary, db_binary, k=10):
    """Fast Hamming distance search over packed codes"""
    # XOR highlights differing bits
    xor = np.bitwise_xor(query_binary, db_binary)
    # Count set bits per row
    distances = np.unpackbits(xor, axis=1).sum(axis=1)
    # Get top-k (smallest distances), then sort them
    top_k = np.argpartition(distances, k)[:k]
    return top_k[np.argsort(distances[top_k])]
```
ITQ Learning
```python
import numpy as np
from sklearn.decomposition import PCA

def learn_itq(X, n_bits, n_iter=50):
    """Iterative Quantization: learn a rotation of the PCA projection"""
    # PCA to n_bits dimensions
    pca = PCA(n_components=n_bits)
    X_pca = pca.fit_transform(X)
    # Initialize rotation as identity
    R = np.eye(n_bits)
    for _ in range(n_iter):
        # Fix R, update B (binary codes)
        B = np.sign(X_pca @ R)
        # Fix B, update R (orthogonal Procrustes)
        U, _, Vt = np.linalg.svd(B.T @ X_pca)
        R = Vt.T @ U.T
    return pca, R

# Apply ITQ: project with PCA, then rotate and take signs
pca, R = learn_itq(embeddings, n_bits=128)
binary_codes = np.sign(pca.transform(embeddings) @ R)
```
Binary Embeddings Trade-offs
Advantages
- 32× memory reduction
- Ultra-fast Hamming distance
- CPU-friendly (bit operations)
- Scales to billions of vectors
- Simple implementation
Limitations
- 20-40% accuracy loss
- Limited expressiveness
- Irreversible compression
- Requires careful method selection
- May need reranking
Best for: Massive-scale approximate search, memory-constrained environments, first-stage retrieval in cascaded systems, and duplicate detection at scale.
Why Binary?
The Scale Challenge
Modern retrieval systems face enormous scale:
Dataset | Vectors | Float32 Size | Binary Size | Savings |
---|---|---|---|---|
1M docs | 1M × 768 | 3 GB | 96 MB | 32× |
Wikipedia | 10M × 768 | 30 GB | 960 MB | 32× |
Web-scale | 1B × 768 | 3 TB | 96 GB | 32× |
Internet | 100B × 768 | 300 TB | 9.6 TB | 32× |
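The sizes in the table follow directly from 4 bytes per float32 dimension versus 1 bit per dimension; a quick sanity check for the 1M-document row:

```python
# Back-of-the-envelope check for 1M x 768 vectors
n, d = 1_000_000, 768
float_bytes = n * d * 4        # float32: 4 bytes per dimension -> ~3.07 GB
binary_bytes = n * d // 8      # binary: 1 bit per dimension    -> 96 MB
print(float_bytes / binary_bytes)  # 32.0
```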
Performance Benefits
- Memory Efficiency: 32× reduction (float32 → 1 bit)
- Cache Friendly: More vectors fit in CPU cache
- SIMD Operations: Efficient bit-parallel operations
- Network Transfer: Reduced bandwidth requirements
Binarization Methods
1. Sign Binarization
The simplest approach:
```python
import numpy as np

def sign_binarize(embeddings):
    """Convert float embeddings to binary"""
    # Simple thresholding at zero
    binary = (embeddings >= 0).astype(np.uint8)
    return binary

def pack_bits(binary_matrix):
    """Pack 8 bits into a single byte"""
    n, d = binary_matrix.shape
    # Pad to a multiple of 8 bits
    pad_size = (8 - d % 8) % 8
    if pad_size:
        binary_matrix = np.pad(binary_matrix, ((0, 0), (0, pad_size)))
    # Pack bits
    packed = np.packbits(binary_matrix, axis=1)
    return packed
```
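A minimal usage sketch (the corpus array and its shape here are illustrative assumptions, not from the original example):

```python
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 768)).astype(np.float32)  # hypothetical corpus

binary = sign_binarize(embeddings)   # (1000, 768) array of 0/1
packed = pack_bits(binary)           # (1000, 96) uint8
print(embeddings.nbytes, packed.nbytes)  # 3072000 vs 96000 bytes (32x)
```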
2. Iterative Quantization (ITQ)
Learns optimal rotation for binarization:
```python
class ITQ:
    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.rotation = None
        self.mean = None

    def fit(self, X, n_iter=50):
        """Learn ITQ rotation matrix"""
        n, d = X.shape
        # Center data
        self.mean = X.mean(axis=0)
        X_centered = X - self.mean
        # PCA to n_bits dimensions via SVD
        U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
        X_pca = U[:, :self.n_bits] * S[:self.n_bits]
        # Initialize rotation
        R = np.eye(self.n_bits)
        for _ in range(n_iter):
            # Fix R, update B
            B = np.sign(X_pca @ R)
            # Fix B, update R (Procrustes)
            U_svd, _, Vt_svd = np.linalg.svd(B.T @ X_pca)
            R = Vt_svd.T @ U_svd.T
        # Compose PCA projection and learned rotation
        self.rotation = Vt[:self.n_bits].T @ R
        return self

    def transform(self, X):
        """Apply ITQ transformation"""
        X_centered = X - self.mean
        X_rotated = X_centered @ self.rotation
        return np.sign(X_rotated)
```
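Reusing the hypothetical embeddings array from the earlier sketch, ITQ drops into the same pack-and-search pipeline; note that transform returns ±1 codes, which need mapping to {0, 1} before packing:

```python
itq = ITQ(n_bits=128)
itq.fit(embeddings)
codes = itq.transform(embeddings)                 # (n, 128) values in {-1, +1}
packed = pack_bits((codes > 0).astype(np.uint8))  # map to {0, 1}, then pack to bytes
```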
3. Locality Sensitive Hashing (LSH)
Random projections for similarity preservation:
```python
class LSH:
    def __init__(self, n_bits, input_dim, seed=None):
        self.n_bits = n_bits
        rng = np.random.default_rng(seed)
        # Random hyperplanes (unit-norm columns)
        self.projections = rng.standard_normal((input_dim, n_bits))
        self.projections /= np.linalg.norm(
            self.projections, axis=0, keepdims=True
        )

    def hash(self, X):
        """Compute LSH binary codes"""
        # Project onto random hyperplanes
        projections = X @ self.projections
        # Binarize at zero
        binary_codes = (projections >= 0).astype(np.uint8)
        return binary_codes

    def multi_hash(self, X, n_tables=4):
        """Multiple independent hash tables for better recall"""
        hash_codes = []
        for table in range(n_tables):
            # Fresh, independent random projections for each table
            lsh = LSH(self.n_bits, X.shape[1], seed=table)
            hash_codes.append(lsh.hash(X))
        return hash_codes
```
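A useful property of random-hyperplane LSH is that the probability of two vectors disagreeing on a bit equals θ/π, where θ is the angle between them, so cosine similarity can be estimated directly from the Hamming distance. A small sketch, assuming unpacked 0/1 codes of equal length:

```python
import numpy as np

def estimate_cosine_from_codes(code1, code2):
    """Estimate cosine similarity from two unpacked LSH codes (0/1 arrays)."""
    n_bits = len(code1)
    hamming = np.count_nonzero(code1 != code2)
    theta = np.pi * hamming / n_bits   # P(bit differs) = theta / pi
    return np.cos(theta)
```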
4. Learning to Hash
Neural networks for end-to-end optimization:
```python
import torch
import torch.nn as nn

class DeepHash(nn.Module):
    def __init__(self, input_dim, hash_bits):
        super().__init__()
        self.hash_bits = hash_bits
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, hash_bits)
        )
        self.alpha = 1.0  # Tanh saturation parameter

    def forward(self, x):
        # Encode to continuous space
        h = self.encoder(x)
        # Smooth binarization with tanh
        b = torch.tanh(self.alpha * h)
        if self.training:
            # Straight-through estimator: hard sign forward, tanh gradient backward
            b_hard = torch.sign(h)
            b = b_hard.detach() - b.detach() + b
        else:
            b = torch.sign(h)
        return b

    def hash_loss(self, b1, b2, similarity):
        """Pairwise hashing loss"""
        # Hamming distance from the ±1 code inner product
        distance = 0.5 * (self.hash_bits - torch.sum(b1 * b2, dim=1))
        # Convert similarity in [0, 1] to a target distance
        target_distance = self.hash_bits * (1 - similarity) / 2
        # L2 loss between actual and target distance
        loss = torch.mean((distance - target_distance) ** 2)
        return loss
```
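A minimal, hypothetical training step for the module above, assuming pairs of embeddings with a similarity label in [0, 1]; the batch here is random data purely for illustration:

```python
import torch

model = DeepHash(input_dim=768, hash_bits=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x1 = torch.randn(32, 768)      # hypothetical embedding pairs
x2 = torch.randn(32, 768)
similarity = torch.rand(32)    # 1.0 = similar, 0.0 = dissimilar

b1, b2 = model(x1), model(x2)
loss = model.hash_loss(b1, b2, similarity)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```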
Distance Computation
Hamming Distance
Count differing bits:
```python
def hamming_distance(b1, b2):
    """Compute Hamming distance between packed binary codes"""
    # XOR to find differing bits
    xor = np.bitwise_xor(b1, b2)
    # Count set bits (popcount)
    if len(xor.shape) == 1:
        # Single vector
        return np.unpackbits(xor).sum()
    else:
        # Matrix of vectors
        return np.unpackbits(xor, axis=1).sum(axis=1)

def fast_hamming_search(query, database, k=10):
    """Fast k-NN search with Hamming distance"""
    # Compute distances to all database vectors
    distances = hamming_distance(query, database)
    # Get top-k indices, then order them by distance
    top_k_indices = np.argpartition(distances, k)[:k]
    top_k_indices = top_k_indices[np.argsort(distances[top_k_indices])]
    return top_k_indices, distances[top_k_indices]
```
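np.unpackbits materializes eight times the packed data on every query; when the codes are already packed with np.packbits, a 256-entry popcount lookup table avoids that blow-up. A sketch under that assumption:

```python
import numpy as np

# Popcount of every possible byte value, precomputed once
POPCOUNT_TABLE = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_distance_packed(query_packed, db_packed):
    """Hamming distances between one packed query and a packed database."""
    xor = np.bitwise_xor(query_packed, db_packed)  # (n, n_bytes)
    return POPCOUNT_TABLE[xor].sum(axis=1)         # per-byte table lookup, then sum
```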
Asymmetric Distance
Keep query in full precision:
```python
from sklearn.metrics.pairwise import cosine_similarity

class AsymmetricBinarySearch:
    def __init__(self, database_binary, database_float, binarizer):
        self.database_binary = database_binary
        self.database_float = database_float
        # Must apply the same transform used to build database_binary
        self.binarizer = binarizer

    def search(self, query_float, k=10, rerank_factor=10):
        """Two-stage retrieval with reranking"""
        # Stage 1: Binary search (coarse)
        query_binary = self.binarizer(query_float)
        candidates_idx = fast_hamming_search(
            query_binary, self.database_binary, k * rerank_factor
        )[0]
        # Stage 2: Rerank candidates with full float precision
        candidates = self.database_float[candidates_idx]
        scores = cosine_similarity(
            query_float.reshape(1, -1), candidates
        ).ravel()
        top_k = np.argsort(-scores)[:k]
        return candidates_idx[top_k]
```
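The same "full-precision query" idea also works without a second stage: unpack the database codes to ±1 and score them with an ordinary dot product against the float query. This is a sketch of that variant, not the class above; it assumes db_packed comes from np.packbits and d is the original dimensionality:

```python
import numpy as np

def asymmetric_scores(query_float, db_packed, d):
    """Score a float query against packed binary codes interpreted as ±1 vectors."""
    bits = np.unpackbits(db_packed, axis=1, count=d)   # (n, d) in {0, 1}
    signs = bits.astype(np.float32) * 2.0 - 1.0        # map to {-1, +1}
    return signs @ query_float                         # higher score = more similar
```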
Optimization Techniques
1. Bit Packing
Efficient storage and operations:
```python
def optimize_binary_storage(binary_codes):
    """Pack 0/1 codes into uint64 words for fast 64-bit XOR/popcount"""
    n, d = binary_codes.shape
    # Pack into uint64 for 64-bit operations
    n_uint64 = (d + 63) // 64
    packed = np.zeros((n, n_uint64), dtype=np.uint64)
    for i in range(n_uint64):
        start = i * 64
        end = min(start + 64, d)
        # Set bit j of word i for every vector at once (cast before shifting)
        for j in range(end - start):
            bits = binary_codes[:, start + j].astype(np.uint64)
            packed[:, i] |= bits << np.uint64(j)
    return packed

def popcount_uint64(x):
    """Population count for a single 64-bit integer"""
    # Brian Kernighan's algorithm: clear the lowest set bit each iteration
    x = int(x)
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count
```
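The explicit double loop above is easy to follow but slow in Python; if the bit layout only has to be self-consistent, np.packbits plus a uint64 view does the same packing vectorized. A sketch assuming binary_codes holds 0/1 values:

```python
import numpy as np

def pack_to_uint64(binary_codes):
    """Vectorized packing of 0/1 codes into uint64 words."""
    n, d = binary_codes.shape
    pad = (-d) % 64                                  # pad so each row fills whole 64-bit words
    if pad:
        binary_codes = np.pad(binary_codes, ((0, 0), (0, pad)))
    packed_bytes = np.packbits(binary_codes, axis=1, bitorder="little")
    return packed_bytes.view(np.uint64)              # (n, padded_d // 64) words
```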
2. SIMD Acceleration
Leverage CPU vector instructions:
```cpp
// C++ SIMD implementation.
// Note: _mm256_popcnt_epi64 requires AVX-512 VPOPCNTDQ + VL (not plain AVX2);
// compile with e.g. -mavx512vpopcntdq -mavx512vl. Assumes n_uint64 is a multiple of 4.
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

int hamming_distance_simd(
    const uint64_t* a,
    const uint64_t* b,
    size_t n_uint64
) {
    __m256i sum = _mm256_setzero_si256();
    for (size_t i = 0; i < n_uint64; i += 4) {
        __m256i va = _mm256_loadu_si256((const __m256i*)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i*)(b + i));
        __m256i vxor = _mm256_xor_si256(va, vb);
        // Per-lane 64-bit population count
        __m256i popcnt = _mm256_popcnt_epi64(vxor);
        sum = _mm256_add_epi64(sum, popcnt);
    }
    // Horizontal sum of the four 64-bit lanes
    uint64_t result[4];
    _mm256_storeu_si256((__m256i*)result, sum);
    return (int)(result[0] + result[1] + result[2] + result[3]);
}
```
Evaluation Metrics
Retrieval Quality
```python
def evaluate_binary_retrieval(binary_codes, float_embeddings, queries, k=10):
    """Evaluate binary embedding quality against float ground truth.

    Assumes helper functions get_knn, get_knn_binary, and compute_map
    are defined elsewhere.
    """
    metrics = {}
    # Recall@k averaged over all queries
    recalls = []
    for query in queries:
        # Ground truth neighbors (float)
        true_neighbors = get_knn(query, float_embeddings, k)
        # Binary retrieval
        binary_neighbors = get_knn_binary(query, binary_codes, k)
        recalls.append(len(set(true_neighbors) & set(binary_neighbors)) / k)
    metrics['recall@k'] = np.mean(recalls)
    # Mean Average Precision
    metrics['map'] = compute_map(binary_codes, float_embeddings)
    # Compression ratio (binary_codes stores one logical bit per uint8 entry)
    float_size = float_embeddings.nbytes
    binary_size = binary_codes.nbytes / 8  # 8 logical bits per stored byte
    metrics['compression'] = float_size / binary_size
    return metrics
```
Best Practices
1. Training Strategies
- Balanced Bits: Ensure roughly 50% ones and zeros per bit position
- Orthogonal Codes: Minimize correlation between bits
- Preserve Neighbors: Maintain local structure (a quick diagnostic for the first two criteria is sketched below)
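A quick diagnostic for bit balance and inter-bit correlation, assuming codes is an (n, n_bits) array of 0/1 values (constant bits will produce NaNs in the correlation matrix and deserve attention anyway):

```python
import numpy as np

def code_diagnostics(codes):
    """Report bit balance and worst-case inter-bit correlation for 0/1 codes."""
    balance = codes.mean(axis=0)                         # ideal: ~0.5 per bit
    corr = np.corrcoef(codes.T.astype(np.float64))       # (n_bits, n_bits)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]  # ideal: near zero
    return {
        "worst_balance_deviation": float(np.abs(balance - 0.5).max()),
        "max_abs_bit_correlation": float(np.abs(off_diag).max()),
    }
```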
2. System Design
```python
class BinaryRetrievalSystem:
    def __init__(self, n_bits=256, rerank_top=100):
        self.n_bits = n_bits
        self.rerank_top = rerank_top
        self.binary_index = None
        self.float_cache = {}

    def index(self, embeddings, method='itq'):
        """Build binary index"""
        # Learn the binarization
        if method == 'itq':
            binarizer = ITQ(self.n_bits)
            binarizer.fit(embeddings)
            # ITQ returns ±1 codes; map to {0, 1} before packing
            binary_codes = (binarizer.transform(embeddings) > 0).astype(np.uint8)
        else:
            binary_codes = sign_binarize(embeddings)
        # Pack bits
        self.binary_index = pack_bits(binary_codes)
        # Cache popular documents in float precision for reranking
        self.cache_popular(embeddings)

    def search(self, query, k=10):
        """Hybrid search with reranking"""
        # Stage 1: Binary retrieval
        candidates = self.binary_search(query, self.rerank_top)
        # Stage 2: Rerank with cached float vectors
        if self.float_cache:
            candidates = self.rerank(query, candidates, k)
        return candidates[:k]
```
Applications
1. Large-Scale Image Retrieval
- Billions of images
- Real-time requirements
- Mobile deployment
2. Duplicate Detection
- Near-duplicate documents
- Copyright detection
- Spam filtering
3. Recommendation Systems
- User-item matching
- Content deduplication
- Candidate generation
Limitations and Trade-offs
Accuracy Loss
- 20-40% recall drop vs float32
- Poor for fine-grained similarity
- Limited expressiveness
Mitigation Strategies
- Longer Codes: 256-1024 bits
- Multiple Tables: Improve recall
- Hybrid Systems: Binary + reranking
- Learned Representations: Task-specific optimization
Conclusion
Binary embeddings offer an extreme but practical trade-off between memory efficiency and retrieval quality. The 32× compression enables web-scale deployments that would be impossible with full-precision vectors.
The interactive visualization demonstrates how different binarization methods preserve similarity structure while dramatically reducing storage requirements. For applications where approximate retrieval is acceptable, binary embeddings provide an elegant solution to the scale challenge.