Binary Embeddings

8 min read

Ultra-compact 1-bit representations for massive-scale retrieval


Binary embeddings compress floating-point vectors into 1-bit representations, achieving 32× memory reduction while maintaining surprisingly good retrieval quality. This extreme quantization enables billion-scale vector search on commodity hardware.

Interactive Binary Quantization

The demo lets you choose the code length (128, 768, or 2048 bits) and the binarization method, and visualizes how each floating-point value of a vector collapses to a single bit.

Sign Binarization

The simplest scheme applies the sign function component-wise, b = sign(x):

b_i = 1 if x_i ≥ 0, else -1

For storage, the ±1 codes map directly to bits: positive values become 1, negative values become 0. For example, the vector (0.7, -1.2, 0.4, -0.3) becomes the 4-bit code 1010.

Similarity Preservation

In the demo, the query "machine learning" is compared against several candidate sentences. "deep learning algorithms" (original similarity 0.998) and "natural language processing" (0.853) both end up at Hamming distance 0 from the query's code, while "computer vision models" (original similarity -0.858) lands 12 bits away. The binary codes preserve the ranking of neighbors even though the scores themselves become much coarser.
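For codes built from random hyperplanes (the LSH scheme described later in this article), this behavior has a simple explanation: the probability that a random hyperplane separates two vectors grows with the angle between them, so cosine similarity can be estimated directly from the Hamming distance. A minimal sketch, assuming Gaussian random hyperplanes (all names below are illustrative):

import numpy as np

def estimated_cosine(hamming, n_bits):
    """Estimate cosine similarity from the Hamming distance of random-hyperplane codes"""
    # Fraction of disagreeing bits approximates angle / pi (SimHash property)
    theta = np.pi * hamming / n_bits
    return np.cos(theta)

# Toy check: compare the true cosine with the binary estimate
rng = np.random.default_rng(0)
x, y = rng.normal(size=768), rng.normal(size=768)
planes = rng.normal(size=(4096, 768))          # 4096 random hyperplanes
bx, by = planes @ x >= 0, planes @ y >= 0      # 1-bit codes

true_cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
est_cos = estimated_cosine(np.sum(bx != by), n_bits=4096)
print(round(float(true_cos), 3), round(float(est_cos), 3))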

Memory Efficiency

32× compression (32 bits → 1 bit per value), saving 96.9% of memory. Concretely, for 768-dimensional vectors:

| Vectors | Float32 | Binary |
|---|---|---|
| 1M | 2,930 MB | 92 MB |
| 10M | 29,297 MB | 916 MB |
| 100M | 292,969 MB | 9,155 MB |

Implementation Examples

Binary Hashing

import numpy as np

def binary_hash(embeddings):
    """Convert float embeddings to packed binary codes"""
    # Simple sign binarization: non-negative -> 1, negative -> 0
    binary = (embeddings >= 0).astype(np.uint8)

    # Pack 8 bits per byte for compact storage
    packed = np.packbits(binary, axis=1)

    return packed

def hamming_search(query_binary, db_binary, k=10):
    """Fast Hamming distance search (query packed the same way as the database)"""
    # XOR highlights the differing bits
    xor = np.bitwise_xor(query_binary, db_binary)

    # Count set bits per database vector
    distances = np.unpackbits(xor, axis=1).sum(axis=1)

    # Get top-k by smallest Hamming distance
    top_k = np.argpartition(distances, k)[:k]
    return top_k[np.argsort(distances[top_k])]
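A short usage sketch for these two helpers; the shapes and the random data are placeholders for real embeddings:

# Hypothetical usage: 10,000 database vectors and one query, 768-dimensional
db_embeddings = np.random.randn(10_000, 768)
query_embedding = np.random.randn(1, 768)

db_codes = binary_hash(db_embeddings)       # shape (10000, 96), dtype uint8
query_code = binary_hash(query_embedding)   # shape (1, 96), dtype uint8

top_ids = hamming_search(query_code, db_codes, k=5)
print(top_ids)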

ITQ Learning

from sklearn.decomposition import PCA

def learn_itq(X, n_bits, n_iter=50):
    """Iterative Quantization: learn a rotation that minimizes binarization error"""
    # PCA to n_bits dimensions (ITQ rotates in this reduced space)
    pca = PCA(n_components=n_bits)
    X_pca = pca.fit_transform(X)

    # Initialize rotation as identity
    R = np.eye(n_bits)

    for _ in range(n_iter):
        # Fix R, update binary codes B
        B = np.sign(X_pca @ R)

        # Fix B, update R (orthogonal Procrustes problem)
        U, _, Vt = np.linalg.svd(B.T @ X_pca)
        R = Vt.T @ U.T

    return pca, R

# Apply ITQ: project with PCA first, then rotate and take signs
pca, R = learn_itq(embeddings, n_bits=128)
binary_codes = np.sign(pca.transform(embeddings) @ R)

Binary Embeddings Trade-offs

Advantages

  • 32× memory reduction
  • Ultra-fast Hamming distance
  • CPU-friendly (bit operations)
  • Scales to billions of vectors
  • Simple implementation

Limitations

  • 20-40% accuracy loss
  • Limited expressiveness
  • Irreversible compression
  • Requires careful method selection
  • May need reranking

Best for: Massive-scale approximate search, memory-constrained environments, first-stage retrieval in cascaded systems, and duplicate detection at scale.

Why Binary?

The Scale Challenge

Modern retrieval systems face enormous scale:

| Dataset | Vectors | Float32 Size | Binary Size | Savings |
|---|---|---|---|---|
| 1M docs | 1M × 768 | 3 GB | 96 MB | 32× |
| Wikipedia | 10M × 768 | 30 GB | 960 MB | 32× |
| Web-scale | 1B × 768 | 3 TB | 96 GB | 32× |
| Internet | 100B × 768 | 300 TB | 9.6 TB | 32× |
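The table follows directly from the arithmetic. A quick sanity check, assuming 768-dimensional float32 embeddings as above:

def storage_bytes(n_vectors, dim=768):
    """Rough storage estimate for float32 vectors vs packed 1-bit codes"""
    float32_bytes = n_vectors * dim * 4   # 4 bytes per float32 value
    binary_bytes = n_vectors * dim // 8   # 1 bit per value, 8 bits per byte
    return float32_bytes, binary_bytes

f32, packed = storage_bytes(1_000_000)
print(f"1M docs: {f32 / 1e9:.1f} GB float32 vs {packed / 1e6:.0f} MB binary")
# roughly 3.1 GB vs 96 MB, i.e. a 32x reduction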

Performance Benefits

  1. Memory Efficiency: 32× reduction (float32 → 1 bit)
  2. Cache Friendly: More vectors fit in CPU cache
  3. SIMD Operations: Efficient bit-parallel operations
  4. Network Transfer: Reduced bandwidth requirements

Binarization Methods

1. Sign Binarization

The simplest approach:

def sign_binarize(embeddings):
    """Convert float embeddings to binary"""
    # Simple thresholding at zero
    binary = (embeddings >= 0).astype(np.uint8)
    return binary

def pack_bits(binary_matrix):
    """Pack 8 bits into single byte"""
    n, d = binary_matrix.shape

    # Pad to multiple of 8
    pad_size = (8 - d % 8) % 8
    if pad_size:
        binary_matrix = np.pad(binary_matrix, ((0, 0), (0, pad_size)))

    # Pack bits
    packed = np.packbits(binary_matrix, axis=1)
    return packed

2. Iterative Quantization (ITQ)

Learns optimal rotation for binarization:

class ITQ:
    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.rotation = None
        self.mean = None

    def fit(self, X, n_iter=50):
        """Learn ITQ rotation matrix"""
        # Center data
        self.mean = X.mean(axis=0)
        X_centered = X - self.mean

        # PCA to n_bits dimensions
        U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
        X_pca = U[:, :self.n_bits] * S[:self.n_bits]

        # Initialize rotation
        R = np.eye(self.n_bits)

        for _ in range(n_iter):
            # Fix R, update B
            B = np.sign(X_pca @ R)

            # Fix B, update R (Procrustes)
            U_svd, _, Vt_svd = np.linalg.svd(B.T @ X_pca)
            R = Vt_svd.T @ U_svd.T

        # Compose PCA projection and learned rotation
        self.rotation = Vt[:self.n_bits].T @ R
        return self

    def transform(self, X):
        """Apply ITQ transformation"""
        X_centered = X - self.mean
        X_rotated = X_centered @ self.rotation
        return np.sign(X_rotated)
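A brief usage sketch; `db_embeddings` and `query_embeddings` are placeholders for real matrices, and `pack_bits` is the helper defined above:

# Fit ITQ on the database and reuse the same model for queries
itq = ITQ(n_bits=128).fit(db_embeddings)

db_codes = itq.transform(db_embeddings)        # values in {-1, +1}
query_codes = itq.transform(query_embeddings)

# Map {-1, +1} to {0, 1} before packing for storage
db_packed = pack_bits((db_codes > 0).astype(np.uint8))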

3. Locality Sensitive Hashing (LSH)

Random projections for similarity preservation:

class LSH:
    def __init__(self, n_bits, input_dim):
        self.n_bits = n_bits

        # Random hyperplanes
        self.projections = np.random.randn(input_dim, n_bits)
        self.projections /= np.linalg.norm(
            self.projections, axis=0, keepdims=True
        )

    def hash(self, X):
        """Compute LSH binary codes"""
        # Project onto random hyperplanes
        projections = X @ self.projections

        # Binarize
        binary_codes = (projections >= 0).astype(np.uint8)
        return binary_codes

    def multi_hash(self, X, n_tables=4):
        """Multiple hash tables for better recall"""
        hash_codes = []
        for i in range(n_tables):
            # Different random projections per table
            np.random.seed(i)
            self.projections = np.random.randn(
                X.shape[1], self.n_bits
            )
            hash_codes.append(self.hash(X))
        return hash_codes

4. Learning to Hash

Neural networks for end-to-end optimization:

import torch
import torch.nn as nn

class DeepHash(nn.Module):
    def __init__(self, input_dim, hash_bits):
        super().__init__()
        self.hash_bits = hash_bits
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, hash_bits)
        )
        self.alpha = 1.0  # Tanh saturation parameter

    def forward(self, x):
        # Encode to continuous codes
        h = self.encoder(x)

        # Smooth binarization with tanh
        b = torch.tanh(self.alpha * h)

        if self.training:
            # Straight-through estimator: hard sign forward, tanh gradient backward
            b_hard = torch.sign(h)
            b = b_hard.detach() - b.detach() + b
        else:
            b = torch.sign(h)
        return b

    def hash_loss(self, b1, b2, similarity):
        """Pairwise hashing loss"""
        # Hamming distance from +/-1 codes: d = (n_bits - <b1, b2>) / 2
        distance = 0.5 * (self.hash_bits - torch.sum(b1 * b2, dim=1))

        # Convert similarity label to a target distance
        target_distance = self.hash_bits * (1 - similarity) / 2

        # L2 loss between actual and target distance
        loss = torch.mean((distance - target_distance) ** 2)
        return loss
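A hedged sketch of how such a model might be trained; `pair_loader` and the similarity labels are assumptions for illustration, not part of the example above:

# Hypothetical training loop for the pairwise hashing loss
model = DeepHash(input_dim=768, hash_bits=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for x1, x2, sim in pair_loader:  # sim: 1.0 for similar pairs, 0.0 for dissimilar
    b1, b2 = model(x1), model(x2)
    loss = model.hash_loss(b1, b2, sim)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()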

Distance Computation

Hamming Distance

Count differing bits:

def hamming_distance(b1, b2):
    """Compute Hamming distance between binary codes"""
    # XOR to find differing bits
    xor = np.bitwise_xor(b1, b2)

    # Count set bits (popcount)
    if len(xor.shape) == 1:
        # Single vector
        return np.unpackbits(xor).sum()
    else:
        # Matrix of vectors
        return np.unpackbits(xor, axis=1).sum(axis=1)

def fast_hamming_search(query, database, k=10):
    """Fast k-NN search with Hamming distance"""
    # Compute distances to all database vectors
    distances = hamming_distance(query, database)

    # Get top-k indices, sorted by distance
    top_k_indices = np.argpartition(distances, k)[:k]
    top_k_indices = top_k_indices[np.argsort(distances[top_k_indices])]

    return top_k_indices, distances[top_k_indices]

Asymmetric Distance

Keep query in full precision:

from sklearn.metrics.pairwise import cosine_similarity

class AsymmetricBinarySearch:
    def __init__(self, database_binary, database_float):
        self.database_binary = database_binary   # packed binary codes
        self.database_float = database_float     # full-precision vectors

    def binarize(self, x):
        """Sign-binarize and pack a query vector"""
        return np.packbits((x >= 0).astype(np.uint8), axis=-1)

    def search(self, query_float, k=10, rerank_factor=10):
        """Two-stage retrieval with reranking"""
        # Stage 1: Binary search (coarse candidate generation)
        query_binary = self.binarize(query_float)
        candidates_idx = fast_hamming_search(
            query_binary, self.database_binary, k * rerank_factor
        )[0]

        # Stage 2: Rerank candidates with full float precision
        candidates = self.database_float[candidates_idx]
        scores = cosine_similarity(query_float.reshape(1, -1), candidates)[0]
        top_k = np.argsort(-scores)[:k]

        return candidates_idx[top_k]
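A short usage sketch, assuming `embeddings` is an in-memory float32 matrix and `query_embedding` a single query vector (in practice the float vectors used for reranking might live on disk or in a cache):

# Build the two-stage index from a float matrix
db_binary = np.packbits((embeddings >= 0).astype(np.uint8), axis=1)
searcher = AsymmetricBinarySearch(db_binary, embeddings)

# Retrieve 10 results after reranking the top 100 binary candidates
result_ids = searcher.search(query_embedding, k=10, rerank_factor=10)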

Optimization Techniques

1. Bit Packing

Efficient storage and operations:

def optimize_binary_storage(binary_codes):
    """Pack 0/1 codes into uint64 words for 64-bit operations"""
    n, d = binary_codes.shape
    n_uint64 = (d + 63) // 64
    packed = np.zeros((n, n_uint64), dtype=np.uint64)

    for i in range(n_uint64):
        start = i * 64
        end = min(start + 64, d)

        # Pack up to 64 bits into a single uint64 word
        for j in range(end - start):
            bits = binary_codes[:, start + j].astype(np.uint64)
            packed[:, i] |= bits << np.uint64(j)

    return packed

def popcount_uint64(x):
    """Population count via Brian Kernighan's algorithm"""
    x = int(x)  # work on a Python int to avoid uint64 arithmetic quirks
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count
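A small sanity check of the packing against a naive bit count; `sign_binarize` is the helper defined earlier and the toy data is illustrative:

# Pack two toy 256-bit codes and verify the word-level popcount
codes = sign_binarize(np.random.randn(2, 256))   # shape (2, 256), values 0/1
packed = optimize_binary_storage(codes)          # shape (2, 4), dtype uint64

xor_words = np.bitwise_xor(packed[0], packed[1])
dist_packed = sum(popcount_uint64(w) for w in xor_words)
dist_naive = int(np.sum(codes[0] != codes[1]))
assert dist_packed == dist_naive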

2. SIMD Acceleration

Leverage CPU vector instructions:

// C++ SIMD implementation
// Note: _mm256_popcnt_epi64 requires AVX-512 VPOPCNTDQ + AVX-512 VL,
// not plain AVX2; use a scalar popcount fallback on older CPUs.
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

int hamming_distance_simd(
    const uint64_t* a, const uint64_t* b, size_t n_uint64
) {
    __m256i sum = _mm256_setzero_si256();

    // Assumes n_uint64 is a multiple of 4 (four 64-bit lanes per iteration)
    for (size_t i = 0; i < n_uint64; i += 4) {
        __m256i va = _mm256_loadu_si256((const __m256i*)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i*)(b + i));
        __m256i vxor = _mm256_xor_si256(va, vb);

        // Per-lane population count of the XOR result
        __m256i popcnt = _mm256_popcnt_epi64(vxor);
        sum = _mm256_add_epi64(sum, popcnt);
    }

    // Horizontal sum of the four 64-bit lanes
    uint64_t result[4];
    _mm256_storeu_si256((__m256i*)result, sum);
    return result[0] + result[1] + result[2] + result[3];
}
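To actually get the vectorized population count, compile with the matching ISA flags (for GCC or Clang, something like -mavx512vl -mavx512vpopcntdq) and keep a scalar fallback based on __builtin_popcountll or C++20's std::popcount for CPUs that lack these extensions.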

Evaluation Metrics

Retrieval Quality

def evaluate_binary_retrieval(binary_codes, float_embeddings, queries, k=10):
    """Evaluate binary embedding quality"""
    # get_knn, get_knn_binary and compute_map are assumed helper functions
    metrics = {}

    # Recall@k averaged over all queries
    recalls = []
    for query in queries:
        # Ground truth from full-precision embeddings
        true_neighbors = get_knn(query, float_embeddings, k)

        # Binary retrieval
        binary_neighbors = get_knn_binary(query, binary_codes, k)

        # Fraction of true neighbors recovered
        recalls.append(len(set(true_neighbors) & set(binary_neighbors)) / k)
    metrics['recall@k'] = np.mean(recalls)

    # Mean Average Precision
    metrics['map'] = compute_map(binary_codes, float_embeddings)

    # Compression ratio (binary_codes assumed unpacked: one uint8 per bit)
    float_size = float_embeddings.nbytes
    binary_size = binary_codes.nbytes / 8  # effective size once bits are packed
    metrics['compression'] = float_size / binary_size

    return metrics

Best Practices

1. Training Strategies

  • Balanced Bits: Ensure roughly 50% ones and zeros
  • Orthogonal Codes: Minimize correlation between bits
  • Preserve Neighbors: Maintain local structure
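A quick diagnostic for the first two points, sketched for unpacked 0/1 codes (what counts as "balanced enough" is application-dependent):

def check_code_quality(binary_codes):
    """Report bit balance and inter-bit correlation for 0/1 codes"""
    # Balanced bits: each bit position should be active ~50% of the time
    bit_means = binary_codes.mean(axis=0)
    print("bit activation range:",
          round(float(bit_means.min()), 3), "-", round(float(bit_means.max()), 3))

    # Orthogonal codes: off-diagonal bit correlations should be near zero
    corr = np.corrcoef(binary_codes.T)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    print("mean |bit correlation|:", round(float(np.abs(off_diag).mean()), 3))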

2. System Design

class BinaryRetrievalSystem:
    def __init__(self, n_bits=256, rerank_top=100):
        self.n_bits = n_bits
        self.rerank_top = rerank_top
        self.binary_index = None
        self.float_cache = {}

    def index(self, embeddings, method='itq'):
        """Build binary index"""
        # Learn binarization (map ITQ's +/-1 output to 0/1 bits)
        if method == 'itq':
            binarizer = ITQ(self.n_bits)
            binarizer.fit(embeddings)
            binary_codes = (binarizer.transform(embeddings) > 0).astype(np.uint8)
        else:
            binary_codes = sign_binarize(embeddings)

        # Pack bits
        self.binary_index = pack_bits(binary_codes)

        # Cache popular documents in float precision (placeholder method)
        self.cache_popular(embeddings)

    def search(self, query, k=10):
        """Hybrid search with reranking"""
        # Stage 1: Binary retrieval (binary_search is a placeholder method)
        candidates = self.binary_search(query, self.rerank_top)

        # Stage 2: Rerank with cached float vectors when available
        if self.float_cache:
            candidates = self.rerank(query, candidates, k)

        return candidates[:k]

Applications

1. Large-Scale Image Retrieval

  • Billions of images
  • Real-time requirements
  • Mobile deployment

2. Duplicate Detection

  • Near-duplicate documents
  • Copyright detection
  • Spam filtering

3. Recommendation Systems

  • User-item matching
  • Content deduplication
  • Candidate generation

Limitations and Trade-offs

Accuracy Loss

  • 20-40% recall drop vs float32
  • Poor for fine-grained similarity
  • Limited expressiveness

Mitigation Strategies

  1. Longer Codes: 256-1024 bits
  2. Multiple Tables: Improve recall
  3. Hybrid Systems: Binary + reranking
  4. Learned Representations: Task-specific optimization

Conclusion

Binary embeddings offer an extreme but practical trade-off between memory efficiency and retrieval quality. The 32× compression enables web-scale deployments that would be impossible with full-precision vectors.

The interactive visualization demonstrates how different binarization methods preserve similarity structure while dramatically reducing storage requirements. For applications where approximate retrieval is acceptable, binary embeddings provide an elegant solution to the scale challenge.

If you found this explanation helpful, consider sharing it with others.
