Dense Embeddings Space Explorer

Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.

Dense embeddings revolutionized NLP by representing words and sentences as continuous vectors in high-dimensional space, where semantic similarity corresponds to geometric proximity.

Interactive 3D Embedding Space

[Interactive visualization: a configurable 3D projection of the embedding space with nearest-neighbor lookup. Example neighbor list: prince (royalty, 0.995), man (gender, 0.968), boy (gender, 0.965), queen (royalty, 0.943), princess (royalty, 0.926).]
Understanding Dense Embeddings

Key Properties

  • Continuous vector representations
  • Capture semantic similarity
  • Enable arithmetic operations
  • Typically 50-1000 dimensions

Common Models

  • Word2Vec (CBOW, Skip-gram)
  • GloVe (Global Vectors)
  • FastText (Subword)
  • BERT (Contextual)

Applications

  • Semantic search
  • Document clustering
  • Recommendation systems
  • Machine translation

What Are Dense Embeddings?

Dense embeddings are continuous vector representations where:

  • Every dimension has a value (unlike sparse representations); see the sketch after this list
  • Semantic similarity = geometric proximity
  • Vector arithmetic captures relationships
  • Typically 50-1000 dimensions
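
A small numerical contrast (the words and values are illustrative): a sparse one-hot vector has a single non-zero entry in a vocabulary-sized array, while a dense embedding packs information into every dimension.

```python
import numpy as np

vocab_size = 10_000

# Sparse one-hot representation: a single 1 in a vocabulary-sized vector
one_hot_cat = np.zeros(vocab_size)
one_hot_cat[42] = 1.0  # hypothetical index of "cat" in the vocabulary

# Dense embedding: every dimension carries signal (values are made up)
dense_cat = np.array([0.21, -0.43, 0.08, 0.77])
```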

Key Concepts

1. Word Embeddings Evolution

The progression of embedding techniques:

| Model | Year | Key Innovation | Dimensions |
|-------|------|----------------|------------|
| Word2Vec | 2013 | Skip-gram/CBOW | 50-300 |
| GloVe | 2014 | Global matrix factorization | 50-300 |
| FastText | 2016 | Subword information | 100-300 |
| BERT | 2018 | Contextual embeddings | 768 |
| GPT-3 | 2020 | Scale + few-shot | 12,288 |

2. Training Objectives

Different models use different objectives:

Word2Vec Skip-gram:

$$J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)$$
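
Here $p(w_{t+j} \mid w_t)$ is a softmax over the vocabulary of size $V$, with output ("context") vectors $u$ and input ("center") vectors $v$; in practice it is approximated with hierarchical softmax or negative sampling:

$$p(w_O \mid w_I) = \frac{\exp\left(u_{w_O}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left(u_{w}^{\top} v_{w_I}\right)}$$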

GloVe:

$$J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$
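
The weighting function $f$ down-weights rare co-occurrences and caps very frequent ones; the GloVe paper uses

$$f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$

with $x_{\max} = 100$ and $\alpha = 3/4$ in the reported experiments.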

3. Cosine Similarity

The standard metric for comparing embeddings:

$$\text{similarity}(u, v) = \frac{u \cdot v}{\lVert u \rVert \,\lVert v \rVert} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\;\sqrt{\sum_{i=1}^{n} v_i^2}}$$
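
A direct NumPy translation of this formula (a minimal sketch assuming non-zero vectors); the same helper is used by the nearest-neighbor snippet further down:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```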

Vector Arithmetic

The Famous Analogy

The most celebrated property of word embeddings:

king - man + woman ≈ queen

This works because embeddings encode relationships:

  • king - man isolates a "royalty" direction
  • Adding woman combines that direction with the female gender direction
  • The nearest word vector to the result is queen

More Examples

```
# Relationships captured by arithmetic
paris - france + italy ≈ rome
bigger - big + small ≈ smaller
walking - walk + swim ≈ swimming
```
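
These analogies can be checked with gensim's `most_similar`, which adds the positive vectors, subtracts the negative ones, and returns the nearest neighbors of the result (a sketch assuming a Word2Vec `model` trained on a large corpus; the neighbors you get depend on the training data):

```python
# king - man + woman -> nearest neighbors of the resulting vector
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)  # on a large corpus, "queen" typically appears at or near the top
```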

Implementation Details

Creating Word Embeddings

```python
import numpy as np
from gensim.models import Word2Vec

# Train Word2Vec on a toy corpus (sg=1 selects the skip-gram objective)
sentences = [["cat", "sat", "mat"], ["dog", "stood", "rug"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Get embeddings
cat_vector = model.wv['cat']
dog_vector = model.wv['dog']

# Compute similarity
similarity = model.wv.similarity('cat', 'dog')
```

Finding Nearest Neighbors

```python
def find_nearest(embedding, embeddings, k=5):
    """Find the k nearest neighbors of `embedding` by cosine similarity.

    `embeddings` maps each word to its vector; `cosine_similarity` is the
    helper defined earlier.
    """
    similarities = []
    for word, vec in embeddings.items():
        sim = cosine_similarity(embedding, vec)
        similarities.append((word, sim))

    # Sort by similarity, most similar first
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:k]
```

Sentence Embeddings

Moving from words to sentences:

Average Pooling

Simple but effective:

```python
# Mean of the word vectors in the sentence (model is the gensim model from above)
sentence_emb = np.mean([model.wv[word] for word in sentence], axis=0)
```

Weighted Average

Using TF-IDF or importance weights:

```python
# Weight each word vector by its TF-IDF score
# (compute_tfidf is a placeholder returning one weight per word)
weights = compute_tfidf(sentence)
sentence_emb = np.average(word_embs, weights=weights, axis=0)
```

Sentence-BERT

Specialized models for sentence embeddings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
```

Applications

1. Semantic Search

```python
from sklearn.metrics.pairwise import cosine_similarity

# Index documents
doc_embeddings = model.encode(documents)

# Search
query_embedding = model.encode([query])
similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
top_k = np.argsort(similarities)[-k:][::-1]  # indices of the k best matches, best first
```

2. Clustering

```python
from sklearn.cluster import KMeans

# Group embeddings into 10 clusters
kmeans = KMeans(n_clusters=10)
clusters = kmeans.fit_predict(embeddings)
```

3. Classification

```python
from sklearn.linear_model import LogisticRegression

# Use embeddings as features for a classifier
X = np.array([get_embedding(text) for text in texts])
classifier = LogisticRegression()
classifier.fit(X, labels)
```

Visualization Techniques

t-SNE Projection

Reduce dimensions for visualization:

```python
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, perplexity=30)
embeddings_2d = tsne.fit_transform(embeddings)
```

UMAP

Faster alternative to t-SNE:

```python
import umap

reducer = umap.UMAP(n_components=2)
embeddings_2d = reducer.fit_transform(embeddings)
```

Common Pitfalls

1. Bias in Embeddings

Word embeddings can encode societal biases:

```
# Problematic associations learned from training data
doctor - man + woman ≈ nurse          # gender bias
programmer - man + woman ≈ homemaker  # occupation bias
```

2. Out-of-Vocabulary Words

Handling unknown words:

  • Use subword tokenization (FastText); see the sketch after this list
  • Fall back to character embeddings
  • Use contextual models (BERT)
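
A minimal gensim sketch of the subword route: FastText builds a vector for an unseen word from its character n-grams, so out-of-vocabulary lookups still return a usable embedding (toy corpus reused from the Word2Vec example above):

```python
from gensim.models import FastText

sentences = [["cat", "sat", "mat"], ["dog", "stood", "rug"]]
ft_model = FastText(sentences, vector_size=100, window=5, min_count=1)

# "cats" never appears in the corpus, but its character n-grams overlap with "cat"
oov_vector = ft_model.wv["cats"]
```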

3. Polysemy

Single vector per word loses context:

  • "bank" (financial) vs "bank" (river)
  • Solution: Contextual embeddings (BERT, GPT)

Performance Considerations

Memory Usage

  • Word2Vec: ~1GB for 1M words × 300 dims
  • BERT: ~400MB model + dynamic computation
  • Storage: Use float16 or quantization (see the sketch below)
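
A minimal sketch of the float16 option: halving the bytes per value roughly halves memory, usually with negligible impact on similarity rankings (sizes assume a 1M × 300 float32 matrix):

```python
import numpy as np

embeddings = np.random.rand(1_000_000, 300).astype(np.float32)  # ~1.2 GB
embeddings_fp16 = embeddings.astype(np.float16)                 # ~0.6 GB

print(embeddings.nbytes / 1e9, embeddings_fp16.nbytes / 1e9)  # 1.2, 0.6
```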

Speed Optimization

```python
import numpy as np
from annoy import AnnoyIndex

# Batch similarity: one matrix multiply instead of a Python loop
similarities = np.dot(query_embs, doc_embs.T)

# Approximate nearest neighbors with Annoy
index = AnnoyIndex(embedding_dim, 'angular')
for i, vec in enumerate(embeddings):
    index.add_item(i, vec)
index.build(10)  # 10 trees

# Query the index for the 5 approximate nearest neighbors of a vector
neighbors = index.get_nns_by_vector(query_vec, 5)
```

Modern Developments

1. Contextual Embeddings

BERT and GPT models provide context-dependent embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

# Different embeddings for the same word in different contexts
inputs1 = tokenizer("The bank is closed", return_tensors="pt")
inputs2 = tokenizer("The river bank is muddy", return_tensors="pt")

with torch.no_grad():
    emb1 = model(**inputs1).last_hidden_state  # shape: (1, seq_len, 768)
    emb2 = model(**inputs2).last_hidden_state

# The vectors at the position of "bank" differ between the two sentences
```

2. Multilingual Embeddings

Cross-lingual understanding (a short encoding sketch follows the list):

  • mBERT: 104 languages
  • XLM-R: 100 languages
  • LaBSE: Language-agnostic sentence embeddings
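
A short encoding sketch using a multilingual Sentence-Transformers checkpoint (the model name is one commonly available option, not something the list above prescribes): translations of the same sentence land close together in the shared space.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
emb = model.encode(["The cat sits on the mat",
                    "Le chat est assis sur le tapis"])  # English and French

print(util.cos_sim(emb[0], emb[1]))  # high similarity across languages
```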

3. Multimodal Embeddings

Combining text and vision (a brief CLIP sketch follows the list):

  • CLIP: Text-image alignment
  • ALIGN: Noisy data training
  • Flamingo: Few-shot multimodal
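
A brief CLIP sketch via Hugging Face transformers (the checkpoint name is one publicly available option; the blank image stands in for a real photo): text and images are embedded into a shared space, and their similarity scores can be turned into caption probabilities.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # placeholder image; use a real photo in practice
inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # probability of each caption for the image
```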

Best Practices

  1. Choose the right model:

    • Static embeddings for speed
    • Contextual for accuracy
    • Domain-specific when available
  2. Normalize embeddings:

    normalized = embedding / np.linalg.norm(embedding)
  3. Use appropriate similarity metrics (see the note after this list):

    • Cosine for normalized vectors
    • Euclidean for positional relationships
    • Dot product for efficiency
  4. Consider fine-tuning:

    • Domain adaptation improves performance
    • Contrastive learning for specific tasks
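
On point 3: after normalization the three metrics agree up to monotone transformations, so the cheapest one (a dot product) can be used without changing rankings:

$$\cos(u, v) = u \cdot v \qquad \text{and} \qquad \lVert u - v \rVert^2 = 2\,(1 - u \cdot v) \quad \text{when } \lVert u \rVert = \lVert v \rVert = 1$$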

References

  • Mikolov et al. "Efficient Estimation of Word Representations in Vector Space"
  • Pennington et al. "GloVe: Global Vectors for Word Representation"
  • Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers"
  • Reimers & Gurevych "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"

If you found this explanation helpful, consider sharing it with others.
