Cross-Lingual Alignment
Align embeddings across languages for multilingual understanding
Cross-lingual alignment enables models to understand relationships between languages, making multilingual NLP possible without parallel data for every language pair. This technology powers machine translation, cross-lingual search, and zero-shot language transfer.
Interactive Alignment Explorer
Bilingual Embedding Spaces
Alignment Methods
- VecMap: Unsupervised orthogonal mapping
- MUSE: Adversarial training alignment
- Supervised: Dictionary-based alignment
- XLM: Multilingual masked language model
Cross-Lingual Similarity
| | gato | perro | casa | agua | sol |
| --- | --- | --- | --- | --- | --- |
| cat | 0.88 | 0.77 | 0.50 | 0.40 | 0.45 |
| dog | 0.82 | 0.88 | 0.54 | 0.41 | 0.45 |
| house | 0.56 | 0.62 | 0.88 | 0.54 | 0.48 |
| water | 0.45 | 0.47 | 0.62 | 0.88 | 0.60 |
| sun | 0.51 | 0.50 | 0.50 | 0.55 | 0.88 |
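To make the table concrete, here is a minimal sketch of how such a similarity matrix can be computed once the two languages share an embedding space. The names `en_vecs`, `es_vecs`, and `W` are illustrative assumptions (row-aligned word matrices and a learned mapping); the numbers above are for illustration and are not the output of this snippet.

```python
import numpy as np

def similarity_matrix(src_vecs, tgt_vecs):
    """Cosine similarity of every source word against every target word."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return src @ tgt.T  # rows: source words, columns: target words

# e.g. sims = similarity_matrix(en_vecs @ W, es_vecs)
# with rows for ["cat", "dog", ...] and columns for ["gato", "perro", ...]
```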
Implementation Example
Procrustes Alignment
```python
import numpy as np

def procrustes_alignment(X_src, X_tgt):
    """
    Learn an orthogonal mapping W that minimizes ||XW - Y||_F,
    where X holds source and Y target embeddings for dictionary pairs.
    """
    # Center embeddings
    X_src = X_src - X_src.mean(0)
    X_tgt = X_tgt - X_tgt.mean(0)
    # SVD of the source-target cross-covariance
    U, S, Vt = np.linalg.svd(X_src.T @ X_tgt)
    # Orthogonal mapping
    W = U @ Vt
    return W

# Apply alignment
# src_emb / tgt_emb: embedding matrices whose row i forms a translation pair
W = procrustes_alignment(src_emb, tgt_emb)
aligned_src = src_emb @ W
```
Adversarial Alignment
```python
import torch
import torch.nn as nn

class AdversarialAligner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Linear map from the source space into the target space
        self.mapper = nn.Linear(dim, dim)
        # Discriminator guesses whether an embedding is a mapped source
        # vector or a genuine target vector
        self.discriminator = nn.Sequential(
            nn.Linear(dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid()
        )

    def align(self, src_emb):
        return self.mapper(src_emb)

    def discriminate(self, emb):
        return self.discriminator(emb)

# Train with an adversarial loss: the mapper tries to make mapped source
# embeddings indistinguishable from target embeddings
# (discriminator_loss is schematic here; a full MUSE-style version appears below)
mapped_src = aligner.align(src_emb)
disc_loss = discriminator_loss(mapped_src, tgt_emb)
```
Cross-Lingual Alignment Best Practices
Key Techniques
- Use identical words as anchors
- Apply iterative refinement
- Normalize embeddings before alignment (see the sketch below)
- Use hub languages for zero-shot transfer
- Combine supervised and unsupervised methods
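As a sketch of the normalization point above, a common pre-processing recipe is to length-normalize, mean-center, and length-normalize again before learning the mapping (this unit-center-unit recipe follows VecMap-style pipelines; other recipes also work):

```python
import numpy as np

def normalize_embeddings(emb):
    """Unit-normalize, mean-center, then unit-normalize again
    before learning the mapping (illustrative recipe)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb - emb.mean(axis=0, keepdims=True)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```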
Applications
- Machine translation without parallel data
- Cross-lingual information retrieval
- Multilingual sentiment analysis
- Zero-shot language transfer
- Cross-lingual question answering
The Alignment Challenge
Why Alignment Matters
Different languages encode similar concepts in different vector spaces. Alignment techniques map these spaces to a common representation where:
- Similar meanings have similar vectors across languages
- Geometric relationships are preserved (formalized below)
- Zero-shot transfer becomes possible
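One standard way to make the second point concrete is the orthogonal Procrustes formulation used in the code below (this is the textbook result, not specific to any one toolkit). Given embeddings X for source words and Y for their known translations, we seek the orthogonal W that minimizes ||XW - Y||_F. The closed-form solution is W = UV^T, where U S V^T is the SVD of X^T Y. Because W is orthogonal, ||xW - yW|| = ||x - y|| for any pair of vectors, so distances and angles, and hence the geometric relationships listed above, survive the mapping.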
Alignment Methods
1. Supervised Alignment (Dictionary-Based)
Uses bilingual dictionaries to learn mappings:
```python
import numpy as np

def procrustes_alignment(X_src, X_tgt):
    """
    Learn an orthogonal mapping W that minimizes ||XW - Y||_F
    over dictionary word pairs (X: source, Y: target embeddings).
    """
    # Center embeddings
    X_src = X_src - X_src.mean(0)
    X_tgt = X_tgt - X_tgt.mean(0)
    # SVD of the source-target cross-covariance
    U, S, Vt = np.linalg.svd(X_src.T @ X_tgt)
    # Orthogonal mapping
    W = U @ Vt
    return W
```
2. Unsupervised Alignment (VecMap)
No parallel data required:
```python
class VecMap:
    def __init__(self, src_emb, tgt_emb):
        self.src_emb = self.normalize(src_emb)
        self.tgt_emb = self.normalize(tgt_emb)

    def iterative_alignment(self, n_iter=10):
        """Self-learning through iterative refinement."""
        W = self.initialize_mapping()
        for i in range(n_iter):
            # Build dictionary using current mapping
            lexicon = self.build_lexicon(W)
            # Refine mapping using dictionary
            W = self.procrustes(lexicon)
            # Symmetric re-weighting
            W = self.symmetric_reweighting(W)
        return W
```
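The `build_lexicon` step above is left abstract. A minimal, hypothetical version based on mutual nearest neighbours might look like the following; the real VecMap implementation adds further refinements such as stochastic dictionary induction and similarity re-weighting:

```python
import numpy as np

def build_lexicon(src_emb, tgt_emb, W):
    """Induce a dictionary from mutual nearest neighbours under the
    current mapping W (hypothetical stand-in for the build_lexicon
    step above; assumes unit-normalized embedding rows)."""
    sims = (src_emb @ W) @ tgt_emb.T
    s2t = sims.argmax(axis=1)  # best target word for each source word
    t2s = sims.argmax(axis=0)  # best source word for each target word
    # keep only pairs that choose each other
    return [(i, j) for i, j in enumerate(s2t) if t2s[j] == i]
```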
3. Adversarial Alignment (MUSE)
Uses adversarial training:
```python
import torch
import torch.nn as nn

class AdversarialAligner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mapper = nn.Linear(dim, dim, bias=False)
        self.discriminator = nn.Sequential(
            nn.Linear(dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid()  # outputs P(embedding comes from the target language)
        )

    def align_step(self, src_batch, tgt_batch):
        # Map source embeddings into the target space
        mapped_src = self.mapper(src_batch)
        # Discriminator tries to distinguish the two languages
        src_pred = self.discriminator(mapped_src)
        tgt_pred = self.discriminator(tgt_batch)
        # Discriminator loss: recognize targets as targets, mapped sources as sources
        disc_loss = -torch.mean(torch.log(tgt_pred) + torch.log(1 - src_pred))
        # Mapping loss: fool the discriminator
        map_loss = -torch.mean(torch.log(src_pred))
        return disc_loss, map_loss
```
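Training alternates discriminator and mapper updates. A hypothetical loop over the class above might look like this; `src_loader`, `tgt_loader`, the dimensionality, and the optimizer settings are assumptions, not taken from MUSE itself:

```python
import torch

aligner = AdversarialAligner(dim=300)
disc_opt = torch.optim.SGD(aligner.discriminator.parameters(), lr=0.1)
map_opt = torch.optim.SGD(aligner.mapper.parameters(), lr=0.1)

# src_loader / tgt_loader: assumed iterables yielding embedding batches
for src_batch, tgt_batch in zip(src_loader, tgt_loader):
    # 1) discriminator update: learn to tell mapped source from target
    disc_loss, _ = aligner.align_step(src_batch, tgt_batch)
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()
    # 2) mapper update: fool the discriminator
    _, map_loss = aligner.align_step(src_batch, tgt_batch)
    map_opt.zero_grad()
    map_loss.backward()
    map_opt.step()
```

MUSE additionally keeps the mapper close to orthogonal and selects models with an unsupervised validation criterion; those details are omitted here.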
Multilingual Models
mBERT and XLM-R
Pre-trained on multiple languages simultaneously:
```python
class MultilingualBERT:
    def __init__(self, languages):
        self.languages = languages
        self.shared_vocabulary = self.build_shared_vocab()
        self.model = TransformerModel(
            vocab_size=len(self.shared_vocabulary),
            shared_positional=True
        )

    def train_step(self, batch):
        # Mixed language batches
        loss = 0
        for lang_data in batch:
            # Masked language modeling
            mlm_loss = self.masked_lm_loss(lang_data)
            # Translation language modeling (optional)
            if lang_data.has_parallel:
                tlm_loss = self.translation_lm_loss(lang_data)
                loss += tlm_loss
            loss += mlm_loss
        return loss
```
Evaluation Metrics
1. Bilingual Lexicon Induction (BLI)
Accuracy of word translation:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def bli_accuracy(src_words, tgt_words, W, k=1):
    """Compute P@k for word translation (src_words[i] translates to tgt_words[i])."""
    src_emb = get_embeddings(src_words)
    tgt_emb = get_embeddings(tgt_words)
    # Apply mapping
    mapped_src = src_emb @ W
    # Find nearest neighbors in the target space
    similarities = cosine_similarity(mapped_src, tgt_emb)
    predictions = np.argsort(-similarities, axis=1)[:, :k]
    # Compute accuracy
    correct = 0
    for i, pred in enumerate(predictions):
        if i in pred:  # correct translation (same index) in top-k
            correct += 1
    return correct / len(src_words)
```
2. Cross-Lingual Similarity
Correlation with human judgments:
```python
from scipy.stats import spearmanr
from sklearn.metrics.pairwise import cosine_similarity

def cross_lingual_similarity(pairs, W):
    """Evaluate on cross-lingual STS benchmarks (sentence pairs with human scores)."""
    predictions = []
    gold_scores = []
    for src_sent, tgt_sent, score in pairs:
        src_emb = encode_sentence(src_sent, 'source')
        tgt_emb = encode_sentence(tgt_sent, 'target')
        # Apply alignment
        aligned_src = src_emb @ W
        # Compute similarity
        sim = cosine_similarity(aligned_src, tgt_emb)
        predictions.append(sim)
        gold_scores.append(score)
    # Correlation with human judgments
    return spearmanr(predictions, gold_scores).correlation
```
Applications
1. Zero-Shot Cross-Lingual Transfer
Train on one language, test on others:
```python
def zero_shot_transfer(model, train_lang, test_langs):
    # Train on source language
    model.fit(train_data[train_lang])
    results = {}
    for lang in test_langs:
        # Direct evaluation without fine-tuning
        results[lang] = model.evaluate(test_data[lang])
    return results
```
2. Multilingual Information Retrieval
Search across languages:
```python
import faiss
import numpy as np

class CrossLingualRetriever:
    def __init__(self, alignments):
        # Per-language mapping matrices into the English embedding space
        self.alignments = alignments
        self.index = None

    def index_documents(self, documents, languages):
        embeddings = []
        for doc, lang in zip(documents, languages):
            emb = self.encode(doc, lang)
            if lang != 'english':
                # Align to English space
                emb = emb @ self.alignments[lang]
            embeddings.append(emb)
        embeddings = np.vstack(embeddings).astype('float32')
        self.index = faiss.IndexFlatIP(embeddings.shape[1])
        self.index.add(embeddings)

    def search(self, query, query_lang, k=10):
        query_emb = self.encode(query, query_lang)
        if query_lang != 'english':
            query_emb = query_emb @ self.alignments[query_lang]
        query_emb = query_emb.reshape(1, -1).astype('float32')
        scores, indices = self.index.search(query_emb, k)
        return indices
```
Best Practices
1. Data Preparation
- Comparable Corpora: Wikipedia in different languages
- Anchor Points: Named entities, numbers, dates (see the seed-lexicon sketch below)
- Normalization: Lowercase, remove diacritics carefully
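As a sketch of the anchor-point idea referenced above, identically spelled tokens and digit strings shared by both vocabularies can seed a lexicon without any dictionary; the filtering thresholds here are illustrative assumptions:

```python
import re

def seed_lexicon(src_vocab, tgt_vocab):
    """Seed dictionary from identically spelled tokens shared by both
    vocabularies (named entities, numbers, dates). The length and
    digit filters are illustrative assumptions."""
    shared = set(src_vocab) & set(tgt_vocab)
    keep = {w for w in shared
            if re.fullmatch(r"[\d./-]+", w) or len(w) > 3}
    return sorted((w, w) for w in keep)
```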
2. Training Strategies
- Iterative Refinement: Start with simple methods, refine gradually
- Hub Languages: Use English as pivot for low-resource pairs (sketched below)
- Ensemble Methods: Combine multiple alignment techniques
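As a sketch of the hub-language strategy, orthogonal mappings into an English pivot space can be composed into a direct mapping between two non-English languages; `W_es_en` and `W_de_en` are assumed to come from separate Procrustes alignments against English:

```python
def pivot_mapping(W_src_en, W_tgt_en):
    """Compose two mappings into the English space into a direct
    source-to-target mapping; works because orthogonal maps are
    inverted by transposition."""
    return W_src_en @ W_tgt_en.T

# e.g. Spanish -> German via English:
# es_in_de_space = es_emb @ pivot_mapping(W_es_en, W_de_en)
```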
3. Quality Control
- Sanity Checks: Nearest neighbors should be translations (see the check below)
- Symmetry: Alignment should work bidirectionally
- Stability: Multiple runs should produce similar results
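A minimal version of the sanity and symmetry checks might look like this; the helper is illustrative and assumes row-aligned, unit-normalized embedding matrices:

```python
import numpy as np

def sanity_check(src_words, src_emb, tgt_vocab, tgt_emb, W, n=5):
    """Print the top target-language neighbours of a few mapped source
    words; they should mostly be translations. Assumes src_words[i]
    corresponds to row i of src_emb."""
    sims = (src_emb @ W) @ tgt_emb.T
    for i, word in enumerate(src_words):
        top = np.argsort(-sims[i])[:n]
        print(word, '->', [tgt_vocab[j] for j in top])

# Symmetry check: with an orthogonal W, mapping forward and back with W.T
# should approximately reproduce the source vectors.
# round_trip_error = np.linalg.norm(src_emb - (src_emb @ W) @ W.T)
```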
Advanced Techniques
Contextual Alignment
Align contextualized embeddings:
```python
def align_contextual(bert_src, bert_tgt, parallel_sentences):
    """Align BERT models across languages."""
    alignments = []
    for src_sent, tgt_sent in parallel_sentences:
        # Get contextual embeddings
        src_emb = bert_src.encode(src_sent)
        tgt_emb = bert_tgt.encode(tgt_sent)
        # Word alignment (e.g., using attention)
        alignment = compute_word_alignment(
            src_emb, tgt_emb, src_sent, tgt_sent
        )
        alignments.append(alignment)
    # Learn global transformation
    W = learn_transformation(alignments)
    return W
```
Challenges and Future Directions
Current Challenges
- Distant Language Pairs: Typologically different languages
- Low-Resource Languages: Limited training data
- Domain Shift: Specialized terminology
- Evaluation: Beyond word translation accuracy
Emerging Approaches
- Neural Machine Translation: Using NMT for alignment
- Multilingual Transformers: Joint training from scratch
- Meta-Learning: Quick adaptation to new languages
- Graph-Based Methods: Leveraging knowledge graphs
Conclusion
Cross-lingual alignment bridges the gap between languages, enabling AI systems to transfer knowledge across linguistic boundaries. The interactive visualization demonstrates how different alignment methods map words and concepts between language spaces, making multilingual NLP accessible and effective.