Cross-Lingual Alignment

8 min

Align embeddings across languages for multilingual understanding


Cross-lingual alignment enables models to understand relationships between languages, making multilingual NLP possible without parallel data for every language pair. This technology powers machine translation, cross-lingual search, and zero-shot language transfer.

Interactive Alignment Explorer

(Interactive explorer: shows a source and a target embedding space side by side and reports an overall alignment score, 85.9% in the default configuration.)

Alignment Methods

  • VecMap (unsupervised, 82%): unsupervised orthogonal mapping
  • MUSE (unsupervised, 85%): adversarial training alignment
  • Supervised (90%): dictionary-based alignment
  • XLM (supervised, 93%): multilingual masked language model

Cross-Lingual Similarity

           gato    perro   casa    agua    sol
cat        0.88    0.77    0.50    0.40    0.45
dog        0.82    0.88    0.54    0.41    0.45
house      0.56    0.62    0.88    0.54    0.48
water      0.45    0.47    0.62    0.88    0.60
sun        0.51    0.50    0.50    0.55    0.88
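
A matrix like this can be computed directly from aligned embeddings. Below is a minimal sketch; en_vecs and es_vecs are hypothetical arrays holding the five English vectors (already mapped into the shared space) and the five Spanish vectors.

import numpy as np

def similarity_matrix(src_vecs, tgt_vecs):
    # Cosine similarity between every source word (rows) and every target word (columns)
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return src @ tgt.T

# Hypothetical usage: en_vecs holds vectors for [cat, dog, house, water, sun]
# mapped into the shared space; es_vecs holds [gato, perro, casa, agua, sol]
# sims = similarity_matrix(en_vecs, es_vecs)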

Implementation Example

Procrustes Alignment

import numpy as np

def procrustes_alignment(X_src, X_tgt):
    """
    Learn an orthogonal mapping W that minimizes
    ||X_src @ W - X_tgt||_F over dictionary pairs.
    """
    # Center embeddings
    X_src = X_src - X_src.mean(0)
    X_tgt = X_tgt - X_tgt.mean(0)

    # SVD of the source-target cross-covariance
    U, S, Vt = np.linalg.svd(X_src.T @ X_tgt)

    # Orthogonal mapping (closed-form Procrustes solution)
    W = U @ Vt

    return W

# Apply alignment (src_emb and tgt_emb are row-aligned dictionary pairs)
W = procrustes_alignment(src_emb, tgt_emb)
aligned_src = src_emb @ W

Adversarial Alignment

import torch
import torch.nn as nn

class AdversarialAligner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Linear map from the source space into the target space
        self.mapper = nn.Linear(dim, dim)
        # Discriminator predicts whether an embedding comes from the target language
        self.discriminator = nn.Sequential(
            nn.Linear(dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )

    def align(self, src_emb):
        return self.mapper(src_emb)

    def discriminate(self, emb):
        return self.discriminator(emb)

# Train with an adversarial loss: the discriminator learns to tell mapped
# source embeddings (label 0) from real target embeddings (label 1),
# while the mapper is trained to fool it.
aligner = AdversarialAligner(dim=src_emb.shape[1])
bce = nn.BCEWithLogitsLoss()

mapped_src = aligner.align(src_emb)
disc_loss = (bce(aligner.discriminate(mapped_src.detach()),
                 torch.zeros(len(src_emb), 1)) +
             bce(aligner.discriminate(tgt_emb),
                 torch.ones(len(tgt_emb), 1)))
map_loss = bce(aligner.discriminate(mapped_src),
               torch.ones(len(src_emb), 1))

Cross-Lingual Alignment Best Practices

Key Techniques

  • Use identical words as anchors (see the sketch below)
  • Apply iterative refinement
  • Normalize embeddings before alignment
  • Use hub languages for zero-shot transfer
  • Combine supervised and unsupervised methods
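
A minimal sketch of the first and third points, assuming two word lists (src_vocab, tgt_vocab) and their embedding matrices are already loaded; all names here are illustrative.

import numpy as np

def normalize(emb):
    # Unit-length rows: length-normalizing before alignment keeps cosine
    # and dot-product rankings consistent
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def identical_word_pairs(src_vocab, tgt_vocab):
    # Words spelled identically in both languages (names, numbers, loanwords)
    # can serve as a free seed dictionary of anchor pairs
    tgt_index = {w: i for i, w in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]

# src_emb, tgt_emb = normalize(src_emb), normalize(tgt_emb)
# seed_pairs = identical_word_pairs(src_vocab, tgt_vocab)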

Applications

  • Machine translation without parallel data
  • Cross-lingual information retrieval
  • Multilingual sentiment analysis
  • Zero-shot language transfer
  • Cross-lingual question answering

The Alignment Challenge

Why Alignment Matters

Different languages encode similar concepts in different vector spaces. Alignment techniques map these spaces to a common representation where:

  • Similar meanings have similar vectors across languages
  • Geometric relationships are preserved
  • Zero-shot transfer becomes possible
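
One common way to formalize this, and the view taken by the Procrustes code below, is to look for an orthogonal matrix W that maps source dictionary vectors X onto their target counterparts Y:

    W* = argmin_{W orthogonal} ||X W - Y||_F,   with closed-form solution W* = U V^T, where U S V^T = SVD(X^T Y)

Constraining W to be orthogonal means the mapping only rotates (and possibly reflects) the source space, so distances and angles, and with them the monolingual neighborhood structure, are preserved.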

Alignment Methods

1. Supervised Alignment (Dictionary-Based)

Uses bilingual dictionaries to learn mappings:

import numpy as np

def procrustes_alignment(X_src, X_tgt):
    """
    Learn an orthogonal mapping W that minimizes
    ||X_src @ W - X_tgt||_F over dictionary pairs.
    """
    # Center embeddings
    X_src = X_src - X_src.mean(0)
    X_tgt = X_tgt - X_tgt.mean(0)

    # SVD of the source-target cross-covariance
    U, S, Vt = np.linalg.svd(X_src.T @ X_tgt)

    # Orthogonal mapping (closed-form Procrustes solution)
    W = U @ Vt

    return W

2. Unsupervised Alignment (VecMap)

No parallel data required:

class VecMap:
    def __init__(self, src_emb, tgt_emb):
        self.src_emb = self.normalize(src_emb)
        self.tgt_emb = self.normalize(tgt_emb)

    def iterative_alignment(self, n_iter=10):
        """Self-learning through iterative refinement."""
        W = self.initialize_mapping()

        for i in range(n_iter):
            # Build a dictionary using the current mapping
            lexicon = self.build_lexicon(W)

            # Refine the mapping using that dictionary
            W = self.procrustes(lexicon)

            # Symmetric re-weighting
            W = self.symmetric_reweighting(W)

        return W

3. Adversarial Alignment (MUSE)

Uses adversarial training:

import torch
import torch.nn as nn

class AdversarialAligner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mapper = nn.Linear(dim, dim, bias=False)
        self.discriminator = nn.Sequential(
            nn.Linear(dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid()  # probability that the embedding comes from the target language
        )

    def align_step(self, src_batch, tgt_batch):
        # Map source embeddings into the target space
        mapped_src = self.mapper(src_batch)

        # Discriminator tries to distinguish the two languages
        src_pred = self.discriminator(mapped_src)
        tgt_pred = self.discriminator(tgt_batch)

        # Discriminator loss: score target embeddings as 1, mapped source as 0
        disc_loss = -torch.mean(torch.log(tgt_pred) + torch.log(1 - src_pred))

        # Mapping loss: fool the discriminator into scoring mapped source as target
        map_loss = -torch.mean(torch.log(src_pred))

        return disc_loss, map_loss

Multilingual Models

mBERT and XLM-R

Pre-trained on multiple languages simultaneously:

class MultilingualBERT:
    def __init__(self, languages):
        self.languages = languages
        self.shared_vocabulary = self.build_shared_vocab()
        self.model = TransformerModel(
            vocab_size=len(self.shared_vocabulary),
            shared_positional=True
        )

    def train_step(self, batch):
        # Mixed-language batches
        loss = 0
        for lang_data in batch:
            # Masked language modeling
            mlm_loss = self.masked_lm_loss(lang_data)
            loss += mlm_loss

            # Translation language modeling (optional, needs parallel data)
            if lang_data.has_parallel:
                tlm_loss = self.translation_lm_loss(lang_data)
                loss += tlm_loss

        return loss

Evaluation Metrics

1. Bilingual Lexicon Induction (BLI)

Accuracy of word translation:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def bli_accuracy(src_words, tgt_words, W, k=1):
    """Compute P@k for word translation.

    Assumes src_words[i] translates to tgt_words[i]."""
    src_emb = get_embeddings(src_words)
    tgt_emb = get_embeddings(tgt_words)

    # Apply mapping
    mapped_src = src_emb @ W

    # Find nearest neighbors in the target space
    similarities = cosine_similarity(mapped_src, tgt_emb)
    predictions = np.argsort(-similarities, axis=1)[:, :k]

    # Count source words whose correct translation appears in the top-k
    correct = 0
    for i, pred in enumerate(predictions):
        if i in pred:  # correct translation in top-k
            correct += 1

    return correct / len(src_words)

2. Cross-Lingual Similarity

Correlation with human judgments:

from scipy.stats import spearmanr
from sklearn.metrics.pairwise import cosine_similarity

def cross_lingual_similarity(pairs, W):
    """Evaluate on cross-lingual STS benchmarks."""
    predictions = []
    gold_scores = []

    for src_sent, tgt_sent, score in pairs:
        src_emb = encode_sentence(src_sent, 'source')
        tgt_emb = encode_sentence(tgt_sent, 'target')

        # Apply alignment
        aligned_src = src_emb @ W

        # Compute similarity
        sim = cosine_similarity(aligned_src, tgt_emb)

        predictions.append(sim)
        gold_scores.append(score)

    return spearmanr(predictions, gold_scores)

Applications

1. Zero-Shot Cross-Lingual Transfer

Train on one language, test on others:

def zero_shot_transfer(model, train_lang, test_langs):
    # Train on the source language only
    model.fit(train_data[train_lang])

    results = {}
    for lang in test_langs:
        # Direct evaluation without any target-language fine-tuning
        results[lang] = model.evaluate(test_data[lang])

    return results

2. Multilingual Information Retrieval

Search across languages:

import faiss
import numpy as np

class CrossLingualRetriever:
    def __init__(self, alignments):
        # alignments: dict mapping each non-English language to its
        # alignment matrix into the English embedding space
        self.alignments = alignments
        self.index = None

    def index_documents(self, documents, languages):
        embeddings = []
        for doc, lang in zip(documents, languages):
            emb = self.encode(doc, lang)
            if lang != 'english':
                # Align to the English space
                emb = emb @ self.alignments[lang]
            embeddings.append(emb)

        embeddings = np.vstack(embeddings).astype('float32')
        self.index = faiss.IndexFlatIP(embeddings.shape[1])
        self.index.add(embeddings)

    def search(self, query, query_lang, k=10):
        query_emb = self.encode(query, query_lang)
        if query_lang != 'english':
            query_emb = query_emb @ self.alignments[query_lang]

        query_emb = query_emb.reshape(1, -1).astype('float32')
        scores, indices = self.index.search(query_emb, k)
        return indices

Best Practices

1. Data Preparation

  • Comparable Corpora: Wikipedia in different languages
  • Anchor Points: Named entities, numbers, dates
  • Normalization: Lowercase, remove diacritics carefully

2. Training Strategies

  • Iterative Refinement: Start with simple methods, refine gradually
  • Hub Languages: Use English as pivot for low-resource pairs (see the pivot sketch below)
  • Ensemble Methods: Combine multiple alignment techniques
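
For the hub-language strategy, a minimal sketch (the matrix and variable names are illustrative): if both a French-to-English and a German-to-English mapping have been learned, French and German vectors can be compared directly in the shared English space without any French-German dictionary.

# Hypothetical pivot setup: W_fr_en and W_de_en were each learned into the English space
fr_in_en = fr_emb @ W_fr_en   # French vectors mapped to the shared (English) space
de_in_en = de_emb @ W_de_en   # German vectors mapped to the same space

# Direct French-German similarities via the hub (assumes unit-normalized rows)
sims = fr_in_en @ de_in_en.T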

3. Quality Control

  • Sanity Checks: Nearest neighbors should be translations (see the sketch below)
  • Symmetry: Alignment should work bidirectionally
  • Stability: Multiple runs should produce similar results
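
A lightweight version of the first check, assuming word lists, embedding matrices, and a small seed dictionary of known pairs are available; the names below are illustrative.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def sanity_check(src_words, tgt_words, src_emb, tgt_emb, W, seed_pairs, n=20):
    # For a handful of known translation pairs, the nearest target neighbor
    # of a mapped source word should be its translation
    mapped = src_emb @ W
    hits = 0
    for src_word, tgt_word in seed_pairs[:n]:
        query = mapped[src_words.index(src_word)].reshape(1, -1)
        nearest = int(np.argmax(cosine_similarity(query, tgt_emb)))
        hits += int(tgt_words[nearest] == tgt_word)
    print(f"{hits}/{n} seed pairs recovered as nearest neighbors")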

Advanced Techniques

Contextual Alignment

Align contextualized embeddings:

def align_contextual(bert_src, bert_tgt, parallel_sentences):
    """Align contextual BERT representations across languages."""
    alignments = []

    for src_sent, tgt_sent in parallel_sentences:
        # Get contextual embeddings for each sentence
        src_emb = bert_src.encode(src_sent)
        tgt_emb = bert_tgt.encode(tgt_sent)

        # Word-level alignment (e.g., using attention)
        alignment = compute_word_alignment(
            src_emb, tgt_emb, src_sent, tgt_sent
        )
        alignments.append(alignment)

    # Learn a global transformation from the collected word pairs
    W = learn_transformation(alignments)
    return W

Challenges and Future Directions

Current Challenges

  1. Distant Language Pairs: Typologically different languages
  2. Low-Resource Languages: Limited training data
  3. Domain Shift: Specialized terminology
  4. Evaluation: Beyond word translation accuracy

Emerging Approaches

  1. Neural Machine Translation: Using NMT for alignment
  2. Multilingual Transformers: Joint training from scratch
  3. Meta-Learning: Quick adaptation to new languages
  4. Graph-Based Methods: Leveraging knowledge graphs

Conclusion

Cross-lingual alignment bridges the gap between languages, enabling AI systems to transfer knowledge across linguistic boundaries. The interactive visualization demonstrates how different alignment methods map words and concepts between language spaces, making multilingual NLP accessible and effective.

If you found this explanation helpful, consider sharing it with others.
