Quantization Effects Simulator
Explore memory-accuracy trade-offs in embedding quantization from float32 to binary representations.
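To make the trade-off concrete, here is a minimal NumPy sketch (not the simulator's actual code) that quantizes random float32 embeddings to int8 and to binary, then compares memory footprint and similarity drift; the symmetric scaling scheme and the 1024-dimensional vectors are assumptions for illustration.

```python
# Illustrative sketch: float32 -> int8 -> binary quantization of an embedding.
import numpy as np

rng = np.random.default_rng(0)
dim = 1024
a = rng.standard_normal(dim).astype(np.float32)
b = rng.standard_normal(dim).astype(np.float32)

def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def to_int8(v):
    # Symmetric quantization: scale by the vector's max absolute value.
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale).astype(np.int8), scale

def to_binary(v):
    # Keep only the sign of each dimension: 1 bit per dimension when packed.
    return (v > 0).astype(np.uint8)

a8, sa = to_int8(a)
b8, sb = to_int8(b)

print("float32 bytes:", a.nbytes)                  # 4096
print("int8 bytes:   ", a8.nbytes)                 # 1024 (4x smaller)
print("binary bytes: ", dim // 8)                  # 128  (32x smaller, bit-packed)
print("cosine fp32:  ", cosine(a, b))
print("cosine int8:  ", cosine(a8 * sa, b8 * sb))  # stays close to the fp32 value
# For binary codes, Hamming-based similarity replaces cosine.
print("sign overlap: ", float((to_binary(a) == to_binary(b)).mean()))
```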
Compare lexical (BM25/TF-IDF) and semantic (BERT-based) retrieval approaches, their trade-offs, and hybrid strategies that combine them.
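A rough sketch of one hybrid strategy, using scikit-learn TF-IDF for the lexical side and stand-in random vectors in place of BERT embeddings; the min-max normalization and the `alpha` weight are illustrative assumptions, not a prescribed recipe.

```python
# Illustrative sketch: blend a lexical TF-IDF score with a dense cosine score.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs chase cats", "stock prices fell sharply"]
query = "cat on a mat"

# Lexical side: TF-IDF cosine (BM25 is the usual production choice; TF-IDF keeps the sketch short).
tfidf = TfidfVectorizer().fit(docs + [query])
lex = cosine_similarity(tfidf.transform([query]), tfidf.transform(docs))[0]

# Semantic side: random stand-in embeddings (a real system would use a BERT-style encoder).
rng = np.random.default_rng(0)
doc_emb = rng.standard_normal((len(docs), 384))
q_emb = rng.standard_normal((1, 384))
sem = cosine_similarity(q_emb, doc_emb)[0]

def minmax(x):
    # Put both score types on a comparable 0..1 scale before mixing.
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

alpha = 0.5  # assumed mixing weight between lexical and semantic scores
hybrid = alpha * minmax(lex) + (1 - alpha) * minmax(sem)
print("ranking (best first):", np.argsort(-hybrid))
```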
Interactive visualization of context window mechanisms in LLMs - sliding windows, expanding contexts, and attention patterns that define what models can "remember".
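The core mechanism can be sketched as a boolean attention mask; `causal_mask` below is a hypothetical helper contrasting a full (expanding) causal context with a fixed-size sliding window.

```python
# Illustrative sketch: which past tokens each position can "see" under
# an expanding causal context vs. a sliding window.
import numpy as np

def causal_mask(seq_len, window=None):
    """True where query position i may attend to key position j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                   # causal: only the past and self
    if window is not None:
        mask &= j > i - window      # sliding window: only the last `window` tokens
    return mask

full = causal_mask(8)               # expanding context: everything seen so far
sliding = causal_mask(8, window=3)  # fixed window of the 3 most recent tokens
print(full.astype(int))
print(sliding.astype(int))
```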
Interactive visualization of Flash Attention - the breakthrough algorithm that makes attention memory-efficient through tiling, recomputation, and kernel fusion.
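A plain-NumPy sketch of the tiling and online-softmax idea (kernel fusion is omitted, since that lives at the CUDA level); `tiled_attention` and the block size are illustrative, not the reference implementation.

```python
# Illustrative sketch: process keys/values in tiles and keep running max/sum
# statistics, so softmax attention is computed without materializing the
# full n x n attention matrix.
import numpy as np

def tiled_attention(q, k, v, block=32):
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)          # running row-wise max of the logits
    l = np.zeros(n)                  # running softmax normalizer
    scale = 1.0 / np.sqrt(d)
    for start in range(0, n, block):           # stream over key/value tiles
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                 # logits for this tile only
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])         # tile's unnormalized weights
        corr = np.exp(m - m_new)               # rescale previously accumulated results
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ vb
        m = m_new
    return out / l[:, None]

# Check against the naive full-matrix attention.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
s = (q @ k.T) / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
print(np.allclose(tiled_attention(q, k, v), ref))  # True: same output, no n x n matrix kept
```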
Interactive visualization of key-value caching in LLMs - how caching transformer attention states enables efficient text generation without quadratic recomputation.
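A toy single-head example of the caching idea; the weight matrices and the generation loop below are simplified stand-ins (no positional encoding, no real decoder), meant only to show why each step appends one key/value pair instead of recomputing all of them.

```python
# Illustrative sketch: at each generation step only the new token's K and V
# are computed and appended, so per-step attention cost stays linear.
import numpy as np

rng = np.random.default_rng(0)
d = 64
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.05 for _ in range(3))

def attend(q, K, V):
    w = np.exp(q @ K.T / np.sqrt(d))
    return (w / w.sum()) @ V

k_cache, v_cache = [], []
x = rng.standard_normal(d)              # embedding of the first token
for step in range(5):
    q = x @ Wq
    k_cache.append(x @ Wk)              # cache grows by one entry per step...
    v_cache.append(x @ Wv)              # ...instead of recomputing K/V for all past tokens
    out = attend(q, np.stack(k_cache), np.stack(v_cache))
    x = out                             # toy stand-in for the next token's embedding
    print(f"step {step}: cache holds {len(k_cache)} key/value pairs")
```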
Interactive exploration of tokenization methods in LLMs - BPE, SentencePiece, and WordPiece. Understand how text becomes tokens that models can process.
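A minimal sketch of the BPE training loop on a toy corpus; the merge budget and end-of-word marker are illustrative choices, and real tokenizers add byte-level handling, pre-tokenization, and special tokens.

```python
# Illustrative sketch: BPE training repeatedly merges the most frequent
# adjacent pair of symbols.
from collections import Counter

corpus = ["low", "lower", "lowest", "newer", "wider"]
words = [list(w) + ["</w>"] for w in corpus]   # start from character-level symbols

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(w, pair, merged):
    # Replace every occurrence of the adjacent pair with the merged symbol.
    out, i = [], 0
    while i < len(w):
        if i < len(w) - 1 and (w[i], w[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

merges = []
for _ in range(6):                             # assumed merge budget for the demo
    pair = most_frequent_pair(words)
    if pair is None:
        break
    merges.append(pair)
    words = [merge_pair(w, pair, "".join(pair)) for w in words]

print("learned merges:", merges)
print("tokenized corpus:", words)
```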