Large Language Models
Deep dive into the architecture, optimization, and engineering of large language models. From tokenization to attention mechanisms, understand how LLMs work under the hood.
4 Core Concepts · 45 min Total Reading · Interactive Visualizations · Practical Code Examples
Available Concepts
Tokenization
Converting text to numbers - BPE, SentencePiece, and WordPiece tokenization methods.
Beginner · 8 min read
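As a preview of the idea, here is a toy sketch of the BPE merge step in plain Python. It is illustrative only: the text, the number of merges, and the helper names are made up, and production tokenizers (BPE as used in GPT-style models, SentencePiece, WordPiece) learn merges over large corpora and map each token to an integer ID.

```python
# Toy illustration of the BPE idea: repeatedly merge the most frequent
# adjacent pair of symbols so common substrings become single tokens.
from collections import Counter

def most_frequent_pair(symbols):
    """Count adjacent symbol pairs and return the most common one (or None)."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def bpe_merge(text, num_merges=3):
    symbols = list(text)                      # start from individual characters
    for _ in range(num_merges):
        pair = most_frequent_pair(symbols)
        if pair is None:
            break
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])   # merge the pair
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# After a few merges, frequent substrings such as "low" become single symbols.
print(bpe_merge("low lower lowest"))
```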
Context Windows
The memory limits of LLMs - sliding windows, attention patterns, and scaling strategies.
Intermediate · 10 min read
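A rough sketch of the sliding-window pattern mentioned above: each position attends only to the most recent `window` positions, which bounds per-token attention cost regardless of how long the context grows. The sequence length and window size here are arbitrary example values.

```python
# Illustrative sketch of a causal sliding-window attention mask.
# Position i may attend to positions max(0, i - window + 1) .. i.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Return a (seq_len, seq_len) boolean mask; True = attention allowed."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
# Each row has at most `window` ones, so attention cost per token is
# O(window) instead of O(seq_len).
```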
KV Cache
The secret to fast inference - caching key-value states for efficient text generation.
Advanced · 12 min read
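A minimal single-head sketch of the caching idea, with random weights and a made-up dimension (d = 16): at each decoding step only the newest token is projected, and its key/value rows are appended to a cache that every later step reuses instead of recomputing attention over the whole prefix.

```python
# Minimal single-head sketch of KV caching during autoregressive decoding.
# Weights and inputs are random; the point is the shape of the loop.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                   # model/head dimension (assumption)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []                # grows by one row per generated token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend the newest token's query over all cached keys/values."""
    q = x_new @ W_q                      # (d,)
    k_cache.append(x_new @ W_k)          # cache K for this position
    v_cache.append(x_new @ W_v)          # cache V for this position
    K = np.stack(k_cache)                # (t, d) -- never recomputed
    V = np.stack(v_cache)                # (t, d)
    scores = K @ q / np.sqrt(d)          # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over past positions
    return weights @ V                   # attention output for the new token

for step in range(5):                    # pretend we generate 5 tokens
    out = decode_step(rng.standard_normal(d))
    print(step, out.shape, "cached positions:", len(k_cache))
```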
Flash Attention
IO-aware exact attention - achieving massive speedups through tiling and kernel fusion.
Advanced · 15 min read
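The core trick can be sketched in NumPy: process keys/values in tiles and keep running softmax statistics (per-row max and denominator) so the full attention matrix is never materialized. This sketch only tiles the key/value dimension and runs on the CPU; the real FlashAttention kernel also tiles queries and fuses the whole computation into one GPU kernel. Block size and tensor shapes below are arbitrary.

```python
# Illustrative NumPy sketch of tiling + online softmax, the idea behind
# IO-aware exact attention: never form the full (N, N) score matrix.
import numpy as np

def attention_kv_tiled(Q, K, V, block=64):
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)                     # running (unnormalized) output
    m = np.full(N, -np.inf)                    # running row-max of scores
    l = np.zeros(N)                            # running softmax denominator
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                 # scores for this K/V tile only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)              # rescale previous statistics
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

# Sanity check against naive attention on random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
print(np.allclose(attention_kv_tiled(Q, K, V), ref))   # True
```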
Suggested Learning Path
1. Tokenization: Start with understanding how text becomes numbers.
2. Context Windows: Learn about attention patterns and memory limits.
3. KV Cache: Understand inference optimization through caching.
4. Flash Attention: Master advanced GPU optimization techniques.