Large Language Models
Deep dive into the architecture, optimization, and engineering of large language models. From tokenization to attention mechanisms, understand how LLMs work under the hood.
4 Core Concepts · 45 min Total Reading · Interactive Visualizations · Practical Code Examples
Available Concepts
Tokenization
Converting text to numbers - BPE, SentencePiece, and WordPiece tokenization methods.
Beginner · 8 min read
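As a preview of the idea, here is a toy sketch of the BPE merge step in plain Python. It is illustrative only: the text, the number of merges, and the helper names are made up, and production tokenizers (BPE as used in GPT-style models, SentencePiece, WordPiece) learn merges over large corpora and map each token to an integer ID.

```python
# Toy illustration of the BPE idea: repeatedly merge the most frequent
# adjacent pair of symbols so common substrings become single tokens.
from collections import Counter

def most_frequent_pair(symbols):
    """Count adjacent symbol pairs and return the most common one (or None)."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def bpe_merge(text, num_merges=3):
    symbols = list(text)                      # start from individual characters
    for _ in range(num_merges):
        pair = most_frequent_pair(symbols)
        if pair is None:
            break
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])   # merge the pair
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# After a few merges, frequent substrings such as "low" become single symbols.
print(bpe_merge("low lower lowest"))
```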
Context Windows
The memory limits of LLMs - sliding windows, attention patterns, and scaling strategies.
Intermediate · 10 min read
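A rough sketch of the sliding-window pattern mentioned above: each position attends only to the most recent `window` positions, which bounds per-token attention cost regardless of how long the context grows. The sequence length and window size here are arbitrary example values.

```python
# Illustrative sketch of a causal sliding-window attention mask.
# Position i may attend to positions max(0, i - window + 1) .. i.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Return a (seq_len, seq_len) boolean mask; True = attention allowed."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
# Each row has at most `window` ones, so attention cost per token is
# O(window) instead of O(seq_len).
```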
KV Cache
The secret to fast inference - caching key-value states for efficient text generation.
Advanced · 12 min read
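A minimal single-head sketch of the caching idea, with random weights and a made-up dimension (d = 16): at each decoding step only the newest token is projected, and its key/value rows are appended to a cache that every later step reuses instead of recomputing attention over the whole prefix.

```python
# Minimal single-head sketch of KV caching during autoregressive decoding.
# Weights and inputs are random; the point is the shape of the loop.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                   # model/head dimension (assumption)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []                # grows by one row per generated token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend the newest token's query over all cached keys/values."""
    q = x_new @ W_q                      # (d,)
    k_cache.append(x_new @ W_k)          # cache K for this position
    v_cache.append(x_new @ W_v)          # cache V for this position
    K = np.stack(k_cache)                # (t, d) -- never recomputed
    V = np.stack(v_cache)                # (t, d)
    scores = K @ q / np.sqrt(d)          # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over past positions
    return weights @ V                   # attention output for the new token

for step in range(5):                    # pretend we generate 5 tokens
    out = decode_step(rng.standard_normal(d))
    print(step, out.shape, "cached positions:", len(k_cache))
```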
Flash Attention
IO-aware exact attention - achieving massive speedups through tiling and kernel fusion.
Advanced · 15 min read
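The core trick can be sketched in NumPy: process keys/values in tiles and keep running softmax statistics (per-row max and denominator) so the full attention matrix is never materialized. This sketch only tiles the key/value dimension and runs on the CPU; the real FlashAttention kernel also tiles queries and fuses the whole computation into one GPU kernel. Block size and tensor shapes below are arbitrary.

```python
# Illustrative NumPy sketch of tiling + online softmax, the idea behind
# IO-aware exact attention: never form the full (N, N) score matrix.
import numpy as np

def attention_kv_tiled(Q, K, V, block=64):
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)                     # running (unnormalized) output
    m = np.full(N, -np.inf)                    # running row-max of scores
    l = np.zeros(N)                            # running softmax denominator
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                 # scores for this K/V tile only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)              # rescale previous statistics
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

# Sanity check against naive attention on random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
print(np.allclose(attention_kv_tiled(Q, K, V), ref))   # True
```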
Suggested Learning Path
1. Tokenization: Start with understanding how text becomes numbers.
2. Context Windows: Learn about attention patterns and memory limits.
3. KV Cache: Understand inference optimization through caching.
4. Flash Attention: Master advanced GPU optimization techniques.