Attention Sinks: Stable Streaming LLMs
Understand attention sinks, the phenomenon where LLMs concentrate attention on initial tokens, and how preserving them enables infinite-length streaming inference.
17 min readConcept
Explore machine learning concepts related to streaming. Clear explanations and practical insights.
Understand attention sinks, the phenomenon where LLMs concentrate attention on initial tokens, and how preserving them enables infinite-length streaming inference.
Explore how LSH uses probabilistic hash functions to find similar vectors in sub-linear time, perfect for streaming and high-dimensional data.