Memory Tiling Optimization for Neural Networks
Figure: visualization of the memory tiling optimization technique for neural networks. The diagram contrasts standard memory access patterns, which cause cache misses, with tiled access patterns, which improve cache efficiency and reduce memory latency.
Memory Tiling Optimization (e.g., in torch.compile)

Before: Standard Convolution
The large input feature map must be loaded in its entirety from global GPU memory into the compute unit. This means a large memory transfer per operation and a potential for cache thrashing whenever the map is larger than the cache. Cache locality: poor.
After: Tiled Convolution
The full map stays in global memory, and small tiles are loaded sequentially into the L1 cache, where access is fast. Computation reuses the data within the cache before the next tile is loaded. Cache locality: good. torch.compile applies this tiling automatically.
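The tiling idea in the diagram can be sketched in plain NumPy. The block below is an illustrative blocked matrix multiply, not the actual GPU kernels torch.compile generates; the function name `matmul_tiled` and the tile size are chosen here for illustration. Each small block is loaded once and reused across an inner loop, which is exactly the cache-locality win the tiled convolution panel describes.

```python
import numpy as np

def matmul_tiled(a, b, tile=32):
    """Blocked matrix multiply: C = A @ B computed tile by tile.

    Instead of streaming entire rows/columns (the "Before" pattern),
    each tile x tile block is pulled in once and reused for many
    multiply-accumulate operations (the "After" pattern).
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, tile):          # rows of the output tile
        for j in range(0, m, tile):      # columns of the output tile
            for p in range(0, k, tile):  # inner (reduction) dimension
                # NumPy slicing clips at the edges, so non-multiple
                # sizes are handled automatically.
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

# Usage: the tiled result matches the untiled one; the difference
# is purely in the memory access pattern.
a = np.random.rand(70, 50)
b = np.random.rand(50, 60)
assert np.allclose(matmul_tiled(a, b, tile=16), a @ b)
```

On real hardware the payoff comes from choosing the tile size so that the working set of three blocks fits in the fast cache; compilers like torch.compile pick such parameters for the target device.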