High Bandwidth Memory (HBM)
3D-stacked DRAM architecture providing massive bandwidth for GPUs and AI accelerators
Master the GPU memory hierarchy from registers to global memory; understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance
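To illustrate why coalescing matters, here is a toy Python model (not a real hardware simulator; the 32-byte segment size and the `transactions` helper are assumptions made for the sketch) that counts how many memory segments one warp's loads touch under contiguous versus strided addressing:

```python
def transactions(addresses, seg=32):
    # Number of distinct seg-byte segments touched by one warp's loads --
    # a simplified model of how coalescing hardware batches requests.
    return len({a // seg for a in addresses})

warp = range(32)
coalesced = [4 * t for t in warp]       # thread t loads float32 element t
strided   = [4 * 8 * t for t in warp]   # stride-8 access pattern

# 32 threads * 4 B = 128 B contiguous -> 4 segments
assert transactions(coalesced) == 4
# every thread lands in its own segment -> 32 separate transactions
assert transactions(strided) == 32
```

Under this model, the strided warp issues 8x the memory traffic of the coalesced one for the same amount of useful data, which is the intuition behind preferring unit-stride access per thread.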
Automatic memory management between CPU and GPU through page faulting and on-demand migration
Understanding virtual memory page migration, fault handling, and TLB management in CPU-GPU systems
Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.
Deep dive into nvidia-modeset, the NVIDIA kernel module that handles display mode-setting, monitor configuration, and DRM integration in Linux systems.
Understand NVIDIA CUDA Multi-Process Service (MPS), a client-server architecture that enables multiple CUDA processes to share a single GPU context for concurrent kernel execution and better utilization.
Interactive visualization of Flash Attention - the breakthrough algorithm that makes attention memory-efficient through tiling, recomputation, and kernel fusion.
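The tiling-plus-online-softmax idea behind Flash Attention can be sketched in NumPy. This is a minimal single-head, unbatched sketch: the function names and block size are illustrative, and the real algorithm fuses this loop into one GPU kernel so the full N x N score matrix never leaves on-chip SRAM.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    # Flash-Attention-style streaming: process K/V in tiles, keeping a
    # running max (m) and normalizer (l) so only O(N * d) state is stored.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-wise max of the scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)             # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 8)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The rescaling step is the key trick: when a new tile raises the running max, previously accumulated sums are multiplied by `exp(m - m_new)` so the final result matches an exact softmax without ever holding all scores at once.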
Explore how the NVIDIA GPU Operator automates GPU infrastructure management in Kubernetes, transforming manual GPU setup into a declarative, cloud-native system.
Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.
Master Structure of Arrays (SoA) vs Array of Structures (AoS) data layouts for optimal cache efficiency, SIMD vectorization, and GPU memory coalescing with interactive visualizations.
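The layout difference is easy to see with NumPy strides (a small sketch; the three-field particle record is a made-up example, with the AoS side modeled as a structured array and the SoA side as separate per-field arrays):

```python
import numpy as np

# AoS: one record per element -> fields interleaved in memory.
aos = np.zeros(4, dtype=[("x", np.float32), ("y", np.float32), ("z", np.float32)])

# SoA: one contiguous array per field -> unit-stride access per field.
soa = {f: np.zeros(4, np.float32) for f in ("x", "y", "z")}

# Reading every "x" from the AoS hops 12 bytes between elements
# (the record size); the SoA field is a dense 4-byte-stride array,
# which is what SIMD lanes and GPU coalescing hardware want.
assert aos["x"].strides == (12,)
assert soa["x"].strides == (4,)
```

The same trade-off runs in reverse: touching all fields of one element favors AoS, while streaming one field across many elements favors SoA.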
Eliminating GPU initialization latency through nvidia-persistenced - a userspace daemon that maintains GPU driver state for optimal startup performance.
Understanding NCCL, NVIDIA's Collective Communications Library, for distributed deep learning and multi-GPU training
Understanding Tensor Cores, NVIDIA's specialized matrix multiplication hardware for AI workloads
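The numerical idea behind Tensor Core math (multiply in FP16, accumulate in FP32) can be illustrated in NumPy. This is only a precision sketch, not an emulation of the MMA units; the matrix sizes and error comparison are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((16, 16)).astype(np.float16)
B = rng.standard_normal((16, 16)).astype(np.float16)

# Tensor-Core-style mixed precision: FP16 inputs, FP32 accumulator.
acc = A.astype(np.float32) @ B.astype(np.float32)

# Compare both paths against an FP64 reference.
ref = A.astype(np.float64) @ B.astype(np.float64)
err_fp32_acc = np.abs(acc - ref).max()
err_fp16_acc = np.abs((A @ B).astype(np.float64) - ref).max()

# Keeping the accumulator (and result) in FP32 loses less precision
# than producing an FP16 result.
assert err_fp32_acc <= err_fp16_acc
```

This is why mixed-precision training works: the cheap FP16 multiplies capture the inputs, while the wide accumulator keeps the long sums from drifting.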
Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy