High Bandwidth Memory (HBM)
3D-stacked DRAM architecture providing massive bandwidth for GPUs and AI accelerators
Master the GPU memory hierarchy from registers to global memory; understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance
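To illustrate why coalescing matters, here is a toy Python model (not a real hardware simulator; the 32-byte segment size and the `transactions` helper are assumptions made for the sketch) that counts how many memory segments one warp's loads touch under contiguous versus strided addressing:

```python
def transactions(addresses, seg=32):
    # Number of distinct seg-byte segments touched by one warp's loads --
    # a simplified model of how coalescing hardware batches requests.
    return len({a // seg for a in addresses})

warp = range(32)
coalesced = [4 * t for t in warp]       # thread t loads float32 element t
strided   = [4 * 8 * t for t in warp]   # stride-8 access pattern

# 32 threads * 4 B = 128 B contiguous -> 4 segments
assert transactions(coalesced) == 4
# every thread lands in its own segment -> 32 separate transactions
assert transactions(strided) == 32
```

Under this model, the strided warp issues 8x the memory traffic of the coalesced one for the same amount of useful data, which is the intuition behind preferring unit-stride access per thread.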
Automatic memory management between CPU and GPU through page faulting and on-demand migration
Understanding virtual memory page migration, fault handling, and TLB management in CPU-GPU systems
Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.
Deep dive into nvidia-modeset, the NVIDIA kernel module that handles display mode-setting, monitor configuration, and DRM integration in Linux systems.
Understand NVIDIA CUDA Multi-Process Service (MPS), a client-server architecture that enables multiple CUDA processes to share a single GPU context for concurrent kernel execution and better utilization.
Interactive visualization of Flash Attention - the breakthrough algorithm that makes attention memory-efficient through tiling, recomputation, and kernel fusion.
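The tiling-plus-online-softmax idea behind Flash Attention can be sketched in NumPy. This is a minimal single-head, unbatched sketch: the function names and block size are illustrative, and the real algorithm fuses this loop into one GPU kernel so the full N x N score matrix never leaves on-chip SRAM.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    # Flash-Attention-style streaming: process K/V in tiles, keeping a
    # running max (m) and normalizer (l) so only O(N * d) state is stored.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-wise max of the scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)             # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 8)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The rescaling step is the key trick: when a new tile raises the running max, previously accumulated sums are multiplied by `exp(m - m_new)` so the final result matches an exact softmax without ever holding all scores at once.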
Explore how the NVIDIA GPU Operator automates GPU infrastructure management in Kubernetes, transforming manual GPU setup into a declarative, cloud-native system.
Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.
Master Structure of Arrays (SoA) vs Array of Structures (AoS) data layouts for optimal cache efficiency, SIMD vectorization, and GPU memory coalescing with interactive visualizations.
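The layout difference is easy to see with NumPy strides (a small sketch; the three-field particle record is a made-up example, with the AoS side modeled as a structured array and the SoA side as separate per-field arrays):

```python
import numpy as np

# AoS: one record per element -> fields interleaved in memory.
aos = np.zeros(4, dtype=[("x", np.float32), ("y", np.float32), ("z", np.float32)])

# SoA: one contiguous array per field -> unit-stride access per field.
soa = {f: np.zeros(4, np.float32) for f in ("x", "y", "z")}

# Reading every "x" from the AoS hops 12 bytes between elements
# (the record size); the SoA field is a dense 4-byte-stride array,
# which is what SIMD lanes and GPU coalescing hardware want.
assert aos["x"].strides == (12,)
assert soa["x"].strides == (4,)
```

The same trade-off runs in reverse: touching all fields of one element favors AoS, while streaming one field across many elements favors SoA.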
Eliminating GPU initialization latency through nvidia-persistenced - a userspace daemon that maintains GPU driver state for optimal startup performance.
Understanding NCCL, NVIDIA's Collective Communications Library, for distributed deep learning and multi-GPU training
Understanding Tensor Cores, NVIDIA's specialized matrix multiplication hardware for AI workloads
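The numerical idea behind Tensor Core math (multiply in FP16, accumulate in FP32) can be illustrated in NumPy. This is only a precision sketch, not an emulation of the MMA units; the matrix sizes and error comparison are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((16, 16)).astype(np.float16)
B = rng.standard_normal((16, 16)).astype(np.float16)

# Tensor-Core-style mixed precision: FP16 inputs, FP32 accumulator.
acc = A.astype(np.float32) @ B.astype(np.float32)

# Compare both paths against an FP64 reference.
ref = A.astype(np.float64) @ B.astype(np.float64)
err_fp32_acc = np.abs(acc - ref).max()
err_fp16_acc = np.abs((A @ B).astype(np.float64) - ref).max()

# Keeping the accumulator (and result) in FP32 loses less precision
# than producing an FP16 result.
assert err_fp32_acc <= err_fp16_acc
```

This is why mixed-precision training works: the cheap FP16 multiplies capture the inputs, while the wide accumulator keeps the long sums from drifting.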
Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy