NVIDIA Device Files in /dev/
Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.
Overview
When the NVIDIA driver loads on a Linux system, it creates multiple character device files in /dev/ that serve as the interface between userspace applications and the GPU hardware. These device files represent different aspects of GPU functionality—from basic compute access to unified memory management to display control.
Understanding this device file structure is essential for containerization, permission management, and debugging GPU access issues. Whether you're configuring Docker containers, troubleshooting CUDA initialization errors (see GPU boot errors), or managing multi-GPU systems, knowing which device files do what is crucial.
Character Devices Explained
The c at the start of permissions (e.g., crw-rw-rw-) indicates a character device—a special file that provides unbuffered, direct access to hardware. Unlike block devices (used for disks), character devices handle data as a stream of characters.
The two numbers after the owner/group (e.g., 195, 0) are the major and minor device numbers:
- Major number (195): Identifies the driver handling this device (NVIDIA driver)
- Minor number (0, 1, 255): Identifies which specific device within that driver
```bash
$ ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Nov  2 10:00 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Nov  2 10:00 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Nov  2 10:00 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Nov  2 10:00 /dev/nvidia-modeset
```
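The same major/minor information is available programmatically. The sketch below is a generic stat(2) example (not NVIDIA-specific) that prints a device node's type and numbers using the major() and minor() macros:

```c
// Sketch: print the major/minor numbers of a device file with stat(2).
// Path defaults to /dev/nvidia0; pass another device as the first argument.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   // major(), minor()

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "/dev/nvidia0";
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return 1;
    }

    if (S_ISCHR(st.st_mode)) {
        // st_rdev holds the device ID for character/block device nodes
        printf("%s: character device, major %u, minor %u\n",
               path, major(st.st_rdev), minor(st.st_rdev));
    } else {
        printf("%s is not a character device\n", path);
    }
    return 0;
}
```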
Core GPU Devices
The following visualization shows the core GPU device files that provide direct access to individual GPUs and driver-wide operations:
/dev/nvidia0, /dev/nvidia1, ...
Purpose: Primary device files for individual GPUs. Each GPU in your system gets its own numbered device file.
Major/Minor: 195, N (where N = GPU index)
Module: nvidia.ko
Who uses it:
- CUDA driver library (libcuda.so) for kernel launches (see PyTorch kernel compilation)
- OpenGL/Vulkan drivers for graphics operations
- nvidia-smi for querying GPU information
- Any application accessing that specific GPU (e.g., TensorRT optimization)
What operations:
```c
// Typical CUDA application flow
int fd = open("/dev/nvidia0", O_RDWR);
if (fd < 0) {
    perror("Failed to open GPU 0");
    return -1;
}

// Device is now accessible for CUDA operations
cudaSetDevice(0);        // Uses /dev/nvidia0 internally
cudaMalloc(&ptr, size);
```
Key operations:
- Memory allocation: ioctl(fd, NVIDIA_ALLOC_MEMORY, ...)
- Command submission: ioctl(fd, NVIDIA_SUBMIT_CMD, ...)
- Query status: ioctl(fd, NVIDIA_QUERY_STATUS, ...)
Numbering: GPU device numbers correspond to PCI bus enumeration order, which may not match physical slot positions:
```bash
# Query GPU-to-device-file mapping
$ nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv
index, pci.bus_id, name
0, 00000000:01:00.0, NVIDIA RTX 4090
1, 00000000:02:00.0, NVIDIA RTX 4090
2, 00000000:41:00.0, Tesla T4
3, 00000000:42:00.0, Tesla T4

# GPU 0 → /dev/nvidia0 → PCIe 01:00.0
# GPU 1 → /dev/nvidia1 → PCIe 02:00.0
```
/dev/nvidiactl
Purpose: Control device for driver-wide operations that don't target a specific GPU. This is the first device opened by NVIDIA libraries to query system capabilities and enumerate available GPUs.
Major/Minor: 195, 255
Module: nvidia.ko
Required: Yes, for all GPU operations
Operations:
- GPU enumeration: Query how many GPUs are present
- Driver information: Get driver version, capabilities
- Context allocation: Create CUDA contexts before binding to specific GPU
- Global driver state: Query system-wide GPU information
```c
// Typical initialization sequence
int ctl_fd = open("/dev/nvidiactl", O_RDWR);

// Query: How many GPUs are available?
int num_gpus;
ioctl(ctl_fd, NVIDIA_GET_NUM_GPUS, &num_gpus);

// Get driver version
struct nvidia_version ver;
ioctl(ctl_fd, NVIDIA_GET_VERSION, &ver);

// Then open specific GPU devices
for (int i = 0; i < num_gpus; i++) {
    char path[32];
    sprintf(path, "/dev/nvidia%d", i);
    open(path, O_RDWR);
}
```
Why it's needed: Without /dev/nvidiactl, applications can't discover GPUs or initialize the driver. Even single-GPU CUDA programs need this device.
/dev/nvidia-modeset
Purpose: Kernel Mode Setting (KMS) interface for display management. Handles display resolution changes, monitor hotplug detection, and mode setting operations.
Major/Minor: 195, 254
Module: nvidia-modeset.ko
When needed:
- Desktop systems with displays connected to NVIDIA GPUs
- X11 or Wayland display servers
- Dynamically changing display resolutions
- Multi-monitor configurations
When NOT needed:
- Headless GPU compute servers (no displays)
- Pure CUDA/compute workloads
- Container environments without display
```bash
# Check if modeset is being used
$ lsof /dev/nvidia-modeset
COMMAND   PID   USER  FD   TYPE  DEVICE
Xorg      1234  root  10u  CHR   195,254   ← X server using it
gnome-sh  5678  user  8u   CHR   195,254   ← GNOME Shell

# For headless servers, this device often sits unused
```
Unified Memory Devices
The Unified Virtual Memory (UVM) subsystem provides automatic page migration between CPU and GPU memory:
/dev/nvidia-uvm
Purpose: Unified Virtual Memory (UVM) driver interface. Enables CUDA's unified memory feature where CPU and GPU share the same virtual address space with automatic page migration.
Major/Minor: 237, 0
Module: nvidia-uvm.ko
Required: Yes, for modern CUDA applications
Critical for:
- cudaMallocManaged() - Unified memory allocations
- cudaMemAdvise() - Memory placement hints
- cudaMemPrefetchAsync() - Explicit prefetching
- PyTorch, TensorFlow, and other frameworks (CUDA initialization itself requires the UVM device)
- Modern CUDA applications expecting automatic memory management
```c
// Using unified memory
float *data;

// This requires /dev/nvidia-uvm
cudaMallocManaged(&data, size);

// CPU can access directly
data[0] = 1.0f;

// GPU can access the same pointer
kernel<<<blocks, threads>>>(data);

// Automatic page migration handled by UVM driver
// Learn more: /concepts/gpu/page-migration
```
Container Requirement: CUDA containers must have access to /dev/nvidia-uvm. Without it, CUDA initialization fails with:
```
CUDA error: CUDA_ERROR_NOT_INITIALIZED
cudaGetDeviceCount() failed: initialization error
```
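A container entrypoint can fail fast with a clearer message by checking these devices before the CUDA runtime ever initializes. Below is a minimal sketch of such a pre-flight check (a hypothetical helper, not part of any NVIDIA tooling); it assumes the single-GPU minimum set listed in the Container Integration table later in this article.

```c
// Minimal sketch: verify that the device files a CUDA container needs are
// present and accessible before launching the workload.
#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

int main(void) {
    // Minimum set for single-GPU CUDA (see the Container Integration table)
    const char *required[] = { "/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm" };
    int missing = 0;

    for (size_t i = 0; i < sizeof(required) / sizeof(required[0]); i++) {
        if (access(required[i], R_OK | W_OK) != 0) {
            fprintf(stderr, "missing or inaccessible: %s\n", required[i]);
            missing = 1;
        }
    }
    return missing;   // non-zero exit signals a misconfigured container
}
```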
/dev/nvidia-uvm-tools
Purpose: Debugging and profiling interface for UVM. Provides detailed statistics about page migrations, fault rates, and memory access patterns.
Major/Minor: 237, 1
Module: nvidia-uvm.ko
Required: No (optional for profiling)
Used by:
- NVIDIA Nsight profilers
- CUDA profiling tools (e.g., nvprof)
- Performance analysis tools querying UVM statistics
```bash
# Query UVM statistics
$ nvidia-smi --query-gpu=memory.used,memory.free --format=csv
$ cat /proc/driver/nvidia/gpus/*/information
```
DRI Devices (Direct Rendering Infrastructure)
The Direct Rendering Manager (DRM) provides standardized access to graphics hardware for both display output and headless compute:
/dev/dri/card0, /dev/dri/card1
Purpose: Standard Linux DRM (Direct Rendering Manager) device nodes for graphics output. These integrate NVIDIA GPUs with the standard Linux graphics stack.
Major/Minor: 226, N (where N = GPU index)
Module: nvidia-drm.ko
Permissions: Owned by group video, so users must be in the video group to access displays.
Used by:
- Wayland compositors (requires DRM)
- Modern display servers
- KMS-based framebuffer
/dev/dri/renderD128, /dev/dri/renderD129
Purpose: Render nodes for headless GPU compute. These provide GPU access without requiring display permissions, enabling non-privileged compute workloads.
Major/Minor: 226, 128+ (starting at 128)
Module: nvidia-drm.ko
Advantages:
- No need for X11 authentication
- Works in headless environments
- Group permissions via render group instead of video
- Security: compute without display access
```bash
# Render nodes are numbered starting at 128
# GPU 0 → renderD128
# GPU 1 → renderD129

# Add user to render group for compute access
$ sudo usermod -aG render $USER

# Now can run CUDA without display permissions
```
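Because render nodes expose the standard DRM interface, a process can open one and identify the driver behind it without any NVIDIA-specific library. The following sketch uses the generic DRM_IOCTL_VERSION ioctl and assumes the kernel's DRM UAPI header (drm/drm.h) is installed; with nvidia-drm loaded, it should report the driver name as nvidia-drm.

```c
// Sketch: open a DRM render node and query the driver name via the
// standard DRM_IOCTL_VERSION ioctl. Works on /dev/dri/card0 as well.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>   // DRM UAPI header (kernel headers / libdrm-dev)

int main(void) {
    int fd = open("/dev/dri/renderD128", O_RDWR);
    if (fd < 0) {
        perror("open /dev/dri/renderD128 (are you in the render group?)");
        return 1;
    }

    char name[64] = {0};
    struct drm_version ver = {0};
    ver.name = name;                  // buffer the kernel fills with the driver name
    ver.name_len = sizeof(name) - 1;

    if (ioctl(fd, DRM_IOCTL_VERSION, &ver) == 0) {
        printf("DRM driver: %s (version %d.%d.%d)\n",
               name, ver.version_major, ver.version_minor, ver.version_patchlevel);
    } else {
        perror("DRM_IOCTL_VERSION");
    }

    close(fd);
    return 0;
}
```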
Capability Devices
Advanced GPU features like Multi-Instance GPU (MIG) and NVLink fabric management require specialized capability devices:
/dev/nvidia-caps/nvidia-cap1
Purpose: Multi-Instance GPU (MIG) capability device. MIG allows partitioning A100/H100 GPUs into smaller, isolated instances.
Major/Minor: 243, 1
Module: nvidia.ko
Permissions: Note the restrictive cr-------- permissions. Only root can access by default. Applications needing MIG access require specific capability grants.
```bash
# MIG is only available on A100, A30, H100, H200 GPUs
$ nvidia-smi mig -lgi
+----+---------+-------+---------+
| ID | MIG     | Size  | Devices |
+----+---------+-------+---------+
| 0  | 1g.5gb  | 5GB   | nvidia0 |
| 1  | 2g.10gb | 10GB  | nvidia1 |
+----+---------+-------+---------+

# Each MIG instance gets access to capability devices
```
/dev/nvidia-caps/nvidia-cap2
Purpose: NVIDIA Fabric Manager capability. Manages NVLink and NVSwitch fabric for multi-GPU communication in DGX systems.
Major/Minor: 243, 2
Module: nvidia.ko
When used: High-end multi-GPU systems with NVLink interconnect (DGX A100, DGX H100). Not needed for PCIe-only systems.
Container Integration
For containers to access GPUs, these device files must be exposed inside the container. Here's what different workloads need:
| Workload Type | Required Devices | Optional Devices |
|---|---|---|
| Basic CUDA Compute | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm | /dev/nvidia-uvm-tools |
| Multi-GPU CUDA | /dev/nvidia* (all GPUs), /dev/nvidiactl, /dev/nvidia-uvm | /dev/nvidia-uvm-tools |
| Graphics (OpenGL) | /dev/nvidia0, /dev/nvidiactl, /dev/dri/card0, /dev/dri/renderD128 | /dev/nvidia-modeset |
| Display Server | All of the above + /dev/nvidia-modeset | None |
| MIG Workload | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-caps/* | None |
```bash
# Docker: Expose specific GPU devices
$ docker run --rm \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    nvidia/cuda:12.6.0-base nvidia-smi

# Better: Use nvidia-container-runtime (handles all devices)
$ docker run --rm --gpus all nvidia/cuda:12.6.0-base nvidia-smi

# Kubernetes: nvidia-device-plugin automatically mounts required devices
$ kubectl run gpu-pod --image=nvidia/cuda:12.6.0-base --limits=nvidia.com/gpu=1
```
Permission Management
Device file permissions control who can access GPUs:
```bash
# Default: world-readable/writable (not secure!)
$ ls -l /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 /dev/nvidia0

# Better: Restrict to video group
$ sudo chmod 660 /dev/nvidia0
$ sudo chown root:video /dev/nvidia0
crw-rw---- 1 root video 195, 0 /dev/nvidia0

# Add user to video group
$ sudo usermod -aG video $USER

# Make permanent via udev rule
$ sudo tee /etc/udev/rules.d/70-nvidia.rules <<EOF
KERNEL=="nvidia*", MODE="0660", GROUP="video"
KERNEL=="nvidia_uvm*", MODE="0660", GROUP="video"
EOF
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
```
Troubleshooting
Missing Device Files
Symptom: /dev/nvidia* files don't exist
```bash
$ ls /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory
```
Causes and solutions:
- Driver not loaded: sudo modprobe nvidia (see kernel modules and GPU boot errors)
- No GPU detected: Check lspci | grep -i nvidia
- udev not running: sudo systemctl start udev
- Manual creation needed: sudo nvidia-modprobe
Permission Denied
```bash
# Symptom: CUDA fails to initialize even though the driver is loaded
$ python -c "import torch; print(torch.cuda.is_available())"
False

# Trace the failing open() call on the device file
$ strace -e openat python -c "import torch" 2>&1 | grep nvidia
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = -1 EACCES (Permission denied)

# Solution: Add to video group
$ sudo usermod -aG video $USER
$ newgrp video   # Apply immediately
```
Manual Device Creation
Sometimes device files don't get created automatically (e.g., in minimal containers). You can create them manually:
```bash
# Create nvidiactl
$ sudo mknod -m 666 /dev/nvidiactl c 195 255

# Create per-GPU devices (for 2 GPUs)
$ sudo mknod -m 666 /dev/nvidia0 c 195 0
$ sudo mknod -m 666 /dev/nvidia1 c 195 1

# Create UVM device
$ sudo mknod -m 666 /dev/nvidia-uvm c 237 0

# Or use nvidia-modprobe to do it automatically
$ sudo nvidia-modprobe -c0 -u   # Create nvidia0 and uvm
```
Summary
The NVIDIA device file hierarchy provides a layered interface to GPU functionality:
- Core compute: /dev/nvidia* (per-GPU) and /dev/nvidiactl (global) for CUDA context management
- Unified memory: /dev/nvidia-uvm for modern CUDA applications with automatic page migration
- Display: /dev/nvidia-modeset and /dev/dri/* for graphics output
- Advanced features: /dev/nvidia-caps/* for MIG and multi-GPU fabric management
For containerized GPU workloads, the minimum requirement is three devices: /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm. The nvidia-container-runtime automatically handles device exposure, but understanding the underlying device structure is essential for debugging access issues, configuring permissions, and optimizing container deployments (especially for inference with TensorRT).
Related Concepts
To deepen your understanding of GPU system architecture:
- CUDA Context - Learn how CUDA contexts manage GPU resources and interact with device files
- Unified Memory - Understand the unified memory programming model enabled by /dev/nvidia-uvm
- Page Migration - Explore how data moves between CPU and GPU memory
- GPU Memory Hierarchy - Discover the complete memory architecture from device memory to caches
- NCCL Communication - Deep dive into multi-GPU communication patterns
- NVIDIA Persistence Daemon - Learn about keeping GPU initialized for faster startups
- Virtual Memory - Understand the virtual address space fundamentals
Related Articles
For comprehensive guides on GPU programming and troubleshooting:
- Solving GPU Boot Errors - Troubleshoot NVIDIA driver loading, initramfs conflicts, and modprobe issues
- How TensorRT Works - Deep dive into NVIDIA's inference optimization engine and containerized deployment
- Accelerating PyTorch Models - Explore kernel optimization and GPU programming with torch.compile
