NVIDIA Device Files in /dev/
Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.
Overview
When the NVIDIA driver loads on a Linux system, it creates multiple character device files in /dev/ that serve as the interface between userspace applications and the GPU hardware. These device files represent different aspects of GPU functionality—from basic compute access to unified memory management to display control.
Understanding this device file structure is essential for containerization, permission management, and debugging GPU access issues. Whether you're configuring Docker containers, troubleshooting CUDA initialization errors (see GPU boot errors), or managing multi-GPU systems, knowing which device files do what is crucial.
Character Devices Explained
The c at the start of permissions (e.g., crw-rw-rw-) indicates a character device—a special file that provides unbuffered, direct access to hardware. Unlike block devices (used for disks), character devices handle data as a stream of characters.
The two numbers after the owner/group (e.g., 195, 0) are the major and minor device numbers:
- Major number (195): Identifies the driver handling this device (NVIDIA driver)
- Minor number (0, 1, 255): Identifies which specific device within that driver
```bash
$ ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Nov  2 10:00 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Nov  2 10:00 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Nov  2 10:00 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Nov  2 10:00 /dev/nvidia-modeset
```
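The same major/minor information is available programmatically. The sketch below is a generic stat(2) example (not NVIDIA-specific) that prints a device node's type and numbers using the major() and minor() macros:

```c
// Sketch: print the major/minor numbers of a device file with stat(2).
// Path defaults to /dev/nvidia0; pass another device as the first argument.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   // major(), minor()

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "/dev/nvidia0";
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return 1;
    }

    if (S_ISCHR(st.st_mode)) {
        // st_rdev holds the device ID for character/block device nodes
        printf("%s: character device, major %u, minor %u\n",
               path, major(st.st_rdev), minor(st.st_rdev));
    } else {
        printf("%s is not a character device\n", path);
    }
    return 0;
}
```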
Core GPU Devices
The following visualization shows the core GPU device files that provide direct access to individual GPUs and driver-wide operations:
/dev/nvidia0, /dev/nvidia1, ...
Purpose: Primary device files for individual GPUs. Each GPU in your system gets its own numbered device file.
Major/Minor: 195, N (where N = GPU index)
Module: nvidia.ko
Who uses it:
- CUDA driver library (libcuda.so) for kernel launches (see PyTorch kernel compilation)
- OpenGL/Vulkan drivers for graphics operations
- nvidia-smi for querying GPU information
- Any application accessing that specific GPU (e.g., TensorRT optimization)
What operations:
```c
// Typical CUDA application flow
int fd = open("/dev/nvidia0", O_RDWR);
if (fd < 0) {
    perror("Failed to open GPU 0");
    return -1;
}

// Device is now accessible for CUDA operations
cudaSetDevice(0);        // Uses /dev/nvidia0 internally
cudaMalloc(&ptr, size);
```
Key operations:
- Memory allocation: ioctl(fd, NVIDIA_ALLOC_MEMORY, ...)
- Command submission: ioctl(fd, NVIDIA_SUBMIT_CMD, ...)
- Query status: ioctl(fd, NVIDIA_QUERY_STATUS, ...)
Numbering: GPU device numbers correspond to PCI bus enumeration order, which may not match physical slot positions:
```bash
# Query GPU-to-device-file mapping
$ nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv
index, pci.bus_id, name
0, 00000000:01:00.0, NVIDIA RTX 4090
1, 00000000:02:00.0, NVIDIA RTX 4090
2, 00000000:41:00.0, Tesla T4
3, 00000000:42:00.0, Tesla T4

# GPU 0 → /dev/nvidia0 → PCIe 01:00.0
# GPU 1 → /dev/nvidia1 → PCIe 02:00.0
```
/dev/nvidiactl
Purpose: Control device for driver-wide operations that don't target a specific GPU. This is the first device opened by NVIDIA libraries to query system capabilities and enumerate available GPUs.
Major/Minor: 195, 255
Module: nvidia.ko
Required: Yes, for all GPU operations
Operations:
- GPU enumeration: Query how many GPUs are present
- Driver information: Get driver version, capabilities
- Context allocation: Create CUDA contexts before binding to specific GPU
- Global driver state: Query system-wide GPU information
```c
// Typical initialization sequence
int ctl_fd = open("/dev/nvidiactl", O_RDWR);

// Query: How many GPUs are available?
int num_gpus;
ioctl(ctl_fd, NVIDIA_GET_NUM_GPUS, &num_gpus);

// Get driver version
struct nvidia_version ver;
ioctl(ctl_fd, NVIDIA_GET_VERSION, &ver);

// Then open specific GPU devices
for (int i = 0; i < num_gpus; i++) {
    char path[32];
    sprintf(path, "/dev/nvidia%d", i);
    open(path, O_RDWR);
}
```
Why it's needed: Without /dev/nvidiactl, applications can't discover GPUs or initialize the driver. Even single-GPU CUDA programs need this device.
/dev/nvidia-modeset
Purpose: Kernel Mode Setting (KMS) interface for display management. Handles display resolution changes, monitor hotplug detection, and mode setting operations.
Major/Minor: 195, 254
Module: nvidia-modeset.ko
When needed:
- Desktop systems with displays connected to NVIDIA GPUs
- X11 or Wayland display servers
- Dynamically changing display resolutions
- Multi-monitor configurations
When NOT needed:
- Headless GPU compute servers (no displays)
- Pure CUDA/compute workloads
- Container environments without display
```bash
# Check if modeset is being used
$ lsof /dev/nvidia-modeset
COMMAND   PID   USER  FD   TYPE  DEVICE
Xorg      1234  root  10u  CHR   195,254   ← X server using it
gnome-sh  5678  user  8u   CHR   195,254   ← GNOME Shell

# For headless servers, this device often sits unused
```
Unified Memory Devices
The Unified Virtual Memory (UVM) subsystem provides automatic page migration between CPU and GPU memory:
/dev/nvidia-uvm
Purpose: Unified Virtual Memory (UVM) driver interface. Enables CUDA's unified memory feature where CPU and GPU share the same virtual address space with automatic page migration.
Major/Minor: 237, 0
Module: nvidia-uvm.ko
Required: Yes, for modern CUDA applications
Critical for:
- cudaMallocManaged() - Unified memory allocations
- cudaMemAdvise() - Memory placement hints
- cudaMemPrefetchAsync() - Explicit prefetching
- PyTorch, TensorFlow, and other frameworks (CUDA initialization itself requires the UVM device)
- Modern CUDA applications expecting automatic memory management
```c
// Using unified memory
float *data;

// This requires /dev/nvidia-uvm
cudaMallocManaged(&data, size);

// CPU can access directly
data[0] = 1.0f;

// GPU can access the same pointer
kernel<<<blocks, threads>>>(data);

// Automatic page migration handled by UVM driver
// Learn more: /concepts/gpu/page-migration
```
Container Requirement: CUDA containers must have access to /dev/nvidia-uvm. Without it, CUDA initialization fails with:
```
CUDA error: CUDA_ERROR_NOT_INITIALIZED
cudaGetDeviceCount() failed: initialization error
```
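A container entrypoint can fail fast with a clearer message by checking these devices before the CUDA runtime ever initializes. Below is a minimal sketch of such a pre-flight check (a hypothetical helper, not part of any NVIDIA tooling); it assumes the single-GPU minimum set listed in the Container Integration table later in this article.

```c
// Minimal sketch: verify that the device files a CUDA container needs are
// present and accessible before launching the workload.
#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

int main(void) {
    // Minimum set for single-GPU CUDA (see the Container Integration table)
    const char *required[] = { "/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm" };
    int missing = 0;

    for (size_t i = 0; i < sizeof(required) / sizeof(required[0]); i++) {
        if (access(required[i], R_OK | W_OK) != 0) {
            fprintf(stderr, "missing or inaccessible: %s\n", required[i]);
            missing = 1;
        }
    }
    return missing;   // non-zero exit signals a misconfigured container
}
```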
/dev/nvidia-uvm-tools
Purpose: Debugging and profiling interface for UVM. Provides detailed statistics about page migrations, fault rates, and memory access patterns.
Major/Minor: 237, 1
Module: nvidia-uvm.ko
Required: No (optional for profiling)
Used by:
- NVIDIA Nsight profilers
- CUDA profiling tools (e.g., nvprof)
- Performance analysis tools querying UVM statistics
```bash
# Query UVM statistics
$ nvidia-smi --query-gpu=memory.used,memory.free --format=csv
$ cat /proc/driver/nvidia/gpus/*/information
```
DRI Devices (Direct Rendering Infrastructure)
The Direct Rendering Manager (DRM) provides standardized access to graphics hardware for both display output and headless compute:
/dev/dri/card0, /dev/dri/card1
Purpose: Standard Linux DRM (Direct Rendering Manager) device nodes for graphics output. These integrate NVIDIA GPUs with the standard Linux graphics stack.
Major/Minor: 226, N (where N = GPU index)
Module: nvidia-drm.ko
Permissions: Owned by group video, so users must be in the video group to access displays.
Used by:
- Wayland compositors (requires DRM)
- Modern display servers
- KMS-based framebuffer
/dev/dri/renderD128, /dev/dri/renderD129
Purpose: Render nodes for headless GPU compute. These provide GPU access without requiring display permissions, enabling non-privileged compute workloads.
Major/Minor: 226, 128+ (starting at 128)
Module: nvidia-drm.ko
Advantages:
- No need for X11 authentication
- Works in headless environments
- Group permissions via render group instead of video
- Security: compute without display access
```bash
# Render nodes are numbered starting at 128
# GPU 0 → renderD128
# GPU 1 → renderD129

# Add user to render group for compute access
$ sudo usermod -aG render $USER

# Now can run CUDA without display permissions
```
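Because render nodes expose the standard DRM interface, a process can open one and identify the driver behind it without any NVIDIA-specific library. The following sketch uses the generic DRM_IOCTL_VERSION ioctl and assumes the kernel's DRM UAPI header (drm/drm.h) is installed; with nvidia-drm loaded, it should report the driver name as nvidia-drm.

```c
// Sketch: open a DRM render node and query the driver name via the
// standard DRM_IOCTL_VERSION ioctl. Works on /dev/dri/card0 as well.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>   // DRM UAPI header (kernel headers / libdrm-dev)

int main(void) {
    int fd = open("/dev/dri/renderD128", O_RDWR);
    if (fd < 0) {
        perror("open /dev/dri/renderD128 (are you in the render group?)");
        return 1;
    }

    char name[64] = {0};
    struct drm_version ver = {0};
    ver.name = name;                  // buffer the kernel fills with the driver name
    ver.name_len = sizeof(name) - 1;

    if (ioctl(fd, DRM_IOCTL_VERSION, &ver) == 0) {
        printf("DRM driver: %s (version %d.%d.%d)\n",
               name, ver.version_major, ver.version_minor, ver.version_patchlevel);
    } else {
        perror("DRM_IOCTL_VERSION");
    }

    close(fd);
    return 0;
}
```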
Capability Devices
Advanced GPU features like Multi-Instance GPU (MIG) and NVLink fabric management require specialized capability devices:
/dev/nvidia-caps/nvidia-cap1
Purpose: Multi-Instance GPU (MIG) capability device. MIG allows partitioning A100/H100 GPUs into smaller, isolated instances.
Major/Minor: 243, 1
Module: nvidia.ko
Permissions: Note the restrictive cr-------- permissions. Only root can access by default. Applications needing MIG access require specific capability grants.
```bash
# MIG is only available on A100, A30, H100, H200 GPUs
$ nvidia-smi mig -lgi
+----+---------+-------+---------+
| ID | MIG     | Size  | Devices |
+----+---------+-------+---------+
| 0  | 1g.5gb  | 5GB   | nvidia0 |
| 1  | 2g.10gb | 10GB  | nvidia1 |
+----+---------+-------+---------+

# Each MIG instance gets access to capability devices
```
/dev/nvidia-caps/nvidia-cap2
Purpose: NVIDIA Fabric Manager capability. Manages NVLink and NVSwitch fabric for multi-GPU communication in DGX systems.
Major/Minor: 243, 2
Module: nvidia.ko
When used: High-end multi-GPU systems with NVLink interconnect (DGX A100, DGX H100). Not needed for PCIe-only systems.
Container Integration
For containers to access GPUs, these device files must be exposed inside the container. Here's what different workloads need:
| Workload Type | Required Devices | Optional Devices |
|---|---|---|
| Basic CUDA Compute | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm | /dev/nvidia-uvm-tools |
| Multi-GPU CUDA | /dev/nvidia* (all GPUs), /dev/nvidiactl, /dev/nvidia-uvm | /dev/nvidia-uvm-tools |
| Graphics (OpenGL) | /dev/nvidia0, /dev/nvidiactl, /dev/dri/card0, /dev/dri/renderD128 | /dev/nvidia-modeset |
| Display Server | All of the above + /dev/nvidia-modeset | None |
| MIG Workload | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-caps/* | None |
```bash
# Docker: Expose specific GPU devices
$ docker run --rm \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    nvidia/cuda:12.6.0-base nvidia-smi

# Better: Use nvidia-container-runtime (handles all devices)
$ docker run --rm --gpus all nvidia/cuda:12.6.0-base nvidia-smi

# Kubernetes: nvidia-device-plugin automatically mounts required devices
$ kubectl run gpu-pod --image=nvidia/cuda:12.6.0-base --limits=nvidia.com/gpu=1
```
Permission Management
Device file permissions control who can access GPUs:
```bash
# Default: world-readable/writable (not secure!)
$ ls -l /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 /dev/nvidia0

# Better: Restrict to video group
$ sudo chmod 660 /dev/nvidia0
$ sudo chown root:video /dev/nvidia0
crw-rw---- 1 root video 195, 0 /dev/nvidia0

# Add user to video group
$ sudo usermod -aG video $USER

# Make permanent via udev rule
$ sudo tee /etc/udev/rules.d/70-nvidia.rules <<EOF
KERNEL=="nvidia*", MODE="0660", GROUP="video"
KERNEL=="nvidia_uvm*", MODE="0660", GROUP="video"
EOF
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
```
Troubleshooting
Missing Device Files
Symptom: /dev/nvidia* files don't exist
```bash
$ ls /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory
```
Causes and solutions:
- Driver not loaded: sudo modprobe nvidia (see kernel modules and GPU boot errors)
- No GPU detected: Check lspci | grep -i nvidia
- udev not running: sudo systemctl start udev
- Manual creation needed: sudo nvidia-modprobe
Permission Denied
```bash
# Symptom: CUDA fails to initialize even though the driver is loaded
$ python -c "import torch; print(torch.cuda.is_available())"
False

# Trace the failing open() call on the device file
$ strace -e openat python -c "import torch" 2>&1 | grep nvidia
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = -1 EACCES (Permission denied)

# Solution: Add to video group
$ sudo usermod -aG video $USER
$ newgrp video   # Apply immediately
```
Manual Device Creation
Sometimes device files don't get created automatically (e.g., in minimal containers). You can create them manually:
```bash
# Create nvidiactl
$ sudo mknod -m 666 /dev/nvidiactl c 195 255

# Create per-GPU devices (for 2 GPUs)
$ sudo mknod -m 666 /dev/nvidia0 c 195 0
$ sudo mknod -m 666 /dev/nvidia1 c 195 1

# Create UVM device
$ sudo mknod -m 666 /dev/nvidia-uvm c 237 0

# Or use nvidia-modprobe to do it automatically
$ sudo nvidia-modprobe -c0 -u   # Create nvidia0 and uvm
```
Summary
The NVIDIA device file hierarchy provides a layered interface to GPU functionality:
- Core compute: /dev/nvidia* (per-GPU) and /dev/nvidiactl (global) for CUDA context management
- Unified memory: /dev/nvidia-uvm for modern CUDA applications with automatic page migration
- Display: /dev/nvidia-modeset and /dev/dri/* for graphics output
- Advanced features: /dev/nvidia-caps/* for MIG and multi-GPU fabric management
For containerized GPU workloads, the minimum requirement is three devices: /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm. The nvidia-container-runtime automatically handles device exposure, but understanding the underlying device structure is essential for debugging access issues, configuring permissions, and optimizing container deployments (especially for inference with TensorRT).
Related Concepts
To deepen your understanding of GPU system architecture:
- CUDA Context - Learn how CUDA contexts manage GPU resources and interact with device files
- Unified Memory - Understand the unified memory programming model enabled by /dev/nvidia-uvm
- Page Migration - Explore how data moves between CPU and GPU memory
- GPU Memory Hierarchy - Discover the complete memory architecture from device memory to caches
- NCCL Communication - Deep dive into multi-GPU communication patterns
- NVIDIA Persistence Daemon - Learn about keeping GPU initialized for faster startups
- Virtual Memory - Understand the virtual address space fundamentals
Related Articles
For comprehensive guides on GPU programming and troubleshooting:
- Solving GPU Boot Errors - Troubleshoot NVIDIA driver loading, initramfs conflicts, and modprobe issues
- How TensorRT Works - Deep dive into NVIDIA's inference optimization engine and containerized deployment
- Accelerating PyTorch Models - Explore kernel optimization and GPU programming with torch.compile
