NVIDIA Device Files in /dev/

Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.

Overview

When the NVIDIA driver loads on a Linux system, it creates multiple character device files in /dev/ that serve as the interface between userspace applications and the GPU hardware. These device files represent different aspects of GPU functionality—from basic compute access to unified memory management to display control.

Understanding this device file structure is essential for containerization, permission management, and debugging GPU access issues. Whether you're configuring Docker containers, troubleshooting CUDA initialization errors (see GPU boot errors), or managing multi-GPU systems, knowing which device files do what is crucial.

Character Devices Explained

The c at the start of permissions (e.g., crw-rw-rw-) indicates a character device: a special file that provides unbuffered, direct access to hardware. Unlike block devices (used for disks), character devices handle data as a stream of bytes rather than in fixed-size, cached blocks.

The two numbers after the owner/group (e.g., 195, 0) are the major and minor device numbers:

  • Major number (195): Identifies the driver handling this device (NVIDIA driver)
  • Minor number (0, 1, 255): Identifies which specific device within that driver
$ ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Nov 2 10:00 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Nov 2 10:00 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Nov 2 10:00 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Nov 2 10:00 /dev/nvidia-modeset
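
You can read the same major/minor numbers programmatically. The short C sketch below is not tied to any NVIDIA API; it simply uses stat(2) plus the standard major()/minor() macros to print them for /dev/nvidia0:

/* Minimal sketch: print the major/minor numbers of a device node,
 * mirroring what `ls -l` shows above. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int main(void)
{
    const char *path = "/dev/nvidia0";
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return 1;
    }
    if (!S_ISCHR(st.st_mode)) {
        fprintf(stderr, "%s is not a character device\n", path);
        return 1;
    }
    /* For /dev/nvidia0 this typically prints "major=195 minor=0" */
    printf("%s: major=%u minor=%u\n", path,
           major(st.st_rdev), minor(st.st_rdev));
    return 0;
}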

Core GPU Devices

The following visualization shows the core GPU device files that provide direct access to individual GPUs and driver-wide operations:

[Visualization: Core GPU Devices (/dev/nvidia* - individual GPU access): /dev/nvidia0 (195, 0) for GPU 0, /dev/nvidia1 (195, 1) for GPU 1, /dev/nvidiactl (195, 255, REQUIRED), /dev/nvidia-modeset (195, 254, DISPLAY). Major 195: nvidia.ko driver; minor: device instance number. Used for: CUDA compute, graphics, command submission.]

/dev/nvidia0, /dev/nvidia1, ...

Purpose: Primary device files for individual GPUs. Each GPU in your system gets its own numbered device file.

Major/Minor: 195, N (where N = GPU index)

Module: nvidia.ko

Who uses it:

  • CUDA runtime (libcuda.so) for kernel launches (see PyTorch kernel compilation)
  • OpenGL/Vulkan drivers for graphics operations
  • nvidia-smi for querying GPU information
  • Any application accessing that specific GPU (e.g., TensorRT optimization)

What operations:

// Typical CUDA application flow (illustrative: in practice the CUDA runtime
// and libcuda.so perform these open() calls on your behalf)
int fd = open("/dev/nvidia0", O_RDWR);
if (fd < 0) {
    perror("Failed to open GPU 0");
    return -1;
}

// Device is now accessible for CUDA operations
cudaSetDevice(0);        // Uses /dev/nvidia0 internally
cudaMalloc(&ptr, size);

Key operations (illustrative names; the real ioctl request codes and structures are internal to the driver):

  • Memory allocation: ioctl(fd, NVIDIA_ALLOC_MEMORY, ...)
  • Command submission: ioctl(fd, NVIDIA_SUBMIT_CMD, ...)
  • Query status: ioctl(fd, NVIDIA_QUERY_STATUS, ...)

Numbering: GPU device numbers correspond to PCI bus enumeration order, which may not match physical slot positions:

# Query GPU to device file mapping
$ nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv
index, pci.bus_id, name
0, 00000000:01:00.0, NVIDIA RTX 4090
1, 00000000:02:00.0, NVIDIA RTX 4090
2, 00000000:41:00.0, Tesla T4
3, 00000000:42:00.0, Tesla T4

# GPU 0 → /dev/nvidia0 → PCIe 01:00.0
# GPU 1 → /dev/nvidia1 → PCIe 02:00.0
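
To cross-check this mapping from inside an application, the CUDA runtime can report each device's PCI bus ID. The sketch below assumes the CUDA toolkit is installed (compile with nvcc or link against libcudart). Note that the CUDA runtime's own device ordering can differ from the kernel's minor numbering unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set in the environment:

/* Minimal sketch: print each CUDA device's PCI bus ID so it can be matched
 * against the nvidia-smi output above. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "CUDA initialization failed (check /dev/nvidia* access)\n");
        return 1;
    }
    for (int i = 0; i < count; i++) {
        char bus_id[32];
        cudaDeviceGetPCIBusId(bus_id, (int)sizeof(bus_id), i);
        printf("CUDA device %d -> PCI %s\n", i, bus_id);
    }
    return 0;
}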

/dev/nvidiactl

Purpose: Control device for driver-wide operations that don't target a specific GPU. This is the first device opened by NVIDIA libraries to query system capabilities and enumerate available GPUs.

Major/Minor: 195, 255

Module: nvidia.ko

Required: Yes, for all GPU operations

Operations:

  • GPU enumeration: Query how many GPUs are present
  • Driver information: Get driver version, capabilities
  • Context allocation: Create CUDA contexts before binding to specific GPU
  • Global driver state: Query system-wide GPU information
// Typical initialization sequence (illustrative pseudo-ioctls: the real
// request codes are internal to the NVIDIA driver)
int ctl_fd = open("/dev/nvidiactl", O_RDWR);

// Query: How many GPUs are available?
int num_gpus;
ioctl(ctl_fd, NVIDIA_GET_NUM_GPUS, &num_gpus);

// Get driver version
struct nvidia_version ver;
ioctl(ctl_fd, NVIDIA_GET_VERSION, &ver);

// Then open specific GPU devices
for (int i = 0; i < num_gpus; i++) {
    char path[32];
    sprintf(path, "/dev/nvidia%d", i);
    open(path, O_RDWR);
}

Why it's needed: Without /dev/nvidiactl, applications can't discover GPUs or initialize the driver. Even single-GPU CUDA programs need this device.

/dev/nvidia-modeset

Purpose: Kernel Mode Setting (KMS) interface for display management. Handles display resolution changes, monitor hotplug detection, and mode setting operations.

Major/Minor: 195, 254

Module: nvidia-modeset.ko

When needed:

  • Desktop systems with displays connected to NVIDIA GPUs
  • X11 or Wayland display servers
  • Dynamically changing display resolutions
  • Multi-monitor configurations

When NOT needed:

  • Headless GPU compute servers (no displays)
  • Pure CUDA/compute workloads
  • Container environments without display
# Check if modeset is being used
$ lsof /dev/nvidia-modeset
COMMAND   PID   USER  FD   TYPE  DEVICE
Xorg      1234  root  10u  CHR   195,254   ← X server using it
gnome-sh  5678  user  8u   CHR   195,254   ← GNOME Shell

# For headless servers, this device often sits unused

Unified Memory Devices

The Unified Virtual Memory (UVM) subsystem provides automatic page migration between CPU and GPU memory:

[Visualization: Unified Memory Devices (/dev/nvidia-uvm* - Unified Virtual Memory): /dev/nvidia-uvm (237, 0, REQUIRED), /dev/nvidia-uvm-tools (237, 1, OPTIONAL). Major 237: nvidia-uvm.ko driver. UVM = Unified Virtual Memory. Required for cudaMallocManaged(), cudaMemAdvise(), cudaMemPrefetchAsync(), and PyTorch/TensorFlow unified memory.]

/dev/nvidia-uvm

Purpose: Unified Virtual Memory (UVM) driver interface. Enables CUDA's unified memory feature where CPU and GPU share the same virtual address space with automatic page migration.

Major/Minor: 237, 0

Module: nvidia-uvm.ko

Required: Yes, for modern CUDA applications

Critical for:

  • cudaMallocManaged() - Unified memory allocations
  • cudaMemAdvise() - Memory placement hints
  • cudaMemPrefetchAsync() - Explicit prefetching
  • PyTorch, TensorFlow (CUDA initialization itself needs the UVM device)
  • Modern CUDA applications expecting automatic memory management
// Using unified memory
float *data;

// This requires /dev/nvidia-uvm
cudaMallocManaged(&data, size);

// CPU can access directly
data[0] = 1.0f;

// GPU can access same pointer
kernel<<<blocks, threads>>>(data);

// Automatic page migration handled by UVM driver
// Learn more: /concepts/gpu/page-migration

Container Requirement: CUDA containers must have access to /dev/nvidia-uvm. Without it, CUDA initialization fails with:

CUDA error: CUDA_ERROR_NOT_INITIALIZED
cudaGetDeviceCount() failed: initialization error

/dev/nvidia-uvm-tools

Purpose: Debugging and profiling interface for UVM. Provides detailed statistics about page migrations, fault rates, and memory access patterns.

Major/Minor: 237, 1

Module: nvidia-uvm.ko

Required: No (optional for profiling)

Used by:

  • NVIDIA Nsight profilers
  • CUDA profiling tools (nvprof, nvidia-smi)
  • Performance analysis tools querying UVM statistics
# Query UVM statistics
$ nvidia-smi --query-gpu=memory.used,memory.free --format=csv
$ cat /proc/driver/nvidia/gpus/*/information   # Uses uvm-tools

DRI Devices (Direct Rendering Infrastructure)

The Direct Rendering Manager (DRM) provides standardized access to graphics hardware for both display output and headless compute:

[Visualization: DRI Devices (Display) (/dev/dri/* - Direct Rendering Infrastructure): /dev/dri/card0 (226, 0) and /dev/dri/card1 (226, 1), crw-rw---- root:video; /dev/dri/renderD128 (226, 128) and /dev/dri/renderD129 (226, 129), crw-rw---- root:render. Major 226: DRM (Direct Rendering Manager). card*: display output (video group); renderD*: headless compute (render group). Used by Wayland, X11, headless GPU workloads.]

/dev/dri/card0, /dev/dri/card1

Purpose: Standard Linux DRM (Direct Rendering Manager) device nodes for graphics output. These integrate NVIDIA GPUs with the standard Linux graphics stack.

Major/Minor: 226, N (where N = GPU index)

Module: nvidia-drm.ko

Permissions: Owned by group video, so users must be in the video group to access displays.

Used by:

  • Wayland compositors (requires DRM)
  • Modern display servers
  • KMS-based framebuffer
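
As a quick way to see which kernel driver sits behind a DRM node, the sketch below opens a device and issues the standard DRM_IOCTL_VERSION ioctl (a documented DRM interface); on a system using nvidia-drm.ko it typically reports a driver name of "nvidia-drm". It assumes the DRM UAPI headers are installed, and the header path varies by distribution:

/* Minimal sketch: query the DRM driver name and version behind a device node. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>   /* may be <libdrm/drm.h> depending on distribution */

int main(void)
{
    const char *path = "/dev/dri/card0";   /* render nodes work too, e.g. /dev/dri/renderD128 */
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror(path);                      /* EACCES usually means missing video/render group */
        return 1;
    }

    /* First call with empty buffers: the kernel fills in the string lengths. */
    struct drm_version ver;
    memset(&ver, 0, sizeof(ver));
    if (ioctl(fd, DRM_IOCTL_VERSION, &ver) != 0) {
        perror("DRM_IOCTL_VERSION");
        return 1;
    }

    /* Second call with an allocated buffer to fetch the driver name. */
    char *name = calloc(ver.name_len + 1, 1);
    ver.name = name;
    if (ioctl(fd, DRM_IOCTL_VERSION, &ver) != 0) {
        perror("DRM_IOCTL_VERSION");
        return 1;
    }

    printf("%s: driver %s %d.%d.%d\n", path, name,
           ver.version_major, ver.version_minor, ver.version_patchlevel);
    free(name);
    close(fd);
    return 0;
}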

/dev/dri/renderD128, /dev/dri/renderD129

Purpose: Render nodes for headless GPU compute. These provide GPU access without requiring display permissions, enabling non-privileged compute workloads.

Major/Minor: 226, 128+ (starting at 128)

Module: nvidia-drm.ko

Advantages:

  • No need for X11 authentication
  • Works in headless environments
  • Group permissions via render group instead of video
  • Security: compute without display access
# Render nodes are numbered starting at 128
# GPU 0 → renderD128
# GPU 1 → renderD129

# Add user to render group for compute access
$ sudo usermod -aG render $USER

# Now can run CUDA without display permissions

Capability Devices

Advanced GPU features like Multi-Instance GPU (MIG) and NVLink fabric management require specialized capability devices:

[Visualization: Capability Devices (/dev/nvidia-caps/* - advanced features): /dev/nvidia-caps/nvidia-cap1 (243, 1, MIG) and /dev/nvidia-caps/nvidia-cap2 (243, 2, FABRIC), both cr-------- (root-only). Major 243: nvidia-caps. MIG: Multi-Instance GPU (A100, H100); Fabric: NVLink/NVSwitch management. Optional for standard workloads.]

/dev/nvidia-caps/nvidia-cap1

Purpose: Multi-Instance GPU (MIG) capability device. MIG allows partitioning A100/H100 GPUs into smaller, isolated instances.

Major/Minor: 243, 1

Module: nvidia.ko

Permissions: Note the restrictive cr-------- permissions. Only root can access by default. Applications needing MIG access require specific capability grants.

# MIG is only available on A100, A30, H100, H200 GPUs
$ nvidia-smi mig -lgi
+----+---------+-------+---------+
| ID | MIG     | Size  | Devices |
+----+---------+-------+---------+
| 0  | 1g.5gb  | 5GB   | nvidia0 |
| 1  | 2g.10gb | 10GB  | nvidia1 |
+----+---------+-------+---------+

# Each MIG instance gets access to capability devices

/dev/nvidia-caps/nvidia-cap2

Purpose: NVIDIA Fabric Manager capability. Manages NVLink and NVSwitch fabric for multi-GPU communication in DGX systems.

Major/Minor: 243, 2

Module: nvidia.ko

When used: High-end multi-GPU systems with NVLink interconnect (DGX A100, DGX H100). Not needed for PCIe-only systems.

Container Integration

For containers to access GPUs, these device files must be exposed inside the container. Here's what different workloads need:

Workload Type      | Required Devices                                                   | Optional Devices
-------------------|--------------------------------------------------------------------|----------------------
Basic CUDA Compute | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm                      | /dev/nvidia-uvm-tools
Multi-GPU CUDA     | /dev/nvidia* (all GPUs), /dev/nvidiactl, /dev/nvidia-uvm           | /dev/nvidia-uvm-tools
Graphics (OpenGL)  | /dev/nvidia0, /dev/nvidiactl, /dev/dri/card0, /dev/dri/renderD128  | /dev/nvidia-modeset
Display Server     | All of the above + /dev/nvidia-modeset                             | None
MIG Workload       | /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-caps/*  | None
# Docker: Expose specific GPU devices
$ docker run --rm \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    nvidia/cuda:12.6.0-base nvidia-smi

# Better: Use nvidia-container-runtime (handles all devices)
$ docker run --rm --gpus all nvidia/cuda:12.6.0-base nvidia-smi

# Kubernetes: nvidia-device-plugin automatically mounts required devices
$ kubectl run gpu-pod --image=nvidia/cuda:12.6.0-base --limits=nvidia.com/gpu=1

Permission Management

Device file permissions control who can access GPUs:

# Default: world-readable/writable (not secure!)
$ ls -l /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 /dev/nvidia0

# Better: Restrict to video group
$ sudo chmod 660 /dev/nvidia0
$ sudo chown root:video /dev/nvidia0
$ ls -l /dev/nvidia0
crw-rw---- 1 root video 195, 0 /dev/nvidia0

# Add user to video group
$ sudo usermod -aG video $USER

# Make permanent via udev rule
$ sudo tee /etc/udev/rules.d/70-nvidia.rules <<EOF
KERNEL=="nvidia*", MODE="0660", GROUP="video"
KERNEL=="nvidia_uvm*", MODE="0660", GROUP="video"
EOF
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
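
To verify the result from a user's point of view, a small check like the one below can be handy. It is a minimal sketch rather than an official tool, and it assumes the three-device minimum for CUDA work described in this article:

/* Minimal sketch: report whether the current user can open the core NVIDIA
 * device files, which is usually the first question when CUDA fails to
 * initialize. */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *devices[] = { "/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm" };
    int failures = 0;

    for (int i = 0; i < 3; i++) {
        if (access(devices[i], R_OK | W_OK) == 0) {
            printf("OK    %s\n", devices[i]);
        } else {
            /* EACCES -> group/permission problem (see the udev rule above);
             * ENOENT -> driver not loaded or node never created. */
            printf("FAIL  %s (%s)\n", devices[i], strerror(errno));
            failures++;
        }
    }
    return failures ? 1 : 0;
}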

Troubleshooting

Missing Device Files

Symptom: /dev/nvidia* files don't exist

$ ls /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory

Causes and solutions:

  1. Driver not loaded: sudo modprobe nvidia (see kernel modules and GPU boot errors)
  2. No GPU detected: Check lspci | grep -i nvidia
  3. udev not running: sudo systemctl start udev
  4. Manual creation needed: sudo nvidia-modprobe

Permission Denied

# Symptom: GPU present, but CUDA reports it is unavailable
$ python -c "import torch; print(torch.cuda.is_available())"
False

# Trace device access to find out why
$ strace -e openat python -c "import torch; torch.cuda.is_available()" 2>&1 | grep nvidia
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = -1 EACCES (Permission denied)

# Solution: Add to video group
$ sudo usermod -aG video $USER
$ newgrp video   # Apply immediately

Manual Device Creation

Sometimes device files don't get created automatically (e.g., in minimal containers). You can create them manually:

# Create nvidiactl
$ sudo mknod -m 666 /dev/nvidiactl c 195 255

# Create per-GPU devices (for 2 GPUs)
$ sudo mknod -m 666 /dev/nvidia0 c 195 0
$ sudo mknod -m 666 /dev/nvidia1 c 195 1

# Create UVM device
$ sudo mknod -m 666 /dev/nvidia-uvm c 237 0

# Or use nvidia-modprobe to do it automatically
$ sudo nvidia-modprobe -c0 -u   # Create nvidia0 and uvm
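
The same can be done programmatically with mknod(2), which is occasionally useful in container entrypoints. The sketch below is a minimal, hypothetical helper (it must run as root); the major numbers match the conventional values shown in this article, but the UVM major in particular can vary per system, so verify against /proc/devices first:

/* Minimal sketch: create NVIDIA character device nodes via mknod(2). */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* makedev() */

static int make_char_dev(const char *path, unsigned int maj, unsigned int min)
{
    /* 0666 mirrors the `mknod -m 666` commands above; tighten as needed. */
    if (mknod(path, S_IFCHR | 0666, makedev(maj, min)) != 0) {
        perror(path);         /* EEXIST simply means the node is already there */
        return -1;
    }
    return 0;
}

int main(void)
{
    make_char_dev("/dev/nvidiactl",  195, 255);
    make_char_dev("/dev/nvidia0",    195, 0);
    make_char_dev("/dev/nvidia-uvm", 237, 0);   /* check /proc/devices for the real major */
    return 0;
}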

Summary

The NVIDIA device file hierarchy provides a layered interface to GPU functionality: core compute access through /dev/nvidia* and /dev/nvidiactl, unified memory through /dev/nvidia-uvm, display integration through /dev/nvidia-modeset and /dev/dri/*, and advanced features such as MIG and NVLink fabric management through /dev/nvidia-caps/*.

For containerized GPU workloads, the minimum requirement is three devices: /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm. The nvidia-container-runtime automatically handles device exposure, but understanding the underlying device structure is essential for debugging access issues, configuring permissions, and optimizing container deployments (especially for inference with TensorRT).
