Reinstalling Drivers
If your GPU throws CUDA errors, fails inside Docker, or is not detected properly, reinstalling the NVIDIA drivers resolves most of these problems. Follow these steps to safely remove, reinstall, and verify your drivers.
1) Disable Hosting & Docker Services
Before making any changes, stop all services to prevent conflicts during installation.
Run the following commands:
This stops all GPU-dependent processes, ensuring a clean reinstallation.
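As a rough sketch of this step, the script below stops a hosting service and Docker. The unit name `nebula-host.service` is a hypothetical placeholder, not a name taken from this guide, and the `DRY_RUN` switch and `run` helper are illustrative additions so the commands can be previewed before anything is stopped.

```shell
#!/bin/sh
# Hypothetical unit name for the hosting agent; replace it with the real one.
HOSTING_SERVICE="${HOSTING_SERVICE:-nebula-host.service}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_services() {
    run sudo systemctl stop "$HOSTING_SERVICE"
    # Stop the socket before the service so socket activation cannot restart Docker.
    run sudo systemctl stop docker.socket
    run sudo systemctl stop docker.service
}

stop_services
```

Run it once as-is to review the printed commands, then again with `DRY_RUN=0` to actually stop the services.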
2) Remove Old NVIDIA Drivers
A corrupted or outdated driver installation can cause stability issues. Completely remove all existing NVIDIA drivers before reinstalling.
Run:
This removes all NVIDIA-related packages and clears system dependencies.
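A minimal sketch of the removal, assuming the drivers were installed from apt packages (a driver installed from NVIDIA's `.run` file needs its own uninstaller instead). The package patterns, the `DRY_RUN` switch, and the `run` helper are illustrative assumptions rather than the guide's exact commands.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

purge_nvidia_drivers() {
    # apt-get treats these arguments as regular expressions over package names.
    # The dry-run output shows them unquoted; keep the quotes when typing them by hand.
    run sudo apt-get purge -y '^nvidia-.*' '^libnvidia-.*'
    # Drop dependencies that nothing else needs any more.
    run sudo apt-get autoremove -y
    # Reboot so the old kernel modules are unloaded before the new install.
    run sudo reboot
}

purge_nvidia_drivers
```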
3) Install the Correct NVIDIA Driver
After rebooting, install a fresh NVIDIA driver version that matches your system.
Option 1: Install the Latest Recommended Driver
Run:
This installs the latest stable driver from Ubuntu's official repository.
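One way to see which driver Ubuntu would pick before installing it is to read the output of `ubuntu-drivers devices`. The helper name `recommended_driver` below is hypothetical, and the sketch assumes the usual `driver : <package> - ... recommended` line format.

```shell
#!/bin/sh
# Reads `ubuntu-drivers devices` output on stdin and prints the package
# marked "recommended" (prints nothing if no line carries that marker).
recommended_driver() {
    awk '$1 == "driver" && /recommended$/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
    pkg="$(ubuntu-drivers devices 2>/dev/null | recommended_driver)"
    echo "Recommended driver: ${pkg:-none found}"
    echo "To install it, run: sudo ubuntu-drivers autoinstall"
else
    echo "ubuntu-drivers not found; install the ubuntu-drivers-common package first"
fi
```

On newer Ubuntu releases `sudo ubuntu-drivers install` is the preferred spelling of the same operation.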
Option 2: Install a Specific NVIDIA Driver Version
If you need a specific version, use:
Or download a driver manually from NVIDIA’s official site and install it with:
Ensure that the driver is at least the minimum version required by your CUDA toolkit. A driver that is too old for the toolkit causes CUDA errors at runtime.
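The two routes for a specific version could be sketched as follows. The branch `535` and the `RUNFILE` variable are assumed example values, not taken from this guide, and `DRY_RUN` with the `run` helper is an illustrative preview mechanism.

```shell
#!/bin/sh
# Assumed example values; neither comes from the guide.
DRIVER_BRANCH="${DRIVER_BRANCH:-535}"
RUNFILE="${RUNFILE:-}"   # path to a downloaded NVIDIA-Linux-*.run installer, if any
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_driver() {
    if [ -n "$RUNFILE" ]; then
        # Manual route: run NVIDIA's self-extracting installer as root.
        run sudo sh "$RUNFILE"
    else
        # Packaged route: Ubuntu names driver packages nvidia-driver-<branch>.
        run sudo apt-get update
        run sudo apt-get install -y "nvidia-driver-$DRIVER_BRANCH"
    fi
}

install_driver
```

Set `RUNFILE` to the downloaded installer's path to take the manual route; otherwise the packaged route is used.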
4) Verify Driver Installation
After installation, confirm that the system detects the GPU properly.
Run:
You should see a table with GPU details, including driver version, power usage, and processes.
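Beyond reading the table by eye, the installed driver version can be compared against a minimum. In this sketch `MIN_DRIVER` is an assumed value that should be confirmed against NVIDIA's CUDA release notes for your toolkit, `version_ge` is a hypothetical helper, and the comparison relies on GNU `sort -V`.

```shell
#!/bin/sh
# Assumed minimum driver version for the CUDA toolkit in use; verify it yourself.
MIN_DRIVER="${MIN_DRIVER:-535.54.03}"

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi   # full status table
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if [ -z "$installed" ]; then
        echo "nvidia-smi ran but reported no driver version"
    elif version_ge "$installed" "$MIN_DRIVER"; then
        echo "Driver $installed meets the assumed minimum $MIN_DRIVER"
    else
        echo "Driver $installed is older than the assumed minimum $MIN_DRIVER"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```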
💡 If nvidia-smi returns ‘No devices found’, run:
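A small sketch of that check, based on the `sudo modprobe nvidia` fix listed in the troubleshooting table of this guide. The helper name `has_nvidia_module` is hypothetical; it only inspects `/proc/modules` and suggests the command rather than running it.

```shell
#!/bin/sh
# Reads /proc/modules-style text on stdin; succeeds if the core "nvidia" module is listed.
has_nvidia_module() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ] && has_nvidia_module < /proc/modules; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded; try: sudo modprobe nvidia"
fi
```

If `modprobe` itself fails, the kernel log usually says why, for example a module built for a different kernel version.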
5) Re-enable Services & Restart System
Once the driver installation is complete, reactivate the hosting environment:
Your GPU is now ready for hosting on Nebula AI.
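The re-enable step might look like the sketch below. As before, `nebula-host.service` is a hypothetical unit name, the CUDA image tag is only an assumed example for a container-level GPU check, and `DRY_RUN` with the `run` helper is an illustrative preview mechanism.

```shell
#!/bin/sh
# Hypothetical unit name for the hosting agent; replace it with the real one.
HOSTING_SERVICE="${HOSTING_SERVICE:-nebula-host.service}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_services() {
    run sudo systemctl start docker.service
    run sudo systemctl start "$HOSTING_SERVICE"
    # Optional check that containers can see the GPU (image tag is an assumed example).
    run sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
    # Restart to confirm the driver and services come back up on boot.
    run sudo reboot
}

start_services
```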
6) Troubleshooting Common Issues
| Issue | Cause | Fix |
| --- | --- | --- |
| Black screen after installing NVIDIA drivers | Xorg (GUI) conflict | Run `sudo dpkg-reconfigure gdm3` and reboot |
| `nvidia-smi` shows ‘No devices found’ | Driver modules failed to load | Run `sudo modprobe nvidia` |
| Docker fails to detect GPU after driver update | NVIDIA runtime issue | Run `sudo apt install --reinstall nvidia-container-runtime` |
| CUDA errors when running workloads | Incompatible driver or CUDA mismatch | Install the correct CUDA version (`sudo apt install cuda-toolkit-12-2 -y`) |
Final Steps
Reinstalling NVIDIA drivers should resolve most GPU detection and performance issues. If problems persist, check logs:
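As one possible way to check the logs, the sketch below pulls NVIDIA-related lines from the kernel log and the last Docker daemon messages. The helper name `filter_gpu_log` and the choice of log sources are assumptions; the guide's own log commands may differ.

```shell
#!/bin/sh
# Keeps only lines that mention the NVIDIA kernel driver (NVRM), Xid error codes, or "nvidia".
filter_gpu_log() {
    grep -iE 'nvrm|xid|nvidia'
}

if command -v journalctl >/dev/null 2>&1; then
    # Kernel messages from the current boot.
    journalctl -k -b --no-pager 2>/dev/null | filter_gpu_log | tail -n 20
    # Docker daemon messages, useful when containers cannot see the GPU.
    journalctl -u docker.service -b --no-pager 2>/dev/null | tail -n 20
else
    echo "journalctl not found; try: sudo dmesg | grep -i nvidia"
fi
```

If the output is empty, rerun the script with `sudo`, since reading the system journal can require elevated permissions.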