Reinstalling Drivers
If your GPU produces CUDA errors, has problems with Docker, or is not detected properly, reinstalling the NVIDIA drivers resolves most of these issues. Follow these steps to safely remove, reinstall, and verify your drivers.
1) Disable Hosting & Docker Services
Before making any changes, disable the hosting and Docker services and reboot, so that nothing is using the GPU during installation.
Run the following commands:
sudo systemctl disable nebula-hosting.service
sudo systemctl disable docker.service
sudo systemctl disable docker.socket
sudo reboot
Because the services are disabled, they do not start again after the reboot, so no GPU-dependent processes are running during the reinstallation.
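After the reboot, it can be worth confirming that the services really are off before touching the drivers. The following is a minimal sketch, assuming the three unit names used above; `unit_verdict` and `check_units` are illustrative helper names, not part of any Nebula tooling. It relies only on the standard `systemctl is-enabled` and `systemctl is-active` subcommands.

```shell
#!/usr/bin/env bash
# Sketch: report whether each unit is both disabled at boot and not running.
# The helper names are hypothetical; the unit names come from the step above.

# Classify a unit from its "is-enabled" and "is-active" strings.
# Prints "ok" when the unit will not start at boot and is not running now.
unit_verdict() {
  local enabled="$1" active="$2"
  case "$enabled" in
    enabled*) echo "still-on"; return ;;
  esac
  case "$active" in
    active|activating) echo "still-on"; return ;;
  esac
  echo "ok"
}

# Query systemd for each unit and print one summary line per unit.
check_units() {
  local unit enabled active
  for unit in "$@"; do
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)"
    active="$(systemctl is-active "$unit" 2>/dev/null)"
    printf '%s: enabled=%s active=%s -> %s\n' \
      "$unit" "${enabled:-unknown}" "${active:-unknown}" \
      "$(unit_verdict "$enabled" "$active")"
  done
}

# Only query systemd when systemctl is available on this machine.
if command -v systemctl >/dev/null 2>&1; then
  check_units nebula-hosting.service docker.service docker.socket
fi
```

If any line ends in `still-on`, disable or stop that unit again before continuing.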
2) Remove Old NVIDIA Drivers
A corrupted or outdated driver installation can cause stability issues. Completely remove all existing NVIDIA drivers before reinstalling.
Run:
sudo apt-get remove --purge '^nvidia-.*' -y
sudo apt autoremove -y
sudo reboot
This purges every package whose name starts with nvidia- and then removes dependencies that are no longer needed.
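To confirm the removal worked, you can list any NVIDIA packages that dpkg still knows about. This is a rough sketch, not an official check: `leftover_nvidia_packages` is a hypothetical helper that filters `dpkg -l` output, and including `libnvidia-` packages in the match is an assumption about what counts as a leftover.

```shell
#!/usr/bin/env bash
# Sketch: list NVIDIA packages that are still installed or half-removed.
# Reads `dpkg -l`-style text on stdin; the function name is illustrative.
leftover_nvidia_packages() {
  # Column 1 is the package state ("ii" = installed, "rc" = removed but
  # config files remain); column 2 is the package name.
  awk '$1 ~ /^(ii|rc)$/ && $2 ~ /^(lib)?nvidia-/ { print $1, $2 }'
}

# Only run the real check on systems that have dpkg.
if command -v dpkg >/dev/null 2>&1; then
  remaining="$(dpkg -l 2>/dev/null | leftover_nvidia_packages)"
  if [ -z "$remaining" ]; then
    echo "No NVIDIA packages remain."
  else
    echo "Still present:"
    echo "$remaining"
  fi
fi
```

An empty result suggests the system is clean and ready for a fresh driver install.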
3) Install the Correct NVIDIA Driver
After rebooting, install a fresh NVIDIA driver version that matches your system.
Option 1: Install the Latest Recommended Driver
Run:
sudo apt install -y nvidia-driver-535
sudo reboot
This installs the 535-series driver from Ubuntu's official repository. To see which driver Ubuntu recommends for your GPU, run ubuntu-drivers devices.
Option 2: Install a Specific NVIDIA Driver Version
If you need a specific version, use:
nvidia-driver-update 535.129.03 --force
Or download a driver manually from NVIDIA’s official site and install it with:
nvidia-driver-update https://us.download.nvidia.com/XFree86/Linux-x86_64/550.67/NVIDIA-Linux-x86_64-550.67.run
Ensure that the driver is new enough for your CUDA toolkit: each CUDA release requires a minimum driver version, and an older driver causes compatibility errors.
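One way to check the driver against a CUDA requirement is a simple version comparison. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu); `version_ge` is an illustrative helper, and the `MIN_DRIVER` value is only a placeholder that you should replace with the minimum listed in NVIDIA's CUDA release notes for your toolkit.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed driver is at least a required minimum.
# version_ge is a hypothetical helper; it relies on GNU `sort -V`.
version_ge() {
  # Succeeds when $1 >= $2 in version order: after an ascending version
  # sort, the smaller (or equal) version comes first.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder minimum. Look up the real value for your CUDA release.
MIN_DRIVER="${MIN_DRIVER:-535.54.03}"

# Only query the driver when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "Driver $installed satisfies minimum $MIN_DRIVER"
  else
    echo "Driver $installed is older than required $MIN_DRIVER"
  fi
fi
```

For example, `version_ge 535.129.03 535.54.03` succeeds, because 129 is compared numerically against 54 rather than as text.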
4) Verify Driver Installation
After installation, confirm that the system detects the GPU properly.
Run:
nvidia-smi
You should see a table with GPU details, including driver version, power usage, and processes.
💡 If nvidia-smi returns ‘No devices found’, run:
sudo modprobe nvidia
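The same verification can be scripted so it is easy to repeat after each reboot. This is a minimal sketch under the assumption that the kernel module is named `nvidia`, as in the modprobe command above; `nvidia_module_loaded` is a hypothetical helper name. The `--query-gpu` and `--format=csv,noheader` options are standard nvidia-smi flags.

```shell
#!/usr/bin/env bash
# Sketch: confirm the kernel module is loaded and print one line per GPU.
# The helper name is illustrative.

# Reads `lsmod`-style text on stdin and succeeds if a module named exactly
# "nvidia" appears in the first column.
nvidia_module_loaded() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Check the loaded kernel modules when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
  if lsmod | nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
  else
    echo "nvidia kernel module is NOT loaded (try: sudo modprobe nvidia)"
  fi
fi

# Print index, name, and driver version for each detected GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
fi
```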
5) Re-enable Services & Restart System
Once the driver installation is complete, reactivate the hosting environment:
sudo systemctl enable nebula-hosting.service
sudo systemctl enable docker.service
sudo systemctl enable docker.socket
sudo reboot
Your GPU is now ready for hosting on Nebula AI.
6) Troubleshooting Common Issues
| Issue | Cause | Fix |
| --- | --- | --- |
| Black screen after installing NVIDIA drivers | Xorg (GUI) conflict | Run sudo dpkg-reconfigure gdm3 and reboot |
| nvidia-smi shows ‘No devices found’ | Driver modules failed to load | Run sudo modprobe nvidia |
| Docker fails to detect GPU after driver update | NVIDIA runtime issue | Run sudo apt install --reinstall nvidia-container-runtime |
| CUDA errors when running workloads | Incompatible driver or CUDA mismatch | Install the correct CUDA version (sudo apt install cuda-toolkit-12-2 -y) |
Final Steps
Reinstalling NVIDIA drivers should resolve most GPU detection and performance issues. If problems persist, check logs:
journalctl -u nebula-hosting --no-pager | tail -n 50
or visit the Nebula AI Discord for live troubleshooting.
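When the log is noisy, filtering it down to GPU-related lines can make the cause easier to spot before you ask for help. The sketch below is only a starting point: `gpu_log_lines` is a hypothetical helper, and the keyword list is an assumption about what the hosting service writes to its log.

```shell
#!/usr/bin/env bash
# Sketch: keep only log lines that mention the GPU stack or a failure.
# The keyword list is an assumption; adjust it to match your logs.
gpu_log_lines() {
  grep -iE 'nvidia|cuda|nvrm|error|fail'
}

# Read the most recent 200 entries for the unit and show up to 50 matches.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u nebula-hosting --no-pager -n 200 2>/dev/null | gpu_log_lines | tail -n 50
fi
```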