2023-10-20 11:27:00 -05:00
# Nvidia 470 datacenter drivers
**Note:** Desktop drivers and datacenter drivers are different.
## Debian
This hasn't yet been tested. If you have tested it, please open a PR to update this section.
### Add apt repos
**You can skip this if you already have the repositories enabled.**
To add the non-free and contrib repos, edit `/etc/apt/sources.list` and add `non-free contrib` to the end of each line, like this:
```txt
deb http://deb.debian.org/debian/ bullseye main non-free contrib
deb-src http://deb.debian.org/debian/ bullseye main non-free contrib
```
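If you'd rather script this edit, here is a minimal sketch using GNU `sed`. The function name `add_nonfree_contrib` is hypothetical, and it assumes the classic one-line `sources.list` format shown above (not the deb822 `.sources` files). It saves a `.bak` copy first and only appends a component to lines that don't already list it, so running it twice is harmless.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "non-free" and "contrib" to every deb/deb-src
# line of a one-line-style sources.list that doesn't already list them.
# A backup of the original file is written to <file>.bak.
add_nonfree_contrib() {
  local file="${1:-/etc/apt/sources.list}"
  cp -- "$file" "$file.bak" || return 1
  sed -i -E '/^deb(-src)?[[:space:]]/{
    / non-free( |$)/!s/$/ non-free/
    / contrib( |$)/!s/$/ contrib/
  }' "$file"
}

# Usage (as root):
#   add_nonfree_contrib /etc/apt/sources.list
```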
Then run `apt update`.
### Installation
To install the driver:
```bash
apt install nvidia-tesla-470-driver
```
And to install CUDA:
```bash
apt install nvidia-cuda-dev nvidia-cuda-toolkit
```
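To confirm that the toolkit ended up on your `PATH`, a small check like the one below is enough. `check_cmds` is a hypothetical helper, not part of any package; `nvcc` is the CUDA compiler shipped in `nvidia-cuda-toolkit`, and you can pass any other commands you expect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each given command is on PATH.
# Returns 0 only if every command was found.
check_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'found: %s\n' "$cmd"
    else
      printf 'missing: %s\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after the installs above:
#   check_cmds nvcc && nvcc --version
```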
### Links
- [Driver Install Guide](https://wiki.debian.org/NvidiaGraphicsDrivers) ([Internet Archive Link](https://web.archive.org/web/20221123184836/https://wiki.debian.org/NvidiaGraphicsDrivers))
## Fedora
This guide uses the RPM Fusion repositories and, if you install CUDA, Nvidia's CUDA repository as well. It only covers Fedora 35 and later; I haven't checked whether it works on RHEL.
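### Add RPM Fusion repositories
**You can skip this if you already have the free and nonfree repositories installed.**
To add the RPM Fusion repositories (the NVIDIA driver packages are in nonfree):
```bash
# Add gpg keys
sudo dnf install distribution-gpg-keys
sudo rpmkeys --import /usr/share/distribution-gpg-keys/rpmfusion/RPM-GPG-KEY-rpmfusion-free-fedora-$(rpm -E %fedora)
sudo rpmkeys --import /usr/share/distribution-gpg-keys/rpmfusion/RPM-GPG-KEY-rpmfusion-nonfree-fedora-$(rpm -E %fedora)
# Add repos with gpg check
sudo dnf --setopt=localpkg_gpgcheck=1 install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm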
```
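The driver packages used below, such as `akmod-nvidia-470xx`, come from RPM Fusion's *nonfree* repository, so both free and nonfree need to be enabled. As a rough check, the hypothetical `check_rpmfusion` helper below reads `dnf repolist` output and looks for both repo IDs; if nonfree is missing, see the RPM Fusion configuration page in the links below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `dnf repolist` output on stdin and check that the
# rpmfusion-free and rpmfusion-nonfree repo IDs are both listed.
check_rpmfusion() {
  local listing repo rc=0
  listing=$(cat)
  for repo in rpmfusion-free rpmfusion-nonfree; do
    # Require whitespace after the ID so the "-updates" repos don't count
    if grep -q "^${repo}[[:space:]]" <<<"$listing"; then
      echo "enabled: $repo"
    else
      echo "missing: $repo"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   dnf repolist | check_rpmfusion
```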
### Install Driver
First, update the system, and reboot if the update installed a newer kernel than the one you're running:
```bash
dnf update -y
```
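One way to tell whether a reboot is needed is to compare the running kernel (`uname -r`) with the newest installed `kernel-core` package. The sketch below assumes Fedora's usual naming, where `uname -r` prints the kernel package's `VERSION-RELEASE.ARCH`; `newest_kernel` and `kernel_is_current` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest of the kernel versions given on
# stdin (one per line), using GNU sort's version ordering.
newest_kernel() {
  sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel is the newest installed
# kernel-core package; otherwise print a reminder and return 1.
kernel_is_current() {
  local running newest
  running=$(uname -r)
  newest=$(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-core | newest_kernel)
  if [[ "$running" == "$newest" ]]; then
    echo "Running the newest installed kernel ($running)."
  else
    echo "Running $running, but $newest is installed: reboot before installing the driver."
    return 1
  fi
}

# Usage, after `dnf update -y`:
#   kernel_is_current
```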
Then, install the driver:
```bash
dnf install akmod-nvidia-470xx
```
_**Do not reboot yet.**_
Before rebooting, wait for the kernel module to finish building: use `top` or `ps` to check that no `akmods`, `cc*`, or `gcc*` process is still running (`*` is either nothing or a number), and that nothing else, such as `kthreadd`, is using an unexpected amount of CPU.
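Instead of watching `top`, you can poll until the build is done. This is a sketch, not part of akmods: `wait_for_akmods` is a hypothetical helper that waits until `pgrep` finds no process whose name contains `akmods` (which also covers `akmodsbuild`). The RPM Fusion NVIDIA guide linked below suggests `modinfo -F version nvidia` as the final check; it should print the driver version rather than a "Module nvidia not found" error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until no process whose name contains "akmods"
# is running, polling every INTERVAL seconds (default 10) and giving up
# after MAX_WAIT seconds (default 1800).
wait_for_akmods() {
  local interval="${1:-10}" max_wait="${2:-1800}" waited=0
  while pgrep akmods >/dev/null 2>&1; do
    if (( waited >= max_wait )); then
      echo "akmods is still running after ${max_wait}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "No akmods process running."
}

# Usage, before rebooting:
#   wait_for_akmods && modinfo -F version nvidia
```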
*Note:* `nvidia-smi` and other tools are not included with the driver. For that, you need to install CUDA.
### Install CUDA
Add Nvidia's CUDA repository and install CUDA:
```bash
# If Nvidia's repo doesn't have your Fedora release yet (at the time of writing
# there was no Fedora 38 repo), set this to the newest release it has, e.g. 37
export FEDORA_VERSION=$(rpm -E %fedora)
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora${FEDORA_VERSION}/x86_64/cuda-fedora${FEDORA_VERSION}.repo
dnf clean all
# Keep Nvidia's own driver packages out; the RPM Fusion driver is used instead
dnf module disable nvidia-driver
dnf -y install cuda
```
*Note:* Don't re-enable the `nvidia-driver` module.
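After installing CUDA (and rebooting, if you haven't since installing the driver), `nvidia-smi` should report a 470.x driver for every GPU. The hypothetical `check_470` helper below parses the CSV printed by `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`. A K80 is a dual-GPU card, so it shows up as two lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name, driver_version" lines as printed by
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# and succeed only if at least one GPU is listed and all report a 470.x driver.
check_470() {
  local name version seen=0 bad=0
  while IFS=',' read -r name version; do
    [[ -z "$name" ]] && continue
    version="${version// /}"   # drop the space after the comma
    seen=1
    printf '%s: %s\n' "$name" "$version"
    [[ "$version" == 470.* ]] || bad=1
  done
  (( seen && !bad ))
}

# Usage:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | check_470
```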
### Problems
#### Suspend Issues
After a suspend, my K80 stopped working: for example, `torch.cuda.is_available()` printed an error and returned `False` instead of `True`.
To fix this, install `xorg-x11-drv-nvidia-470xx-power`:
```bash
dnf install xorg-x11-drv-nvidia-470xx-power
```
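Assuming the `-power` package ships NVIDIA's standard `nvidia-suspend`, `nvidia-resume`, and `nvidia-hibernate` systemd units (an assumption; `rpm -ql xorg-x11-drv-nvidia-470xx-power` lists what it actually installs), you can confirm they're enabled with a check like the hypothetical `check_power_units` helper below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the enablement state of NVIDIA's power-management
# units (assumed names) and return non-zero if any is disabled or missing.
check_power_units() {
  local unit state rc=0
  for unit in nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service; do
    if state=$(systemctl is-enabled "$unit" 2>/dev/null); then
      printf '%s: %s\n' "$unit" "$state"
    else
      printf '%s: %s\n' "$unit" "${state:-not found}"
      rc=1
    fi
  done
  return "$rc"
}

# Usage; if any unit is disabled, you could enable them with:
#   check_power_units || sudo systemctl enable nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service
```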
---
#### CUDA is higher version than driver
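Sometimes the driver in Nvidia's CUDA repo (the `cuda-drivers` package), and therefore the driver version CUDA depends on, is newer than the installed 470 driver, so CUDA's dependencies can't be satisfied. To fix this, download `cuda-drivers` and install it without its dependencies: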
```bash
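# Enable the module just long enough to download cuda-drivers, then disable it again
dnf module enable nvidia-driver -y && dnf download cuda-drivers && dnf module disable nvidia-driver -y
# Install the downloaded package while skipping its (newer) driver dependencies
rpm -Uvh cuda-drivers*.rpm --nodeps
dnf update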
```
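To confirm that this is the problem, you can compare the installed 470 driver's version with the `cuda-drivers` package you downloaded. `version_gt` below is a hypothetical helper built on GNU `sort -V`; the commented usage assumes the downloaded RPM is in the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is strictly newer than version $2.
version_gt() {
  [[ "$1" != "$2" ]] &&
    [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" == "$1" ]]
}

# Usage, from the directory where `dnf download` saved the package:
#   driver=$(rpm -q --qf '%{VERSION}\n' akmod-nvidia-470xx)
#   cuda_drv=$(rpm -qp --qf '%{VERSION}\n' cuda-drivers*.rpm)
#   version_gt "$cuda_drv" "$driver" && echo "cuda-drivers $cuda_drv is newer than driver $driver"
```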
### More stuff
Why not install `xorg-x11-drv-nvidia-470xx`?
- That's the _display_ driver, not the datacenter driver. It has the same version number, but it is a different driver.
### Links
- [Repo Config](https://rpmfusion.org/Configuration) ([Internet Archive Link](https://web.archive.org/web/20221111180224/https://rpmfusion.org/Configuration))
- [Verify Repo Signing Keys](https://rpmfusion.org/keys) ([Internet Archive Link](https://web.archive.org/web/20221111180744/https://rpmfusion.org/keys))
- [NVIDIA Guide](https://rpmfusion.org/Howto/NVIDIA) ([Internet Archive Link](https://web.archive.org/web/20221111181211/https://rpmfusion.org/Howto/NVIDIA))
- [CUDA Guide](https://rpmfusion.org/Howto/CUDA) ([Internet Archive Link](https://web.archive.org/web/20221111181243/https://rpmfusion.org/Howto/CUDA))