[GH-ISSUE #7723] Can't use GPU on Ubuntu 22.04 without Docker - permission problems #30690

Closed
opened 2026-04-22 10:35:33 -05:00 by GiteaMirror · 26 comments

Originally created by @raullopezgn on GitHub (Nov 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7723

What is the issue?

Hi, I have been using Jan.ai but I wanted to try other options.

I can't get Ollama to take advantage of my GPU. I would prefer not to use Docker for security reasons.
Below is all the information you may need to help me find a solution. Thank you in advance.

CPU: AMD Ryzen 5 5600
GPU: AMD Sapphire Nitro+ RX 5700 XT
OS: ubuntu 22.04
Podman version: 3.4.4
Ollama version: 0.4.2

  1. I installed the latest AMD driver files using this command:
    amdgpu-install -y --opencl=rocr

  2. I installed ollama with these commands:
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz

curl -L https://ollama.com/download/ollama-linux-amd64-rocm.tgz -o ollama-linux-amd64-rocm.tgz
sudo tar -C /usr -xzf ollama-linux-amd64-rocm.tgz

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)

  3. When I run Ollama as root and load mistral:7b, the following lines appear in the logs (the whole log is attached below as a TXT file):

level=ERROR source=amd_linux.go:404 msg="amdgpu devices detected but permission problems block access: permissions not set up properly. Either run ollama as root, or add you user account to the render group. open /dev/kfd: permission denied"

  4. After I got this message, I added my user to the "render" group, but I had the same problem.
    I think the permissions on the KFD device file still need to be changed. However, as I'm not an expert in Linux commands, I don't know how to change the permissions on /dev/kfd or which user should be given access (see the sketch after this list).

ls -l kfd
crw-rw---- 1 root video 237, 0 nov 18 2024 kfd

  5. I also discovered this:

rocminfo
ROCk module version 6.8.5 is loaded
Unable to open /dev/kfd read-write: Permission denied
Failed to get user name to check for video group membership
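
A minimal sketch of how to check which group owns the compute device and grant access without changing file permissions, assuming the service runs as the ollama user created above (the owning group may be video or render depending on the driver packaging):

ls -l /dev/kfd /dev/dri/renderD*        # see which group owns the compute and render nodes
stat -c '%G' /dev/kfd                   # print just the owning group, e.g. video
sudo usermod -a -G video,render ollama  # add the service user to both candidate groups
sudo systemctl restart ollama           # group changes only apply to newly started processes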

------------ ollama.service ------------
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="HSA_OVERRIDE_GFX_VERSION="10.3.0""

[Install]
WantedBy=default.target


OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.4.2

GiteaMirror added the bug and amd labels 2026-04-22 10:35:33 -05:00

@rick-github commented on GitHub (Nov 18, 2024):

The ollama user is not in the video group and hence can't open /dev/kfd. Run the following to add ollama:

sudo usermod -a -G video ollama

Is there any reason for not doing the usual curl|sh install?


@raullopezgn commented on GitHub (Nov 18, 2024):

The ollama user is not in the video group and hence can't open /dev/kfd. Run the following to add ollama:

sudo usermod -a -G video ollama

Is there any reason for not doing the usual curl|sh install?

Thank you. I added ollama to the video group (I checked that I did it correctly), but the logs show the same error, permission denied.

I followed the steps of the "Manual installation" guide (https://github.com/ollama/ollama/blob/main/docs/linux.md) and understood that I didn't have to execute this command:
curl -fsSL https://ollama.com/install.sh | sh


@rick-github commented on GitHub (Nov 18, 2024):

What's the output of

groups ollama

Did you restart the ollama service after adding ollama to the video group (sudo systemctl stop ollama ; sudo systemctl start ollama)?


@raullopezgn commented on GitHub (Nov 18, 2024):

Yes, of course:
ollama : ollama video

I didn't know that I had to restart the ollama service. I did it, but now I can't run ollama (with or without sudo). It shows the following error:
Error: llama runner process has terminated: error:Could not initialize Tensile host: No devices found


@rick-github commented on GitHub (Nov 18, 2024):

ollama is not in the render group.


@raullopezgn commented on GitHub (Nov 18, 2024):

Now, it is:
ollama : ollama video render

I restarted the ollama service, the display started blinking and then went into sleep mode. Then the terminal closed.

I restarted the computer and ran Ollama again, but the loading indicator in the terminal kept spinning forever.


@rick-github commented on GitHub (Nov 18, 2024):

The ollama server started, a client asked to load mistral:7b-instruct-v0.3-q4_0, a runner was started and began to load the model, then the client quit causing the model load to be cancelled.

The model load had been running for 1m47s before the client quit. I don't know if that's a long time for a ROCm system; it took 36s on my Nvidia system. Other than that, this looks normal.


@raullopezgn commented on GitHub (Nov 18, 2024):

Before I posted this thread, mistral loaded in a matter of seconds, but in any case it did load. Right now, it doesn't.

I gave it a second try and waited 3m48s, but mistral still didn't load.


@rick-github commented on GitHub (Nov 18, 2024):

Add Environment="OLLAMA_DEBUG=1" to the service config and restart ollama, run the client again, then attach logs.
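
A minimal sketch of one way to apply this without editing /etc/systemd/system/ollama.service directly: put the setting in a drop-in, reload, restart, and follow the log (the drop-in file name debug.conf is just an example):

sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_DEBUG=1"\n' | sudo tee /etc/systemd/system/ollama.service.d/debug.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama
journalctl -u ollama -f    # follow the server log while reproducing the problem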


@raullopezgn commented on GitHub (Nov 18, 2024):

I added the line you recommended, and when I used the commands to restart ollama I got this message:
"Warning: The unit file, source configuration file or drop-ins of ollama.service changed on disk. Run ‘systemctl daemon-reload’ to reload units."

Then I executed these 2 commands:
systemctl daemon-reload
systemctl restart ollama.service

After that I ran ollama and the computer crashed. Here are the logs:
ollama-server-log-241118-03-PC-crashed.txt (https://github.com/user-attachments/files/17800109/ollama-server-log-241118-03-PC-crashed.txt)

Then I restarted the PC again, ran ollama with sudo, and waited a few minutes. Here is the log:
ollama-server-log-241118-04.txt (https://github.com/user-attachments/files/17800119/ollama-server-log-241118-04.txt)

Thank you for your time


@rick-github commented on GitHub (Nov 18, 2024):

The runner starts but seems to have gotten wedged at the point it was supposed to create a new context after creating the ROCm0 buffer. There have been a few issues filed recently involving AMD GPUs, perhaps roll back to an earlier version while the kinks are worked out.


@raullopezgn commented on GitHub (Nov 18, 2024):

OK, thank you for the recommendation. One question: once I roll back or downgrade the AMD drivers, do I need to change or do anything else with Ollama?


@rick-github commented on GitHub (Nov 18, 2024):

I would recommend ollama 0.3.14; the 0.4 series has just switched build architecture, and having so many variables makes configuring a system complicated. Get an older version of ollama working with older drivers, and when that works, upgrade one component at a time.


@raullopezgn commented on GitHub (Nov 18, 2024):

I have been "fighting" with this (I will explain later what I did with the AMD drivers) and I think I installed Ollama 0.3.14 (I followed this info: https://github.com/ollama/ollama/pull/4084/files/05eeb01c034216c3ad6f4401594819b26ea5b91a#diff-57cdcc6b6701a7ce3b3a8f8c0366e0e611e6fb1a1b8dd7cd29e3263b3e064c8b), but when I execute the command to check the status, the following message appears:

ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset>
Active: activating (auto-restart) (Result: exit-code) since Mon 2024-11-18>
Process: 7199 ExecStart=/usr/local/bin/ollama serve (code=exited, status=1/>
Main PID: 7199 (code=exited, status=1/FAILURE)
CPU: 12ms


@rick-github commented on GitHub (Nov 18, 2024):

What's the output of the following commands:

systemctl cat ollama --no-pager
journalctl -u ollama --no-pager
command -v ollama
ls -l /usr/local/bin/ollama
ldd /usr/local/bin/ollama

@raullopezgn commented on GitHub (Nov 18, 2024):

  1. systemctl cat ollama --no-pager
    /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama Service
    After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/amador/anaconda3/bin:/home/amador/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

[Install]
WantedBy=default.target

  2. command -v ollama
    /usr/local/bin/ollama

  3. ls -l /usr/local/bin/ollama
    -rwxr-xr-x 1 root root 47593432 nov 15 22:50 /usr/local/bin/ollama

  4. ldd /usr/local/bin/ollama
    linux-vdso.so.1 (0x00007ffc567fe000)
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007d9aaa011000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007d9aaa00c000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007d9aaa007000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007d9aaa002000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007d9aa9c00000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007d9aa9f1b000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007d9aa9ef9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007d9aa9800000)
    /lib64/ld-linux-x86-64.so.2 (0x00007d9aaa037000)

  5. journalctl -u ollama --no-pager
    Updated: I removed the Ollama server log here because it was too long and doesn't help with reading the whole issue.


@rick-github commented on GitHub (Nov 18, 2024):

nov 18 22:49:58 torreTA ollama[11775]: Error: could not create directory mkdir /usr/share/ollama: permission denied

Run the following:

sudo mkdir -p /usr/share/ollama
sudo chown -R ollama:ollama /usr/share/ollama

@raullopezgn commented on GitHub (Nov 19, 2024):

Your proposal fixed that error message and ollama could be loaded. However, I didn't have enough space to download Mistral:7b.

I tried to change the folder where the models are downloaded, using a symlink and also adding lines to ollama.service with the path to the new folder, but neither worked :(
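
One common way to relocate the model store is the OLLAMA_MODELS environment variable; a minimal sketch, assuming /mnt/models is just an example path on the larger disk:

sudo mkdir -p /mnt/models
sudo chown -R ollama:ollama /mnt/models     # the service user must own the directory
# then add under [Service] in ollama.service (or a drop-in):
#   Environment="OLLAMA_MODELS=/mnt/models"
sudo systemctl daemon-reload
sudo systemctl restart ollama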

Also, I found out that I had installed Ollama 0.4.2, which means that the command I found to install a specific version of Ollama was incorrect (from the PR page linked above):
curl -fsSL https://ollama.com/install.sh | VER_PARAM=0.3.14 sh

Now I have to "uninstall" the current version of Ollama and install 0.3.14. Would the following steps be correct?*

  • Is it necessary to do step 3 so I can use my AMD GPU with Ollama?
  1. Download Ollama 0.3.14 to my computer:
    https://github.com/ollama/ollama/releases/download/v0.3.14/ollama-linux-amd64.tgz

  2. Execute the following command in the folder where I downloaded the file from step 1:
    sudo tar -C /usr -xzf ollama-linux-amd64.tgz

  3. Download the Ollama 0.3.14 ROCm package to my computer:
    https://github.com/ollama/ollama/releases/download/v0.3.14/ollama-linux-amd64-rocm.tgz

  4. Execute the following command in the folder where I downloaded the file from step 3:
    sudo tar -C /usr -xzf ollama-linux-amd64-rocm.tgz

And after that I will follow the rest of the steps to be ready to execute Ollama and download a model.


@rick-github commented on GitHub (Nov 19, 2024):

That's not the correct way to download a different version. The instructions are here: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#installing-older-or-pre-release-versions-on-linux

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.14 sh

Just run that, ollama should install and be ready to pull a model.


@raullopezgn commented on GitHub (Nov 19, 2024):

I have checked that I installed Ollama 0.3.14 (thanks for the command, it worked!).

I ran mistral using 100% CPU and everything was OK.

After that, I updated ollama.service with the following line (I know that my 5700 XT doesn't have this version):
Environment="HSA_OVERRIDE_GFX_VERSION="10.3.0""

After that, I executed these commands:
systemctl daemon-reload
systemctl restart ollama.service

And just in case, I also executed these ones:
sudo systemctl stop ollama
sudo systemctl start ollama

I ran ollama but the PC crashed.


@rick-github commented on GitHub (Nov 19, 2024):

What happens when it crashes? No longer responds to commands, reboots, something else? The reason for the crash won't be in the ollama log, you'd need to check /var/log/syslog or /var/log/kern.log to find out what happened to the system.
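
A couple of ways to pull just the GPU-related kernel messages after such a crash, assuming journald kept the previous boot's log (the grep pattern is only an example):

journalctl -k -b -1 --no-pager | grep -iE 'amdgpu|kfd' | tail -n 200   # kernel log from the previous boot
sudo grep -iE 'amdgpu|kfd' /var/log/kern.log | tail -n 200             # same, from the on-disk log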


@rick-github commented on GitHub (Nov 19, 2024):

Nov 19 19:59:44 torreTA ollama[1606]: llm_load_tensors:        CPU buffer size =    72.00 MiB
Nov 19 20:00:11 torreTA ollama[1606]: HW Exception by GPU node-1 (Agent handle: 0x2227730) reason :GPU Hang
Nov 19 20:00:11 torreTA kernel: [ 2448.344887] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=857, emitted seq=859
Nov 19 20:00:11 torreTA kernel: [ 2448.345217] amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
Nov 19 20:00:11 torreTA kernel: [ 2448.461681] amdgpu 0000:08:00.0: amdgpu: Dumping IP State
Nov 19 20:00:11 torreTA kernel: [ 2448.463844] BUG: unable to handle page fault for address: ffff90cf1c6b1060
Nov 19 20:00:11 torreTA kernel: [ 2448.463848] #PF: supervisor write access in kernel mode
Nov 19 20:00:11 torreTA kernel: [ 2448.463850] #PF: error_code(0x0003) - permissions violation
Nov 19 20:00:11 torreTA kernel: [ 2448.463852] PGD 4c5e01067 P4D 4c5e01067 PUD 10150d063 PMD 11c6c0063 PTE 800000011c6b1121
Nov 19 20:00:11 torreTA kernel: [ 2448.463858] Oops: 0003 [#1] PREEMPT SMP NOPTI
Nov 19 20:00:11 torreTA kernel: [ 2448.463861] CPU: 9 PID: 3979 Comm: kworker/u64:1 Tainted: G           OE      6.8.0-48-generic #48~22.04.1-Ubuntu
Nov 19 20:00:11 torreTA kernel: [ 2448.463865] Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 3002 02/23/2023
Nov 19 20:00:11 torreTA kernel: [ 2448.463867] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [amd_sched]
...
Nov 19 20:00:11 torreTA ollama[1606]: time=2024-11-19T20:00:11.359+02:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
Nov 19 20:00:11 torreTA ollama[1606]: time=2024-11-19T20:00:11.609+02:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
Nov 19 20:00:11 torreTA ollama[1606]: [GIN] 2024/11/19 - 20:00:11 | 500 |  36.51008711s |       127.0.0.1 | POST     "/api/generate"

Looks like a crash in the kernel driver. Searching for the errors doesn't turn up much in the way of solutions. A few forums suggested newer kernels might be more resistant; Ubuntu 22 is just over 2.5 years old, so if you haven't been doing updates, try that. Alternatively, try different overrides. I'm not familiar with AMD GPUs, so I have no recommendations other than those documented here: https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux. There are other AMD issues open in the tracker, one specifically for the Radeon RX 5700 XT (https://github.com/ollama/ollama/issues/2503), so having a look at some of those might give you some directions to pursue.


@raullopezgn commented on GitHub (Nov 19, 2024):

Well, Ubuntu 22 is an LTS version, and specifically I have 22.04.5 with kernel 6.8.

I changed the Environment="HSA_OVERRIDE_GFX_VERSION=..." setting in ollama.service and tried 10.3.0, 10.3.1 and 10.3.2, without success.

I don't know what else we could do; I give up. Thank you very much for your support :)


@raullopezgn commented on GitHub (Nov 20, 2024):

One question: when I'm not using Ollama, what do I have to do to free up my computer's resources?


@raullopezgn commented on GitHub (Nov 20, 2024):

@rick-github I found a solution and it works with the RX 5700 XT GPU!!

Yesterday I read issue 2503 (https://github.com/ollama/ollama/issues/2503) in this repo and tried to use a symlink as they mention, but I forgot to update the GFX version in ollama.service (I had 10.3.0). So I will explain what I have done, following the solution proposed by @tecnomanu in issue 2503:

  1. I changed the GFX version in ollama.service from 10.3.0 to 10.1.0 and ran ollama on purpose, to read the error messages and get the right info. I got the following:

Error: llama runner process has terminated: error:Cannot read /usr/local/lib/ollama/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1010

  2. Then I had the right path and I created a symlink. This is the command:

sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx{1030,1010}.dat

  3. After that I executed:
    ollama ps
    NAME ID SIZE PROCESSOR UNTIL
    mistral:latest f164t71690m9 6.3 GB 100% GPU 2 minutes from now

  4. And this is the log of my Ollama server: ollama-server-log-241120-07.txt (https://github.com/user-attachments/files/17827330/ollama-server-log-241120-07.txt)

Now, who would I have to contact on the Ollama project to get this taken into account and a solution found for future releases?
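
A consolidated sketch of the workaround described above, assuming the tarballs were unpacked under /usr/local as in this thread:

# in ollama.service (or a drop-in), under [Service]:
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"
sudo ln -s /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx1030.dat \
           /usr/local/lib/ollama/rocblas/library/TensileLibrary_lazy_gfx1010.dat
sudo systemctl daemon-reload
sudo systemctl restart ollama
ollama ps    # should report 100% GPU once a model is loaded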


@rick-github commented on GitHub (Nov 20, 2024):

To free resources:

sudo systemctl stop ollama

Regarding the solution: AMD support is still evolving (see the recent spate of tickets), #2503 is still open, and there are ollama developers on the thread, so it will be addressed eventually.
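
If the goal is also to keep it from starting again at boot, the stop can be paired with a disable (standard systemd commands; re-enable later when needed):

sudo systemctl stop ollama            # unload models and stop the server now
sudo systemctl disable ollama         # don't start it automatically on the next boot
sudo systemctl enable --now ollama    # to turn it back on later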
