[GH-ISSUE #7320] 0.4.0 regression #51162

Closed
opened 2026-04-28 18:45:01 -05:00 by GiteaMirror · 0 comments

Originally created by @skobkin on GitHub (Oct 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7320

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

Just updated `ollama` to [`0.4.0-rc3-rocm`](https://hub.docker.com/layers/ollama/ollama/0.4.0-rc3/images/sha256-6b75f17d6160b28dec8d8d519ceec02dfdae20e1c2451db34f3a3351f5de373a?context=explore) to test the new LLaMA 3.2 Vision capabilities.

But it isn't working and returns 500 to OpenWebUI. It fails even with LLaMA 3.1 Lexi, which I was using before the update.

![image](https://github.com/user-attachments/assets/9c4a002e-98c7-47af-b2ef-2405da33c559)

Here are the `ollama` container logs when trying to chat with LLaMA 3.1 Lexi 8B Q6:

```
ollama  | time=2024-10-22T16:33:28.879Z level=INFO source=images.go:754 msg="total blobs: 84"
ollama  | time=2024-10-22T16:33:28.880Z level=INFO source=images.go:761 msg="total unused blobs removed: 0"
ollama  | time=2024-10-22T16:33:28.880Z level=INFO source=routes.go:1217 msg="Listening on [::]:11434 (version 0.4.0-rc3)"
ollama  | time=2024-10-22T16:33:28.880Z level=INFO source=common.go:82 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm]"
ollama  | time=2024-10-22T16:33:28.880Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
ollama  | time=2024-10-22T16:33:28.882Z level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
ollama  | time=2024-10-22T16:33:28.884Z level=INFO source=amd_linux.go:383 msg="amdgpu is supported" gpu=0 gpu_type=gfx1101
ollama  | time=2024-10-22T16:33:28.884Z level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=1 total="512.0 MiB"
ollama  | time=2024-10-22T16:33:28.884Z level=INFO source=types.go:123 msg="inference compute" id=0 library=rocm variant="" compute=gfx1101 driver=0.0 name=1002:747e total="16.0 GiB" available="15.2 GiB"
ollama  | [GIN] 2024/10/22 - 16:33:39 | 200 |     1.70353ms |      172.24.0.3 | GET      "/api/tags"
ollama  | [GIN] 2024/10/22 - 16:33:43 | 200 |      21.159µs |      172.24.0.1 | GET      "/"
ollama  | time=2024-10-22T16:33:52.469Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-545ba086eb5179e8f97f9eb7d54c61555a7cd645c5b26f9551209022878abb2c gpu=0 parallel=4 available=16308183040 required="9.7 GiB"
ollama  | time=2024-10-22T16:33:52.469Z level=INFO source=llama-server.go:72 msg="system memory" total="30.5 GiB" free="24.8 GiB" free_swap="7.8 GiB"
ollama  | time=2024-10-22T16:33:52.469Z level=INFO source=memory.go:346 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[15.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="9.7 GiB" memory.required.partial="9.7 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[9.7 GiB]" memory.weights.total="7.9 GiB" memory.weights.repeating="7.4 GiB" memory.weights.nonrepeating="532.3 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
ollama  | time=2024-10-22T16:33:52.470Z level=INFO source=llama-server.go:355 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm/ollama_llama_server --model /root/.ollama/models/blobs/sha256-545ba086eb5179e8f97f9eb7d54c61555a7cd645c5b26f9551209022878abb2c --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 33 --threads 12 --parallel 4 --port 34353"
ollama  | time=2024-10-22T16:33:52.470Z level=INFO source=sched.go:450 msg="loaded runners" count=1
ollama  | time=2024-10-22T16:33:52.470Z level=INFO source=llama-server.go:534 msg="waiting for llama runner to start responding"
ollama  | time=2024-10-22T16:33:52.470Z level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server error"
ollama  | /usr/lib/ollama/runners/rocm/ollama_llama_server: error while loading shared libraries: libelf.so.1: cannot open shared object file: No such file or directory
ollama  | time=2024-10-22T16:33:52.720Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"
```
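The immediate failure is the `libelf.so.1` load error: the ROCm runner binary links against libelf, the library is missing from the image, so the dynamic loader aborts before the server starts and the process exits with status 127. A quick way to confirm this from inside the container (a diagnostic sketch; the container name `ollama` and the Debian/Ubuntu-based image, and hence the `libelf1` package name, are assumptions):

```bash
# List the shared libraries the ROCm runner cannot resolve
# (container name "ollama" is an assumption -- adjust to yours).
docker exec ollama ldd /usr/lib/ollama/runners/rocm/ollama_llama_server | grep 'not found'

# Temporary workaround: install libelf inside the running container.
# Assumes a Debian/Ubuntu-based image where libelf.so.1 ships in the
# libelf1 package; the change is lost when the container is recreated.
docker exec ollama apt-get update
docker exec ollama apt-get install -y libelf1
```

If `ldd` reports `libelf.so.1 => not found`, this is purely an image-packaging problem rather than anything GPU-specific.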

Just in case: I have one integrated GPU and one additional GPU in a PCIe slot.

I'm calling this a regression because it fails even with a model that worked perfectly before, not only with LLaMA 3.2.

It looks like it's similar to #7279, but I'm not sure, as the output isn't completely identical.
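Until the image packaging is fixed, a possible stopgap is pinning the last tag that worked (a sketch; the exact tag is an assumption, use whichever version last worked for you):

```bash
# Roll back to the release immediately before the 0.4.0 RCs
# (0.3.14-rocm here is an assumed known-good tag).
docker pull ollama/ollama:0.3.14-rocm
```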

### OS

Linux, Docker

### GPU

AMD

### CPU

AMD

### Ollama version

0.4.0-rc3

GiteaMirror added the bug label 2026-04-28 18:45:01 -05:00
GiteaMirror added the amd label 2026-04-28 18:48:07 -05:00