[GH-ISSUE #13015] Ollama 0.12.10 Error: 500 Internal Server Error: do load request: Post EOF #8617

Closed
opened 2026-04-12 21:21:18 -05:00 by GiteaMirror · 6 comments

Originally created by @czardien on GitHub (Nov 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13015

What is the issue?

Ollama on an Nvidia GPU setup unexpectedly crashes when trying to run both the qwen3:4b and gemma3:4b models, with the following error message:

$ ollama run qwen3:4b
Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:43439/load": EOF
$ ollama run gemma3:4b
Error: 500 Internal Server Error: do load request: Post "http://127.0.0.1:45673/load": EOF

This may be related to https://github.com/ollama/ollama/issues/12977, but I was told there to open a new issue. I have attached the server logs to this issue, collected through the journal, as they didn't fit in the "Relevant log output" textbox below.

ollama-server-logs.txt (https://github.com/user-attachments/files/23431126/ollama-server-logs.txt)
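
For completeness, this is roughly how the logs can be pulled from the journal; a minimal sketch, assuming the systemd unit is named ollama (the default for the Arch package):

$ journalctl -u ollama --no-pager > ollama-server-logs.txt    # full log for the unit
$ journalctl -u ollama -b --no-pager | tail -n 200            # or just recent lines from the current boot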

OS: Arch Linux
Packages:

ollama 0.12.10-1
ollama-cuda 0.12.10-1
lib32-nvidia-utils 580.95.05-1
linux-firmware-nvidia 20251021-1
nvidia 580.95.05-9
nvidia-settings 580.95.05-1
nvidia-utils 580.95.05-1
opencl-nvidia 580.95.05-1

NVidia GPU:

$ nvidia-smi
Sat Nov  8 09:59:52 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   28C    P8             19W /  275W |     638MiB /  11264MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A             535      G   /usr/lib/Xorg                           363MiB |
|    0   N/A  N/A            2350      G   alacritty                                 9MiB |
|    0   N/A  N/A            4446      G   /usr/lib/firefox/firefox                245MiB |
+-----------------------------------------------------------------------------------------+

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.12.10

GiteaMirror added the bug label 2026-04-12 21:21:18 -05:00

@rick-github commented on GitHub (Nov 8, 2025):

Nov 08 10:02:08 Arrakis ollama[9945]:   Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1, VMM: yes,
 ID: GPU-eba3c3e6-1b40-d999-7f01-4f6808482eea
Nov 08 10:02:08 Arrakis ollama[9945]: time=2025-11-08T10:02:08.377Z level=INFO source=ggml.go:104 msg=system
 CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1
 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,880,890,900,1000,1030,1100,1200,1210
 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)

Your device has compute capability 6.1, but this build of ollama has only been compiled for compute capability 7.5 and above. The Arch maintainers have decided (https://gitlab.archlinux.org/archlinux/packaging/packages/ollama/-/issues/26#note_325899) to move to CUDA 13, so your device is no longer supported by the Arch package. Depending on distro dependencies, it may be possible to use the official ollama release (https://ollama.com/download/linux) instead.
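
As a quick check, the compute capability reported in the log can also be queried directly; a minimal sketch, assuming a driver recent enough to support the compute_cap query field (the 580.95.05 driver above should be):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
NVIDIA GeForce GTX 1080 Ti, 6.1

Anything below 7.5 falls outside the CUDA.0.ARCHS list (750 and above) in the log, so the CUDA runner presumably exits before it can answer the /load request, which is what surfaces as the EOF error.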


@czardien commented on GitHub (Nov 8, 2025):

Thank you for getting back to me so promptly. I'm going through the links, but just to clarify: do you mean it's my GPU specifically that is not supported by this version of CUDA?

I'll try other releases of Ollama as you suggested.

Also, in the spirit of being helpful, it might be worth showing a more specific error message, as the one I faced was rather cryptic. Unless you expect Ollama admins to be more knowledgeable and run their own checks, which would probably be fair.


@rick-github commented on GitHub (Nov 8, 2025):

Any GPUs on this list (https://developer.nvidia.com/cuda-legacy-gpus) won't be supported in Arch.


@BigTorro commented on GitHub (Nov 9, 2025):

I don’t know — I have a 4070 Ti running on Ubuntu (24.04.3 LTS), but with the latest Ollama version, I’m getting the same error. I encountered this error while using OpenWebUI. After digging into it, I found a partial solution: manually unmount the model via CLI or using the “Unmount” button in OpenWebUI, then load the new model. I believe the issue is related to insufficient memory, but I’m new to this error. Until the previous version of Ollama and OpenWebUI (which was updated a few days ago), this problem didn’t occur — I suspect the issue lies in OpenWebUI, which may not unmount the model properly before loading a new one.

PS: The error occurs approximately 90% of the time when loading qwen3vl. There’s no issue when switching between GPT and Llama models.
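
For anyone hitting the model-switching case, a model can also be unloaded by hand before loading the next one; a minimal sketch, assuming a recent Ollama CLI with the stop subcommand (the qwen3vl name is just taken from the comment above):

$ ollama ps                    # show models currently loaded in memory
$ ollama stop qwen3vl          # unload that model from memory
$ curl http://localhost:11434/api/generate -d '{"model": "qwen3vl", "keep_alive": 0}'    # equivalent unload via the API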


@rick-github commented on GitHub (Nov 9, 2025):

@BigTorro Your problem is not the same, although the error message is. Open a new issue and add the server log (https://docs.ollama.com/troubleshooting).


@czardien commented on GitHub (Nov 26, 2025):

Just wanna point out that I was able to fix this by using the ollama-cuda12-bin AUR package, and could verify Ollama was now picking up my GPU gracefully. I had to tweak the PKGBUILD's pkgname a little bit but it was smooth otherwise.
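
A minimal sketch of that workaround, assuming an AUR helper such as yay is available (the package name comes from the comment above; manual PKGBUILD edits may still be needed as described):

$ yay -S ollama-cuda12-bin
$ sudo systemctl restart ollama
$ journalctl -u ollama -b | grep -iE "compute capability|cuda"    # confirm the GPU is picked up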
