[GH-ISSUE #7542] llama runner process has terminated: exit status 127 #4797

Closed
opened 2026-04-12 15:46:07 -05:00 by GiteaMirror · 13 comments

Originally created by @jxyk2007 on GitHub (Nov 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7542

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

/home/vps# ollama list
NAME ID SIZE MODIFIED
llama3.2-vision:11b 38107a0cd119 7.9 GB 23 minutes ago
llava:34b 3d2d24f46674 20 GB 3 weeks ago
llava:latest 8dd30f6b0cb1 4.7 GB 3 weeks ago
llama3.2:latest a80c4f17acd5 2.0 GB 5 weeks ago
dog/qwen2.5-coder-7b:latest 100bd8b12808 6.5 GB 6 weeks ago
qwen2.5:32b 9f13ba1299af 19 GB 6 weeks ago
qwen2.5:72b 424bad2cc13f 47 GB 6 weeks ago
qwen2.5:7b 2a640a7f797c 4.7 GB 7 weeks ago
qwen2.5:1.5b c7417e908c30 986 MB 7 weeks ago
qwen2.5:3b adef96f382c7 1.9 GB 7 weeks ago
qwen2.5:0.5b dd239f17b9fa 397 MB 7 weeks ago
qwen2:7b dd314f039b9d 4.4 GB 7 weeks ago
qwen2:1.5b f6daf2b25194 934 MB 7 weeks ago
qwen2:0.5b 6f48b936a09f 352 MB 7 weeks ago
MFDoom/deepseek-v2-tool-calling:16b 8aec187c96c8 8.9 GB 7 weeks ago
majx13/test:latest bc0789d54213 4.4 GB 7 weeks ago
kuqoi/qwen2-tools:latest 3577919ece5a 352 MB 7 weeks ago
gemma2:latest ff02c3702f32 5.4 GB 7 weeks ago
dwightfoster03/functionary-small-v3.1:latest c1affc7e6d58 4.7 GB 7 weeks ago
deepseek-llm:7b-chat 9aab369a853b 4.0 GB 7 weeks ago
llama3.1:8b a23da2a80395 4.7 GB 3 months ago

(base) root@vps:/home/vps# ollama run llama3.2-vision:11b
Error: llama runner process has terminated: exit status 127

(base) root@vps:/home/vps# ollama -v
ollama version is 0.4.0

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.4.0

GiteaMirror added the nvidia and bug labels 2026-04-12 15:46:07 -05:00

@jmorganca commented on GitHub (Nov 7, 2024):

@jxyk2007 sorry about that. Would it be possible to share the logs? `journalctl -u ollama --no-pager`

@andrerclaudio commented on GitHub (Nov 7, 2024):

Same problem here, just after updating to Ollama 0.4.0:

Nov 07 19:09:32 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:32 | 200 | 17.244µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:32 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:32 | 200 | 13.486µs | 127.0.0.1 | GET "/api/ps"
Nov 07 19:09:34 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:34 | 200 | 16.993µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:34 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:34 | 200 | 7.675µs | 127.0.0.1 | GET "/api/ps"
Nov 07 19:09:36 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:36 | 200 | 16.843µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:36 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:36 | 200 | 9.479µs | 127.0.0.1 | GET "/api/ps"

@andrerclaudio commented on GitHub (Nov 7, 2024):

Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.539-03:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/home/asimov/.ol>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.680-03:00 level=INFO source=server.go:105 msg="system memory" total="15.5 GiB" free="14.3 GiB" free_swap="96.0 GiB"
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.681-03:00 level=INFO source=memory.go:343 msg="offload to cuda" layers.requested=-1 layers.model=33 layers.offload=33 layers.spli>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.681-03:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3926554203/runners/cuda_v12/ollama_llama_se>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=server.go:567 msg="waiting for llama runner to start responding"
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=server.go:601 msg="waiting for server to become available" status="llm server error"
Nov 07 19:11:44 macondo ollama[961]: /tmp/ollama3926554203/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libggml_cuda_v12.so: cannot open shared object file: No >
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.932-03:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit statu>
Nov 07 19:11:44 macondo ollama[961]: [GIN] 2024/11/07 - 19:11:44 | 500 | 592.597936ms | 127.0.0.1 | POST "/api/generate"
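
Exit status 127 is the code the loader (or shell) reports when a binary or one of its required shared objects cannot be found, which is exactly what the libggml_cuda_v12.so line above shows. A quick way to see the unresolved dependency, assuming the /tmp runner directory from the log still exists (it is recreated with a random suffix on every server start):

```sh
# List the shared libraries the runner binary cannot resolve; substitute the path from your own log.
ldd /tmp/ollama3926554203/runners/cuda_v12/ollama_llama_server | grep 'not found'
#   libggml_cuda_v12.so => not found
```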

@dhiltgen commented on GitHub (Nov 7, 2024):

Fixed in #7560

@jxyk2007 commented on GitHub (Nov 8, 2024):

> @jxyk2007 sorry about that. Would it be possible to share the logs? `journalctl -u ollama --no-pager`

Nov 08 06:51:35 VPS ollama[1746]: time=2024-11-08T06:51:35.362Z level=WARN source=sched.go:137 msg="multimodal models don't support parallel requests yet"
Nov 08 06:51:37 VPS ollama[1746]: time=2024-11-08T06:51:37.518Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068 gpu=GPU-7e73fd5a-dd96-6674-4f0b-79e9c80afaa1 parallel=1 available=15534391296 required="11.3 GiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.440Z level=INFO source=server.go:105 msg="system memory" total="94.3 GiB" free="91.7 GiB" free_swap="8.0 GiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.444Z level=INFO source=memory.go:343 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.3 GiB" memory.required.partial="11.3 GiB" memory.required.kv="656.2 MiB" memory.required.allocations="[11.3 GiB]" memory.weights.total="5.5 GiB" memory.weights.repeating="5.1 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="258.5 MiB" memory.graph.partial="669.5 MiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3255965013/runners/cuda_v12/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068 --ctx-size 2048 --batch-size 512 --embedding --n-gpu-layers 41 --mmproj /usr/share/ollama/.ollama/models/blobs/sha256-ece5e659647a20a5c28ab9eea1c12a1ad430bc0f2a27021d00ad103b3bf5206f --threads 20 --parallel 1 --port 40275"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=sched.go:449 msg="loaded runners" count=1
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=server.go:567 msg="waiting for llama runner to start responding"
Nov 08 06:51:39 VPS ollama[1746]: /tmp/ollama3255965013/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libggml_cuda_v12.so: cannot open shared object file: No such file or directory
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.447Z level=INFO source=server.go:601 msg="waiting for server to become available" status="llm server error"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.697Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 127"
Nov 08 06:51:39 VPS ollama[1746]: [GIN] 2024/11/08 - 06:51:39 | 500 | 4.404819377s | 127.0.0.1 | POST "/api/generate"
Nov 08 06:51:45 VPS ollama[1746]: time=2024-11-08T06:51:45.810Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=6.112529405 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068
Nov 08 06:51:47 VPS ollama[1746]: time=2024-11-08T06:51:47.686Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=7.988151227 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068
Nov 08 06:51:49 VPS ollama[1746]: time=2024-11-08T06:51:49.586Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=9.888398177 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068

@dhiltgen commented on GitHub (Nov 8, 2024):

@andrerclaudio please share an updated server log with OLLAMA_DEBUG=1 set running 0.4.1. Please include the full log from startup of the server so I can see all the settings it has, where the runners are located, etc.
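
For a systemd install (the default from install.sh), one way to capture that is roughly the following sketch; using an override file is equivalent to editing the unit directly:

```sh
# Turn on debug logging for the ollama service, restart it, and save the full journal from startup.
sudo systemctl edit ollama     # in the editor that opens, add:
                               #   [Service]
                               #   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
journalctl -u ollama --no-pager > ollama-debug.log
```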

@andrerclaudio commented on GitHub (Nov 9, 2024):

Hi, @dhiltgen.

First of all, I apologize—this was my mistake. I’m truly sorry.

I started over, taking some screenshots, and found the error.
After unpacking the tar file, I was moving the ollama binary to the folder my systemd unit is configured to execute from, but I wasn't moving the library (lib) files along with it. Oddly, this setup worked until now (before v0.4.0).

Now, keeping the files laid out as they are after unpacking and updating the ollama.service execution path to match, nvidia-smi is not showing GPU memory usage (bottom-right corner of the screenshot below).

![image](https://github.com/user-attachments/assets/80a66245-1425-46b6-9a1a-07ac8862169c)

But anyway, it’s working now. Thanks, and sorry for the trouble.

@jxyk2007 commented on GitHub (Nov 11, 2024):

 1670  echo $LD_LIBRARY_PATH
 1671  ll /usr/lib/x86_64-linux-gnu
 1672  journalctl -u ollama.service -b
 1673  ll /tmp/ollama2975340959/runners/cuda_v12/
 1674  nvcc --version
 1675  LD_LIBRARY_PATH
 1676  echo $LD_LIBRARY_PATH
 1677  cd /usr/lib/x86_64-linux-gnu
 1678  ll libggml_cuda_v12.so
 1679  ll
 1680  find libggml_cuda_v12.so
 1681  find / -name libggml_cuda_v12.so
 1682  export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH
 1683  source ~/.bashrc
 1684  ollama run qwen2:0.5b
 1685  vi /etc/systemd/system/ollama.service
 1686  sudo systemctl daemon-reload
 1687  sudo systemctl restart ollama.service
 1688  sudo systemctl daemon-reload
 1689  sudo systemctl restart ollama.service
 1690  ollama run qwen2:0.5b

find / -name libggml_cuda_v12.so

/root/lib/ollama/libggml_cuda_v12.so

ChatGPT said:

Since libggml_cuda_v12.so was found at /root/lib/ollama/libggml_cuda_v12.so, you can add that directory to LD_LIBRARY_PATH so the system can locate it. Here are the steps:

1. Temporarily add the library path
You can add the path to LD_LIBRARY_PATH temporarily to test whether it works:

export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH

Then run the Ollama command again to test:

ollama run qwen2:0.5b

2. Add the path to LD_LIBRARY_PATH permanently
If the step above works, you can make the path permanent. Edit .bashrc or .profile (usually in the root user's home directory) and add the following line:

export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH

Then apply the change immediately with:

source ~/.bashrc

3. Configure the systemd service file
If Ollama is started as a systemd service, add the library path to the systemd unit as well, so the service can find the file at startup. Edit /etc/systemd/system/ollama.service and add the following in the [Service] section:

[Service]
Environment="LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH"

Then reload the systemd configuration and restart the Ollama service:

sudo systemctl daemon-reload
sudo systemctl restart ollama.service

4. Verify the configuration
After restarting Ollama, check whether the exit status 127 error still occurs. If the problem is gone, the library path is configured correctly.

(base) root@vps:/home/vps# ollama run llama3.2-vision:latest

Send a message (/? for help)
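
One note on the unit file above: systemd does not perform shell expansion inside Environment=, so the literal text $LD_LIBRARY_PATH is not substituted; listing only the directory (Environment="LD_LIBRARY_PATH=/root/lib/ollama") is usually sufficient. A quick check that the override took effect, as a sketch assuming the service is named ollama:

```sh
# Confirm the service actually sees the library path, then re-check the journal for loader errors.
systemctl show ollama -p Environment
journalctl -u ollama -b --no-pager | grep 'error while loading shared libraries' || echo "no loader errors since boot"
```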

@dhiltgen commented on GitHub (Nov 14, 2024):

@jxyk2007 I'm not sure how you are installing ollama. If you use our install.sh script, or extract the ollama-linux-amd64.tgz file and don't move things around, then you should not have to set LD_LIBRARY_PATH. There are multiple libraries and binaries in the tar file that we expect to stay in the same layout.

It sounds like you did get it working, so I'll close this issue. If you are still having trouble, please share more details about how you are installing ollama, and the server log with OLLAMA_DEBUG=1 set so we can see the paths and settings to understand why it isn't able to find the libraries where it is expecting them.
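
For reference, a sketch of the manual install that keeps that layout intact, following the Linux install docs (adjust the architecture suffix for your machine):

```sh
# Download and extract the release tarball in place so bin/ and lib/ stay together.
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
```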

@MYBJ185 commented on GitHub (Feb 8, 2025):

After installing directly from ollama-linux-amd64.tgz, I hit this issue when running `ollama run <model>`. The problem was resolved by re-running the installation command from the Ollama website: `curl -fsSL https://ollama.com/install.sh | sh`. It's also possible this was related to file permissions on my server, which I had loosened. You can try that as well.
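
A sketch of that recovery path, assuming the systemd service set up by the install script:

```sh
# Re-run the official installer to restore the expected file layout and service unit, then re-test.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
ollama run llama3.2-vision:11b
```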

@ManishKhullar commented on GitHub (Jul 9, 2025):

ollama run llama3.2-vision:11b
Error: llama runner process has terminated: error:fault

  • Got the above error
  • On a Windows system where the model is downloaded to the G: drive
  • Ollama itself is also set up on the G: drive
  • By default the CPU is used, despite an Nvidia GeForce GTX 960 graphics adapter being present

Below is the server side error:

time=2025-07-09T14:23:42.280+05:30 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-09T14:23:42.639+05:30 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server error"
time=2025-07-09T14:23:42.682+05:30 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
time=2025-07-09T14:23:42.890+05:30 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:fault"
[GIN] 2025/07/09 - 14:23:42 | 500 |    4.7371714s |       127.0.0.1 | POST     "/api/generate"
time=2025-07-09T14:23:47.925+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.0340531 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed
time=2025-07-09T14:23:48.175+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2844668 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed
time=2025-07-09T14:23:48.425+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.5343092 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed

@dhiltgen commented on GitHub (Jul 17, 2025):

@ManishKhullar your scenario looks unrelated to this issue. I believe your GPU has only 2G of VRAM, however llama3.2-vision requires at least 6G to load on the GPU, so running on CPU is expected. Fully loaded the model takes ~12G of memory. You didn't include enough logs to see how much system memory you have, but it's possible your system may be under memory pressure. Take a look at Task Manager's memory view to see if you have enough space to fit this model, or try loading a smaller model. If that doesn't help, please open a new issue, and include more complete server logs with debug turned on so we can help troubleshoot.

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md

@ManishKhullar commented on GitHub (Aug 4, 2025):

Hi,
Thank you for the heads up.
I have 16 GB of RAM, of which 12.2 GB is available while the browser and Task Manager are running.
Could you guide me on where I can find the generated logs? I have added OLLAMA_DEBUG=true to my user environment variables and am new to the ecosystem :)

Reference: github-starred/ollama#4797