[GH-ISSUE #7542] llama runner process has terminated: exit status 127 #4797

Closed
opened 2026-04-12 15:46:07 -05:00 by GiteaMirror · 13 comments

Originally created by @jxyk2007 on GitHub (Nov 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7542

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

/home/vps# ollama list
NAME ID SIZE MODIFIED
llama3.2-vision:11b 38107a0cd119 7.9 GB 23 minutes ago
llava:34b 3d2d24f46674 20 GB 3 weeks ago
llava:latest 8dd30f6b0cb1 4.7 GB 3 weeks ago
llama3.2:latest a80c4f17acd5 2.0 GB 5 weeks ago
dog/qwen2.5-coder-7b:latest 100bd8b12808 6.5 GB 6 weeks ago
qwen2.5:32b 9f13ba1299af 19 GB 6 weeks ago
qwen2.5:72b 424bad2cc13f 47 GB 6 weeks ago
qwen2.5:7b 2a640a7f797c 4.7 GB 7 weeks ago
qwen2.5:1.5b c7417e908c30 986 MB 7 weeks ago
qwen2.5:3b adef96f382c7 1.9 GB 7 weeks ago
qwen2.5:0.5b dd239f17b9fa 397 MB 7 weeks ago
qwen2:7b dd314f039b9d 4.4 GB 7 weeks ago
qwen2:1.5b f6daf2b25194 934 MB 7 weeks ago
qwen2:0.5b 6f48b936a09f 352 MB 7 weeks ago
MFDoom/deepseek-v2-tool-calling:16b 8aec187c96c8 8.9 GB 7 weeks ago
majx13/test:latest bc0789d54213 4.4 GB 7 weeks ago
kuqoi/qwen2-tools:latest 3577919ece5a 352 MB 7 weeks ago
gemma2:latest ff02c3702f32 5.4 GB 7 weeks ago
dwightfoster03/functionary-small-v3.1:latest c1affc7e6d58 4.7 GB 7 weeks ago
deepseek-llm:7b-chat 9aab369a853b 4.0 GB 7 weeks ago
llama3.1:8b a23da2a80395 4.7 GB 3 months ago

(base) root@vps:/home/vps# ollama run llama3.2-vision:11b
Error: llama runner process has terminated: exit status 127

(base) root@vps:/home/vps# ollama -v
ollama version is 0.4.0

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.4.0

GiteaMirror added the nvidia and bug labels 2026-04-12 15:46:07 -05:00

@jmorganca commented on GitHub (Nov 7, 2024):

@jxyk2007 sorry about that. Would it be possible to share the logs? `journalctl -u ollama --no-pager`

@andrerclaudio commented on GitHub (Nov 7, 2024):

Same problem here, just after updating to Ollama 0.4.0:

Nov 07 19:09:32 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:32 | 200 | 17.244µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:32 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:32 | 200 | 13.486µs | 127.0.0.1 | GET "/api/ps"
Nov 07 19:09:34 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:34 | 200 | 16.993µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:34 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:34 | 200 | 7.675µs | 127.0.0.1 | GET "/api/ps"
Nov 07 19:09:36 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:36 | 200 | 16.843µs | 127.0.0.1 | HEAD "/"
Nov 07 19:09:36 macondo ollama[961]: [GIN] 2024/11/07 - 19:09:36 | 200 | 9.479µs | 127.0.0.1 | GET "/api/ps"

@andrerclaudio commented on GitHub (Nov 7, 2024):

Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.539-03:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/home/asimov/.ol>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.680-03:00 level=INFO source=server.go:105 msg="system memory" total="15.5 GiB" free="14.3 GiB" free_swap="96.0 GiB"
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.681-03:00 level=INFO source=memory.go:343 msg="offload to cuda" layers.requested=-1 layers.model=33 layers.offload=33 layers.spli>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.681-03:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3926554203/runners/cuda_v12/ollama_llama_se>
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=server.go:567 msg="waiting for llama runner to start responding"
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.682-03:00 level=INFO source=server.go:601 msg="waiting for server to become available" status="llm server error"
Nov 07 19:11:44 macondo ollama[961]: /tmp/ollama3926554203/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libggml_cuda_v12.so: cannot open shared object file: No >
Nov 07 19:11:44 macondo ollama[961]: time=2024-11-07T19:11:44.932-03:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit statu>
Nov 07 19:11:44 macondo ollama[961]: [GIN] 2024/11/07 - 19:11:44 | 500 | 592.597936ms | 127.0.0.1 | POST "/api/generate"
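
Exit status 127 is the code the loader (or shell) reports when a binary or one of its required shared objects cannot be found, which is exactly what the libggml_cuda_v12.so line above shows. A quick way to see the unresolved dependency, assuming the /tmp runner directory from the log still exists (it is recreated with a random suffix on every server start):

```sh
# List the shared libraries the runner binary cannot resolve; substitute the path from your own log.
ldd /tmp/ollama3926554203/runners/cuda_v12/ollama_llama_server | grep 'not found'
#   libggml_cuda_v12.so => not found
```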

@dhiltgen commented on GitHub (Nov 7, 2024):

Fixed in #7560

@jxyk2007 commented on GitHub (Nov 8, 2024):

> @jxyk2007 sorry about that. Would it be possible to share the logs? `journalctl -u ollama --no-pager`

Nov 08 06:51:35 VPS ollama[1746]: time=2024-11-08T06:51:35.362Z level=WARN source=sched.go:137 msg="multimodal models don't support parallel requests yet"
Nov 08 06:51:37 VPS ollama[1746]: time=2024-11-08T06:51:37.518Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068 gpu=GPU-7e73fd5a-dd96-6674-4f0b-79e9c80afaa1 parallel=1 available=15534391296 required="11.3 GiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.440Z level=INFO source=server.go:105 msg="system memory" total="94.3 GiB" free="91.7 GiB" free_swap="8.0 GiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.444Z level=INFO source=memory.go:343 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.3 GiB" memory.required.partial="11.3 GiB" memory.required.kv="656.2 MiB" memory.required.allocations="[11.3 GiB]" memory.weights.total="5.5 GiB" memory.weights.repeating="5.1 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="258.5 MiB" memory.graph.partial="669.5 MiB"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3255965013/runners/cuda_v12/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068 --ctx-size 2048 --batch-size 512 --embedding --n-gpu-layers 41 --mmproj /usr/share/ollama/.ollama/models/blobs/sha256-ece5e659647a20a5c28ab9eea1c12a1ad430bc0f2a27021d00ad103b3bf5206f --threads 20 --parallel 1 --port 40275"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=sched.go:449 msg="loaded runners" count=1
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.446Z level=INFO source=server.go:567 msg="waiting for llama runner to start responding"
Nov 08 06:51:39 VPS ollama[1746]: /tmp/ollama3255965013/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libggml_cuda_v12.so: cannot open shared object file: No such file or directory
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.447Z level=INFO source=server.go:601 msg="waiting for server to become available" status="llm server error"
Nov 08 06:51:39 VPS ollama[1746]: time=2024-11-08T06:51:39.697Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 127"
Nov 08 06:51:39 VPS ollama[1746]: [GIN] 2024/11/08 - 06:51:39 | 500 | 4.404819377s | 127.0.0.1 | POST "/api/generate"
Nov 08 06:51:45 VPS ollama[1746]: time=2024-11-08T06:51:45.810Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=6.112529405 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068
Nov 08 06:51:47 VPS ollama[1746]: time=2024-11-08T06:51:47.686Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=7.988151227 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068
Nov 08 06:51:49 VPS ollama[1746]: time=2024-11-08T06:51:49.586Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=9.888398177 model=/usr/share/ollama/.ollama/models/blobs/sha256-11f274007f093fefeec994a5dbbb33d0733a4feb87f7ab66dcd7c1069fef0068

@dhiltgen commented on GitHub (Nov 8, 2024):

@andrerclaudio please share an updated server log with OLLAMA_DEBUG=1 set running 0.4.1. Please include the full log from startup of the server so I can see all the settings it has, where the runners are located, etc.
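
For a systemd install (the default from install.sh), one way to capture that is roughly the following sketch; using an override file is equivalent to editing the unit directly:

```sh
# Turn on debug logging for the ollama service, restart it, and save the full journal from startup.
sudo systemctl edit ollama     # in the editor that opens, add:
                               #   [Service]
                               #   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
journalctl -u ollama --no-pager > ollama-debug.log
```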

@andrerclaudio commented on GitHub (Nov 9, 2024):

Hi, @dhiltgen.

First of all, I apologize—this was my mistake. I’m truly sorry.

I started over, taking some screenshots, and found the error.
After unpacking the tar file, I was moving the ollama binary to the folder my systemd unit is configured to execute from, but I wasn't moving the library (lib) files along with it. Oddly, this setup worked until now (before v0.4.0).

Now, keeping the files laid out as they are after unpacking and updating the ollama.service execution path to match, nvidia-smi is not showing GPU memory usage (bottom-right corner of the screenshot below).

![image](https://github.com/user-attachments/assets/80a66245-1425-46b6-9a1a-07ac8862169c)

But anyway, it’s working now. Thanks, and sorry for the trouble.

@jxyk2007 commented on GitHub (Nov 11, 2024):

 1670  echo $LD_LIBRARY_PATH
 1671  ll /usr/lib/x86_64-linux-gnu
 1672  journalctl -u ollama.service -b
 1673  ll /tmp/ollama2975340959/runners/cuda_v12/
 1674  nvcc --version
 1675  LD_LIBRARY_PATH
 1676  echo $LD_LIBRARY_PATH
 1677  cd /usr/lib/x86_64-linux-gnu
 1678  ll libggml_cuda_v12.so
 1679  ll
 1680  find libggml_cuda_v12.so
 1681  find / -name libggml_cuda_v12.so
 1682  export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH
 1683  source ~/.bashrc
 1684  ollama run qwen2:0.5b
 1685  vi /etc/systemd/system/ollama.service
 1686  sudo systemctl daemon-reload
 1687  sudo systemctl restart ollama.service
 1688  sudo systemctl daemon-reload
 1689  sudo systemctl restart ollama.service
 1690  ollama run qwen2:0.5b

find / -name libggml_cuda_v12.so

/root/lib/ollama/libggml_cuda_v12.so

ChatGPT said:

Since libggml_cuda_v12.so was found at /root/lib/ollama/libggml_cuda_v12.so, you can add that directory to LD_LIBRARY_PATH so the system can locate it. Here are the steps:

1. Temporarily add the library path
You can add the path to LD_LIBRARY_PATH temporarily to test whether it works:

export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH

Then run the Ollama command again to test:

ollama run qwen2:0.5b

2. Add the path to LD_LIBRARY_PATH permanently
If the step above works, you can make the path permanent. Edit .bashrc or .profile (usually in the root user's home directory) and add the following line:

export LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH

Then apply the change immediately with:

source ~/.bashrc

3. Configure the systemd service file
If Ollama is started as a systemd service, add the library path to the systemd unit as well, so the service can find the file at startup. Edit /etc/systemd/system/ollama.service and add the following in the [Service] section:

[Service]
Environment="LD_LIBRARY_PATH=/root/lib/ollama:$LD_LIBRARY_PATH"

Then reload the systemd configuration and restart the Ollama service:

sudo systemctl daemon-reload
sudo systemctl restart ollama.service

4. Verify the configuration
After restarting Ollama, check whether the exit status 127 error still occurs. If the problem is gone, the library path is configured correctly.

(base) root@vps:/home/vps# ollama run llama3.2-vision:latest

Send a message (/? for help)
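
One note on the unit file above: systemd does not perform shell expansion inside Environment=, so the literal text $LD_LIBRARY_PATH is not substituted; listing only the directory (Environment="LD_LIBRARY_PATH=/root/lib/ollama") is usually sufficient. A quick check that the override took effect, as a sketch assuming the service is named ollama:

```sh
# Confirm the service actually sees the library path, then re-check the journal for loader errors.
systemctl show ollama -p Environment
journalctl -u ollama -b --no-pager | grep 'error while loading shared libraries' || echo "no loader errors since boot"
```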

@dhiltgen commented on GitHub (Nov 14, 2024):

@jxyk2007 I'm not sure how you are installing ollama. If you use our install.sh script, or extract the ollama-linux-amd64.tgz file and don't move things around, then you should not have to set LD_LIBRARY_PATH. There are multiple libraries and binaries in the tar file that we expect to stay in the same layout.

It sounds like you did get it working, so I'll close this issue. If you are still having trouble, please share more details about how you are installing ollama, and the server log with OLLAMA_DEBUG=1 set so we can see the paths and settings to understand why it isn't able to find the libraries where it is expecting them.
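
For reference, a sketch of the manual install that keeps that layout intact, following the Linux install docs (adjust the architecture suffix for your machine):

```sh
# Download and extract the release tarball in place so bin/ and lib/ stay together.
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
```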

@MYBJ185 commented on GitHub (Feb 8, 2025):

After installing directly from ollama-linux-amd64.tgz, I hit this issue when running `ollama run <model>`. The problem was resolved by re-running the installation command from the Ollama website: `curl -fsSL https://ollama.com/install.sh | sh`. It's also possible this was related to file permissions on my server, which I had loosened. You can try that as well.
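
A sketch of that recovery path, assuming the systemd service set up by the install script:

```sh
# Re-run the official installer to restore the expected file layout and service unit, then re-test.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
ollama run llama3.2-vision:11b
```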

@ManishKhullar commented on GitHub (Jul 9, 2025):

ollama run llama3.2-vision:11b
Error: llama runner process has terminated: error:fault

  • Got the above error
  • On a Windows system where the model is downloaded to the G: drive
  • Ollama itself is also set up on the G: drive
  • By default the CPU is used, despite an Nvidia GeForce GTX 960 graphics adapter being present

Below is the server side error:

time=2025-07-09T14:23:42.280+05:30 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-09T14:23:42.639+05:30 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server error"
time=2025-07-09T14:23:42.682+05:30 level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"
time=2025-07-09T14:23:42.890+05:30 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: error:fault"
[GIN] 2025/07/09 - 14:23:42 | 500 |    4.7371714s |       127.0.0.1 | POST     "/api/generate"
time=2025-07-09T14:23:47.925+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.0340531 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed
time=2025-07-09T14:23:48.175+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.2844668 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed
time=2025-07-09T14:23:48.425+05:30 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.5343092 runner.size="11.9 GiB" runner.vram="1.4 GiB" runner.parallel=1 runner.pid=12052 runner.model=G:\Manish\ollama\blobs\sha256-9999d473417a8e179d993498195be5f42cab963acc75f4a6b15d981e8b68abed

@dhiltgen commented on GitHub (Jul 17, 2025):

@ManishKhullar your scenario looks unrelated to this issue. I believe your GPU has only 2G of VRAM, however llama3.2-vision requires at least 6G to load on the GPU, so running on CPU is expected. Fully loaded the model takes ~12G of memory. You didn't include enough logs to see how much system memory you have, but it's possible your system may be under memory pressure. Take a look at Task Manager's memory view to see if you have enough space to fit this model, or try loading a smaller model. If that doesn't help, please open a new issue, and include more complete server logs with debug turned on so we can help troubleshoot.

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md

@ManishKhullar commented on GitHub (Aug 4, 2025):

Hi,
Thank you for the heads up.
I have 16 GB of RAM, of which 12.2 GB is available while the browser and Task Manager are running.
Could you guide me on where I can find the generated logs? I have added OLLAMA_DEBUG=true to my user environment variables and am new to the ecosystem :)

Reference: github-starred/ollama#4797