[GH-ISSUE #9507] Ollama llm server error after upgrading to latest ollama client 0.5.13 #6195

Open
opened 2026-04-12 17:34:36 -05:00 by GiteaMirror · 3 comments

Originally created by @cubricmms on GitHub (Mar 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9507

What is the issue?

Error messages:

```
% ollama -v
Warning: could not connect to a running Ollama instance
Warning: client version is 0.5.13
```

```
% ollama list
Error: something went wrong, please see the ollama server logs for details
```
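
For anyone triaging this: the CLI resolves the server address from `OLLAMA_HOST`, so a quick sanity check (a sketch, assuming the default `127.0.0.1:11434`; adjust if `OLLAMA_HOST` points elsewhere) is to query the version endpoint directly:

```shell
# A JSON reply such as {"version":"0.5.13"} means a server is up at
# this address; "connection refused" means the client and server
# disagree on host/port, or the server is not running.
curl http://127.0.0.1:11434/api/version

# Show what the current shell thinks the server address is
# (empty output means the client falls back to the default).
echo "$OLLAMA_HOST"
```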

Relevant log output

The server logs show `llm server error`:

```shell
2025/03/05 11:07:04 routes.go:1215: INFO server config env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/cjing/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-03-05T11:07:04.568+08:00 level=INFO source=images.go:432 msg="total blobs: 9"
time=2025-03-05T11:07:04.568+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-05T11:07:04.568+08:00 level=INFO source=routes.go:1277 msg="Listening on [::]:11434 (version 0.5.13)"
time=2025-03-05T11:07:04.626+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="16.0 GiB" available="16.0 GiB"
time=2025-03-05T11:07:50.027+08:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-05T11:07:50.027+08:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-05T11:07:50.027+08:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/cjing/.ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e gpu=0 parallel=4 available=17179885568 required="10.8 GiB"
time=2025-03-05T11:07:50.032+08:00 level=INFO source=server.go:97 msg="system memory" total="24.0 GiB" free="8.7 GiB" free_swap="0 B"
time=2025-03-05T11:07:50.032+08:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-03-05T11:07:50.032+08:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-03-05T11:07:50.032+08:00 level=INFO source=server.go:130 msg=offload library=metal layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="10.8 GiB" memory.required.kv="1.5 GiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.9 GiB" memory.weights.repeating="8.3 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="676.0 MiB" memory.graph.partial="676.0 MiB"
time=2025-03-05T11:07:50.036+08:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --model /Users/cjing/.ollama/models/blobs/sha256-6e9f90f02bb3b39b59e81916e8cfce9deb45aeaeb9a54a5be4414486b907dc1e --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 8 --parallel 4 --port 49319"
time=2025-03-05T11:07:50.037+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-05T11:07:50.037+08:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-05T11:07:50.038+08:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error
```
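
Since the log cuts off at the runner's status, one way to surface the underlying failure (a sketch; `OLLAMA_DEBUG` is visible in the server config above) is to quit the app and run the server in the foreground with debug logging:

```shell
# Quit the Ollama menu-bar app first so port 11434 is free, then run
# the server in the foreground; debug-level logging usually includes
# the runner subprocess output behind the "llm server error" status.
OLLAMA_DEBUG=1 ollama serve
```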

OS

Darwin MacBook-Pro.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:22 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6041 arm64 arm Darwin

GPU

Apple M4 Pro

CPU

Apple M4 Pro

Ollama version

client 0.5.13

GiteaMirror added the bug label 2026-04-12 17:34:36 -05:00

@jmorganca commented on GitHub (Mar 5, 2025):

Hi there. It seems the logs were truncated before the error message - do you have the full logs? Sorry this happened.
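
For others hitting this on macOS: the full server log is typically written to `~/.ollama/logs/server.log` (per the Ollama troubleshooting docs), so the untruncated error should be recoverable from there:

```shell
# Show the most recent server log lines, including whatever follows
# "waiting for server to become available".
tail -n 100 ~/.ollama/logs/server.log
```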


@cubricmms commented on GitHub (Mar 6, 2025):

hi @jmorganca thanks for looking into it. I have solved the connection issue, but I believe the problem still persists when following the thread https://github.com/ollama/ollama/issues/703#issuecomment-1962853570 and setting `OLLAMA_HOST` using `launchctl`. It seems the client is not respecting the environment variable set there.
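
For anyone following that linked approach, a minimal sketch of the `launchctl` route (addresses are illustrative): `launchctl setenv` only affects processes that `launchd` starts afterwards, such as a relaunched Ollama.app, while the CLI reads `OLLAMA_HOST` from the shell's own environment, which may explain the client not picking it up:

```shell
# Visible to launchd-started apps (e.g. Ollama.app once relaunched);
# already-running processes and existing shells are unaffected.
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# The CLI reads OLLAMA_HOST from the current shell, so export it there
# too (or persist it in ~/.zshrc) to point `ollama list` at the server.
export OLLAMA_HOST="127.0.0.1:11434"
```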


@zhiyzheng commented on GitHub (Apr 21, 2025):

@cubricmms how did you solve it? I'm encountering the same issue and it's still troubling me.
