[GH-ISSUE #10119] Ollama hangs while generating a response #53151

Closed
opened 2026-04-29 02:07:35 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @jefferson-vm on GitHub (Apr 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10119

What is the issue?

Models tested with: Llama3.1:8b, Qwen2.5:14b, Gemma3:12b

Issue: Ollama repeatedly stops generating a response midway through agent execution, leaving the server in a hung state. I've been seeing this very often after switching to 0.6.x. The total number of input and output tokens is less than the configured num_ctx. Once the keep-alive period runs out, the model gets stuck in the "Stopping" state.
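
For illustration only (the exact agent requests are not shown here), the kind of /api/chat call involved, with an explicit context window and keep-alive, looks roughly like this; the model name and values below are placeholders:

```shell
# Illustrative sketch -- model, num_ctx and keep_alive values are placeholders,
# not the reporter's exact settings.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello"}],
  "options": { "num_ctx": 20480 },
  "keep_alive": "5m",
  "stream": false
}'
```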

Relevant log output

Apr 03 22:48:12 ip-xxx-xx-xx-xxx ollama[743374]: [GIN] 2025/04/03 - 22:48:12 | 200 |     120.452µs |       127.0.0.1 | GET      "/api/ps"
Apr 03 22:48:12 ip-xxx-xx-xx-xxx ollama[743374]: [GIN] 2025/04/03 - 22:48:12 | 200 |      28.131µs |       127.0.0.1 | HEAD     "/"
Apr 03 22:44:37 ip-xxx-xx-xx-xxx ollama[743374]: [GIN] 2025/04/03 - 22:44:37 | 200 |  5.054276458s |       127.0.0.1 | POST     "/api/chat"
Apr 03 22:44:32 ip-xxx-xx-xx-xxx ollama[743374]: [GIN] 2025/04/03 - 22:44:32 | 200 | 22.029005923s |       127.0.0.1 | POST     "/api/chat"
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: time=2025-04-03T22:44:16.649Z level=INFO source=server.go:619 msg="llama runner started in 2.01 seconds"
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: graph splits = 2
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: graph nodes  = 1030
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model:  CUDA_Host compute buffer size =   168.01 MiB
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model:      CUDA0 compute buffer size =  5312.00 MiB
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model:  CUDA_Host  output buffer size =     2.02 MiB
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: KV self size  = 10240.00 MiB, K (f16): 5120.00 MiB, V (f16): 5120.00 MiB
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_kv_cache_init:      CUDA0 KV buffer size = 10240.00 MiB
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_kv_cache_init: kv_size = 81920, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_ctx_per_seq (20480) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: freq_scale    = 1
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: freq_base     = 500000.0
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: flash_attn    = 0
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_ubatch      = 512
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_batch       = 2048
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_ctx_per_seq = 20480
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_ctx         = 81920
Apr 03 22:44:16 ip-xxx-xx-xx-xxx ollama[743374]: llama_init_from_model: n_seq_max     = 4
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors:   CPU_Mapped model buffer size =   281.81 MiB
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors:        CUDA0 model buffer size =  4403.49 MiB
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors: offloaded 33/33 layers to GPU
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors: offloading output layer to GPU
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors: offloading 32 repeating layers to GPU
Apr 03 22:44:15 ip-xxx-xx-xx-xxx ollama[743374]: load_tensors: loading model tensors, this can take a while... (mmap = true)

OS

Linux

GPU

NVIDIA A10G 24 GB x 4

CPU

AMD

Ollama version

0.6.3

GiteaMirror added the bug label 2026-04-29 02:07:35 -05:00
Author
Owner

@luisgg98 commented on GitHub (Apr 7, 2025):

Good afternoon, I am experiencing the same issue.

![Image](https://github.com/user-attachments/assets/54fe6653-3a8e-4220-a36f-d9d330ea660c)

OS
Ubuntu 22.04.5 LTS
GPU
NVIDIA H100 PCIe 80 GB x 4
Ollama version
0.5.7

Author
Owner

@dhiltgen commented on GitHub (Apr 9, 2025):

There seems to be a race somewhere in the scheduler under heavy load, possibly related to clients closing connections prematurely. If people are still seeing models get stuck in a "Stopping..." state in the `ollama ps` output and the model never actually unloads, please try running the server with OLLAMA_DEBUG=1 and share the logs, including the model load and the eventual stuck state.
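
For reference, one way to enable the requested debug logging on a systemd-based Linux install (matching the journald-style logs in this thread) is sketched below; the unit name and steps assume the standard Linux install, so adjust them if your setup differs.

```shell
# Sketch, assuming the standard systemd install of Ollama on Linux.
# If you run the server manually, `OLLAMA_DEBUG=1 ollama serve` is enough.

# 1. Add the debug environment variable to the service:
sudo systemctl edit ollama
#    In the override file, add:
#      [Service]
#      Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama

# 2. Follow the server logs while reproducing the hang:
journalctl -u ollama -f

# 3. Check whether a model is stuck in the "Stopping..." state:
ollama ps
```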

Author
Owner

@jefferson-vm commented on GitHub (Apr 10, 2025):

@dhiltgen here are the logs; let me know if they help identify the issue:


Apr 10 19:36:18 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:18 | 200 |      36.841µs |       127.0.0.1 | GET      "/api/ps"
Apr 10 19:36:18 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:18 | 200 |      22.041µs |       127.0.0.1 | HEAD     "/"
Apr 10 19:36:16 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:16 | 200 |       35.91µs |       127.0.0.1 | GET      "/api/ps"
Apr 10 19:36:16 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:16 | 200 |       21.43µs |       127.0.0.1 | HEAD     "/"
Apr 10 19:36:14 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:14 | 200 |      53.451µs |       127.0.0.1 | GET      "/api/ps"
Apr 10 19:36:14 ip-172-31-91-221 ollama[829868]: [GIN] 2025/04/10 - 19:36:14 | 200 |      22.841µs |       127.0.0.1 | HEAD     "/"
Apr 10 19:35:54 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:54.155Z level=DEBUG source=sched.go:310 msg="ignoring unload event with no pending requests"
Apr 10 19:35:54 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:54.155Z level=DEBUG source=sched.go:386 msg="sending an unloaded event" modelPath=/opt/ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
Apr 10 19:35:54 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:54.155Z level=DEBUG source=sched.go:661 msg="gpu VRAM free memory converged after 2.34 seconds" model=/opt/ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
Apr 10 19:35:54 ip-172-31-91-221 ollama[829868]: releasing cuda driver library
Apr 10 19:35:54 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:54.155Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-f0ec61ab-a500-6a59-4aa1-0a960a525f15 name="NVIDIA A10G" overhead="0 B" before.total="22.0 GiB" before.free="21.7 GiB" now.total="22.0 GiB" now.free="21.7 GiB" now.used="256.4 MiB"
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.952Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-533622a7-7b70-ef13-089d-ae37f430b06d name="NVIDIA A10G" overhead="0 B" before.total="22.0 GiB" before.free="15.5 GiB" now.total="22.0 GiB" now.free="20.6 GiB" now.used="1.3 GiB"
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.743Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-47fa09ae-cc7e-eb60-bb7b-b888d94accf4 name="NVIDIA A10G" overhead="0 B" before.total="22.0 GiB" before.free="19.9 GiB" now.total="22.0 GiB" now.free="19.9 GiB" now.used="2.1 GiB"
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.535Z level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-f4e38c22-86fb-3e62-0e31-d64c6e167a07 name="NVIDIA A10G" overhead="0 B" before.total="22.0 GiB" before.free="9.1 GiB" now.total="22.0 GiB" now.free="9.1 GiB" now.used="12.8 GiB"
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: device count 4
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: calling cuDeviceGetCount
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: CUDA driver version: 12.4
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: raw version 0x2f08
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: calling cuDriverGetVersion
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: calling cuInit
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuCtxDestroy - 0x774f7c8e1850
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuMemGetInfo_v2 - 0x774f7c886e20
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuCtxCreate_v3 - 0x774f7c87cee0
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDeviceGetName - 0x774f7c87cc40
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDeviceGetUuid - 0x774f7c87cc60
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDeviceGetAttribute - 0x774f7c87cd00
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDeviceGet - 0x774f7c87cc00
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDeviceGetCount - 0x774f7c87cc20
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuDriverGetVersion - 0x774f7c87cbe0
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: dlsym: cuInit - 0x774f7c87cbc0
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.550.120
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.113Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="373.7 GiB" before.free="347.4 GiB" before.free_swap="0 B" now.total="373.7 GiB" now.free="347.7 GiB" now.free_swap="0 B"
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.030Z level=DEBUG source=sched.go:382 msg="runner released" modelPath=/opt/ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
Apr 10 19:35:53 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:53.030Z level=DEBUG source=server.go:1011 msg="llama server stopped"
Apr 10 19:35:52 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:52.907Z level=DEBUG source=server.go:1007 msg="waiting for llama server to exit"
Apr 10 19:35:52 ip-172-31-91-221 ollama[829868]: time=2025-04-10T19:35:52.906Z level=DEBUG source=server.go:1001 msg="stopping llama server"
Apr 10 19:35:52 ip-172-31-91-221 ollama[829868]: releasing cuda driver library