[GH-ISSUE #1991] Error: Post "http://127.0.0.1:11434/api/generate": EOF #1145

Closed
opened 2026-04-12 10:53:52 -05:00 by GiteaMirror · 21 comments

Originally created by @joesalvati68 on GitHub (Jan 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1991

Originally assigned to: @dhiltgen on GitHub.

(base) user@userAlienware:~$ ollama run vicuna
Error: Post "http://127.0.0.1:11434/api/generate": EOF
(base) user@userAlienware:~$

I keep getting this after initial install and I can't figure out why. Any ideas?

GiteaMirror added the bug label 2026-04-12 10:53:52 -05:00

@jmorganca commented on GitHub (Jan 14, 2024):

Hi @joesalvati68 sorry you hit this. Is this on WSL2? Would it be possible to share the logs and/or any potential CUDA error you're seeing in there?

journalctl -u ollama

Thanks so much
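
For anyone gathering these, a couple of standard journalctl variants (plain systemd flags, nothing Ollama-specific) make it easier to capture the window around the crash:

# Follow the service log live while reproducing the error in another terminal
journalctl -u ollama -f

# Or dump the last 200 lines without a pager, ready to paste into the issue
journalctl -u ollama -n 200 --no-pager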

@musiaht commented on GitHub (Jan 14, 2024):

I am seeing the same thing when running mistral, on Ubuntu 22.04.3.

This is the output from my journalctl -u ollama:

Jan 14 12:15:11 hostname ollama[13665]: [GIN] 2024/01/14 - 12:15:11 | 404 |      87.897µs |       127.0.0.1 | POST     "/api/show"
Jan 14 12:15:16 hostname ollama[13665]: 2024/01/14 12:15:16 download.go:123: downloading e8a35b5937a5 in 42 100 MB part(s)
Jan 14 12:17:20 hostname ollama[13665]: [GIN] 2024/01/14 - 12:17:20 | 200 |      20.339µs |       127.0.0.1 | GET      "/"
Jan 14 12:17:20 hostname ollama[13665]: [GIN] 2024/01/14 - 12:17:20 | 404 |        2.28µs |       127.0.0.1 | GET      "/favicon.ico"
Jan 14 12:17:34 hostname ollama[13665]: [GIN] 2024/01/14 - 12:17:34 | 200 |        7.25µs |       127.0.0.1 | GET      "/"
Jan 14 12:17:39 hostname ollama[13665]: [GIN] 2024/01/14 - 12:17:39 | 404 |        2.87µs |       127.0.0.1 | GET      "/api/show"
Jan 14 12:18:25 hostname ollama[13665]: 2024/01/14 12:18:25 download.go:123: downloading 43070e2d4e53 in 1 11 KB part(s)
Jan 14 12:18:28 hostname ollama[13665]: 2024/01/14 12:18:28 download.go:123: downloading e6836092461f in 1 42 B part(s)
Jan 14 12:18:33 hostname ollama[13665]: 2024/01/14 12:18:33 download.go:123: downloading ed11eda7790d in 1 30 B part(s)
Jan 14 12:18:35 hostname ollama[13665]: 2024/01/14 12:18:35 download.go:123: downloading f9b1e3196ecf in 1 483 B part(s)
Jan 14 12:18:39 hostname ollama[13665]: [GIN] 2024/01/14 - 12:18:39 | 200 |         3m27s |       127.0.0.1 | POST     "/api/pull"
Jan 14 12:18:39 hostname ollama[13665]: [GIN] 2024/01/14 - 12:18:39 | 200 |     371.368µs |       127.0.0.1 | POST     "/api/show"
Jan 14 12:18:39 hostname ollama[13665]: 2024/01/14 12:18:39 shim_ext_server_linux.go:24: Updating PATH to /home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games>
Jan 14 12:18:39 hostname ollama[13665]: 2024/01/14 12:18:39 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama3605392192/rocm/libext_server.so
Jan 14 12:18:39 hostname ollama[13665]: 2024/01/14 12:18:39 ext_server_common.go:136: Initializing internal llama server
Jan 14 12:18:39 hostname ollama[13665]: free(): invalid pointer
Jan 14 12:18:39 hostname systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Jan 14 12:18:39 hostname systemd[1]: ollama.service: Failed with result 'core-dump'.
Jan 14 12:18:39 hostname systemd[1]: ollama.service: Consumed 25.138s CPU time.
Jan 14 12:18:42 hostname systemd[1]: ollama.service: Scheduled restart job, restart counter is at 1.
Jan 14 12:18:42 hostname systemd[1]: Stopped Ollama Service.
Jan 14 12:18:42 hostname systemd[1]: ollama.service: Consumed 25.138s CPU time.
Jan 14 12:18:42 hostname systemd[1]: Started Ollama Service.
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 images.go:808: total blobs: 5
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 images.go:815: total unused blobs removed: 0
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:88: Detecting GPU type
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:203: Searching for GPU management library libnvidia-ml.so
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:248: Discovered GPU libraries: []
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:203: Searching for GPU management library librocm_smi64.so
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:248: Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50702 /opt/rocm-5.7.2/lib/librocm_smi64.so.5.0.50702]
Jan 14 12:18:42 hostname ollama[13810]: 2024/01/14 12:18:42 gpu.go:104: Radeon GPU detected
Jan 14 12:24:32 hostname ollama[13810]: [GIN] 2024/01/14 - 12:24:32 | 200 |      29.939µs |       127.0.0.1 | HEAD     "/"
Jan 14 12:24:32 hostname ollama[13810]: [GIN] 2024/01/14 - 12:24:32 | 200 |     348.788µs |       127.0.0.1 | POST     "/api/show"
Jan 14 12:24:32 hostname ollama[13810]: [GIN] 2024/01/14 - 12:24:32 | 200 |     942.635µs |       127.0.0.1 | POST     "/api/show"
Jan 14 12:24:33 hostname ollama[13810]: 2024/01/14 12:24:33 shim_ext_server_linux.go:24: Updating PATH to /home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games>
Jan 14 12:24:33 hostname ollama[13810]: 2024/01/14 12:24:33 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama2966675158/rocm/libext_server.so
Jan 14 12:24:33 hostname ollama[13810]: 2024/01/14 12:24:33 ext_server_common.go:136: Initializing internal llama server
Jan 14 12:24:33 hostname ollama[13810]: free(): invalid pointer
Jan 14 12:24:33 hostname systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Jan 14 12:24:33 hostname systemd[1]: ollama.service: Failed with result 'core-dump'.
Jan 14 12:24:36 hostname systemd[1]: ollama.service: Scheduled restart job, restart counter is at 2.
Jan 14 12:24:36 hostname systemd[1]: Stopped Ollama Service.
Jan 14 12:24:36 hostname systemd[1]: Started Ollama Service.
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 images.go:808: total blobs: 5
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 images.go:815: total unused blobs removed: 0
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:88: Detecting GPU type
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:203: Searching for GPU management library libnvidia-ml.so
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:248: Discovered GPU libraries: []
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:203: Searching for GPU management library librocm_smi64.so
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:248: Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50702 /opt/rocm-5.7.2/lib/librocm_smi64.so.5.0.50702]
Jan 14 12:24:36 hostname ollama[14029]: 2024/01/14 12:24:36 gpu.go:104: Radeon GPU detected

Looks like it ran into a free(): invalid pointer.
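
If the crash leaves a core dump (systemd reports code=dumped above), coredumpctl can usually recover a backtrace from it. A sketch, assuming the systemd-coredump package is installed:

# List recent core dumps from the ollama binary
coredumpctl list ollama

# Load the most recent dump into gdb, then type "bt" at the gdb prompt
# to print the stack that hit free(): invalid pointer
coredumpctl gdb ollama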

@ryukyi commented on GitHub (Jan 14, 2024):

Same with WSL2 Ubuntu 22.04, and it's definitely a memory issue. I had the same on llama2, llama2-uncensored and mistral, although with mistral I was able to get responses to some short queries. As soon as I asked multiline or longer questions, the same memory issue happened. See below output from:

journalctl -u ollama

Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  281:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  282:           blk.31.ffn_down.weight q4_0     [ 14336,  4096,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  283:           blk.31.ffn_gate.weight q4_0     [  4096, 14336,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  284:             blk.31.ffn_up.weight q4_0     [  4096, 14336,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  285:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  286:             blk.31.attn_k.weight q4_0     [  4096,  1024,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  287:        blk.31.attn_output.weight q4_0     [  4096,  4096,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  288:             blk.31.attn_q.weight q4_0     [  4096,  4096,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  289:             blk.31.attn_v.weight q4_0     [  4096,  1024,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - tensor  290:               output_norm.weight f32      [  4096,     1,     1,     1 ]
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   1:                               general.name str              = mistralai
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  11:                          general.file_type u32              = 2
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,58980]   = ["▁ t", "i n", "e r", "▁ a", "h e...
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - kv  23:               general.quantization_version u32              = 2
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - type  f32:   65 tensors
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - type q4_0:  225 tensors
Jan 15 08:49:56 axiknious ollama[32052]: llama_model_loader: - type q6_K:    1 tensors
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: format           = GGUF V3 (latest)
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: arch             = llama
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: vocab type       = SPM
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_vocab          = 32000
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_merges         = 0
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_ctx_train      = 32768
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_embd           = 4096
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_head           = 32
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_head_kv        = 8
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_layer          = 32
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_rot            = 128
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_gqa            = 4
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_ff             = 14336
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_expert         = 0
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_expert_used    = 0
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: rope scaling     = linear
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: freq_base_train  = 1000000.0
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: freq_scale_train = 1
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: n_yarn_orig_ctx  = 32768
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: rope_finetuned   = unknown
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: model type       = 7B
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: model ftype      = Q4_0
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: model params     = 7.24 B
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW)
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: general.name     = mistralai
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: BOS token        = 1 '<s>'
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: EOS token        = 2 '</s>'
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: UNK token        = 0 '<unk>'
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: ggml ctx size =    0.11 MiB
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: using CUDA for GPU acceleration
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: mem required  =  992.20 MiB
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: offloading 25 repeating layers to GPU
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: offloaded 25/33 layers to GPU
Jan 15 08:49:56 axiknious ollama[32052]: llm_load_tensors: VRAM used: 2925.78 MiB
Jan 15 08:49:57 axiknious ollama[32052]: ...................................................................................................
Jan 15 08:49:57 axiknious ollama[32052]: llama_new_context_with_model: n_ctx      = 2048
Jan 15 08:49:57 axiknious ollama[32052]: llama_new_context_with_model: freq_base  = 1000000.0
Jan 15 08:49:57 axiknious ollama[32052]: llama_new_context_with_model: freq_scale = 1
Jan 15 08:49:57 axiknious ollama[32052]: llama_kv_cache_init: VRAM kv self = 200.00 MB
Jan 15 08:49:57 axiknious ollama[32052]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
Jan 15 08:49:57 axiknious ollama[32052]: llama_build_graph: non-view tensors processed: 676/676
Jan 15 08:49:57 axiknious ollama[32052]: llama_new_context_with_model: compute buffer total size = 159.19 MiB
Jan 15 08:49:58 axiknious ollama[32052]: llama_new_context_with_model: VRAM scratch buffer: 156.00 MiB
Jan 15 08:49:58 axiknious ollama[32052]: llama_new_context_with_model: total VRAM used: 3281.79 MiB (model: 2925.78 MiB, context: 356.00 MiB)
Jan 15 08:49:58 axiknious ollama[32052]: 2024/01/15 08:49:58 ext_server_common.go:144: Starting internal llama main loop
Jan 15 08:49:58 axiknious ollama[32052]: [GIN] 2024/01/15 - 08:49:58 | 200 |  2.905588686s |       127.0.0.1 | POST     "/api/generate"
Jan 15 08:50:37 axiknious ollama[32052]: 2024/01/15 08:50:37 ext_server_common.go:158: loaded 0 images
Jan 15 08:50:37 axiknious ollama[32052]: CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: out of memory
Jan 15 08:50:37 axiknious ollama[32052]: current device: 0
Jan 15 08:50:37 axiknious ollama[32052]: Lazy loading /tmp/ollama3988857133/cuda/libext_server.so library
Jan 15 08:50:37 axiknious ollama[32052]: GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: !"CUDA error"
Jan 15 08:50:38 axiknious systemd[1]: ollama.service: Main process exited, code=killed, status=6/ABRT
Jan 15 08:50:38 axiknious systemd[1]: ollama.service: Failed with result 'signal'.
Jan 15 08:50:41 axiknious systemd[1]: ollama.service: Scheduled restart job, restart counter is at 7.
Jan 15 08:50:41 axiknious systemd[1]: Stopped Ollama Service.
Jan 15 08:50:41 axiknious systemd[1]: Started Ollama Service.
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 images.go:808: total blobs: 5
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 images.go:815: total unused blobs removed: 0
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 gpu.go:88: Detecting GPU type
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 gpu.go:203: Searching for GPU management library libnvidia-ml.so
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1]
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 gpu.go:94: Nvidia GPU detected
Jan 15 08:50:41 axiknious ollama[35222]: 2024/01/15 08:50:41 gpu.go:135: CUDA Compute Capability detected: 7.5

Looks like it retries 7 times before stopping: Scheduled restart job, restart counter is at 7.
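
The retry cadence is just the unit's systemd restart policy; it can be confirmed with standard systemctl commands:

# Show the installed unit file, including its Restart= settings
systemctl cat ollama

# Or query just the relevant properties
systemctl show -p Restart -p RestartSec ollama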

It fails the CUDA_CHECK around cudaMalloc in ggml_cuda_pool_malloc:
https://github.com/ggerganov/llama.cpp/blob/328b83de23b33240e28f4e74900d1d06726f5eb1/ggml-cuda.cu#L6600

static void * ggml_cuda_pool_malloc(size_t size, size_t * actual_size) {
    scoped_spin_lock lock(g_cuda_pool_lock);
    int id;
    CUDA_CHECK(cudaGetDevice(&id));
#ifdef DEBUG_CUDA_MALLOC
    int nnz = 0;
    size_t max_size = 0, tot_size = 0;
#endif
    size_t best_diff = 1ull << 36;
    int ibest = -1;
    for (int i = 0; i < MAX_CUDA_BUFFERS; ++i) {
        cuda_buffer& b = g_cuda_buffer_pool[id][i];
        if (b.ptr != nullptr) {
#ifdef DEBUG_CUDA_MALLOC
            ++nnz;
            tot_size += b.size;
            if (b.size > max_size) max_size = b.size;
#endif
            if (b.size >= size) {
                size_t diff = b.size - size;
                if (diff < best_diff) {
                    best_diff = diff;
                    ibest = i;
                    if (!best_diff) {
                        void * ptr = b.ptr;
                        *actual_size = b.size;
                        b.ptr = nullptr;
                        b.size = 0;
                        return ptr;
                    }
                }
            }
        }
    }
    if (ibest >= 0) {
        cuda_buffer& b = g_cuda_buffer_pool[id][ibest];
        void * ptr = b.ptr;
        *actual_size = b.size;
        b.ptr = nullptr;
        b.size = 0;
        return ptr;
    }
#ifdef DEBUG_CUDA_MALLOC
    fprintf(stderr, "%s: %d buffers, max_size = %u MB, tot_size = %u MB, requested %u MB\n", __func__, nnz,
            (uint32_t)(max_size/1024/1024), (uint32_t)(tot_size/1024/1024), (uint32_t)(size/1024/1024));
#endif
    void * ptr;
    size_t look_ahead_size = (size_t) (1.05 * size);
    look_ahead_size = 256 * ((look_ahead_size + 255)/256);
    CUDA_CHECK(cudaMalloc((void **) &ptr, look_ahead_size));
    *actual_size = look_ahead_size;
    return ptr;
}
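
Since the failing call is the cudaMalloc at the bottom of that function (the KV cache and scratch buffers grow with the prompt, which fits the "short queries work, long ones crash" pattern), it may help to watch VRAM while reproducing. A sketch using stock NVIDIA tooling:

# Refresh nvidia-smi once a second while sending the long prompt
watch -n 1 nvidia-smi

# Or log just the memory counters once a second, CSV-formatted for the issue
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1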

@ryukyi commented on GitHub (Jan 14, 2024):

Here is my hardware spec output from the WSL2 Ubuntu 22.04 LTS distro, using inxi -Fxz:

System:
  Kernel: 5.15.133.1-microsoft-standard-WSL2 x86_64 bits: 64 compiler: gcc
    v: 11.2.0 Desktop: N/A Distro: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Machine:
  Message: No machine data: try newer kernel. Is dmidecode installed? Try -M
  --dmidecode.
Battery:
  ID-1: BAT1 charge: 5.0 Wh (100.0%) condition: 5.0/5.0 Wh (100.0%)
    volts: 5.0 min: 5.0 model: Microsoft Hyper-V Virtual Batte status: Full
CPU:
  Info: 8-core model: Intel Core i9-10885H bits: 64 type: MT MCP
    arch: Comet Lake rev: 2 cache: L1: 512 KiB L2: 2 MiB L3: 16 MiB
  Speed (MHz): avg: 2400 min/max: N/A cores: 1: 2400 2: 2400 3: 2400
    4: 2400 5: 2400 6: 2400 7: 2400 8: 2400 9: 2400 10: 2400 11: 2400 12: 2400
    13: 2400 14: 2400 15: 2400 16: 2400 bogomips: 76800
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Graphics:
  Device-1: Microsoft driver: dxgkrnl v: 2.0.2 bus-ID: 5d97:00:00.0
  Device-2: Microsoft driver: dxgkrnl v: 2.0.2 bus-ID: d22a:00:00.0
  Display: wayland server: Microsoft Corporation X.org driver:
    gpu: dxgkrnl,dxgkrnl resolution: 1: 1920x1200~60Hz 2: 1200x1920~60Hz
  OpenGL: renderer: D3D12 (Intel UHD Graphics)
    v: 4.1 Mesa 23.0.4-0ubuntu1~22.04.1 direct render: Yes
Audio:
  Message: No device data found.
Network:
  Message: No device data found.
  IF-ID-1: bonding_masters state: N/A speed: N/A duplex: N/A mac: N/A
  IF-ID-2: br-0878e49730b9 state: down mac: <filter>
  IF-ID-3: br-2a84e2b41a70 state: down mac: <filter>
  IF-ID-4: br-59a3148c9959 state: down mac: <filter>
  IF-ID-5: br-bf4688f96ff1 state: down mac: <filter>
  IF-ID-6: br-ddd37949f428 state: down mac: <filter>
  IF-ID-7: br-df4919d7e615 state: down mac: <filter>
  IF-ID-8: docker0 state: down mac: <filter>
  IF-ID-9: eth0 state: up speed: 10000 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 1.01 TiB used: 659.41 GiB (63.9%)
  ID-1: /dev/sda model: Virtual Disk size: 389.8 MiB
  ID-2: /dev/sdb model: Virtual Disk size: 8 GiB
  ID-3: /dev/sdc model: Virtual Disk size: 1024 GiB
Partition:
  ID-1: / size: 1006.85 GiB used: 48.61 GiB (4.8%) fs: ext4 dev: /dev/sdc
Swap:
  ID-1: swap-1 type: partition size: 8 GiB used: 0 KiB (0.0%) dev: /dev/sdb
Sensors:
  Message: No sensor data found. Is lm-sensors configured?
Info:
  Processes: 72 Uptime: 14h 43m Memory: 31.22 GiB used: 1.18 GiB (3.8%)
  Init: systemd runlevel: 5 Compilers: gcc: 11.4.0 Packages: 913 Shell: Zsh
  v: 5.8.1 inxi: 3.3.13

@joesalvati68 commented on GitHub (Jan 14, 2024):

Ok. So apologies if the question seems stupid. How do I get logs on this? Yes, it is on WSL2, but I'm running 32 GB of RAM and an RTX 2070 and have previously run larger local LLMs without any issue. I'm still relatively new to this but learning a lot very quickly, so I appreciate the extra guidance.

@ryukyi commented on GitHub (Jan 14, 2024):

@joesalvati68 As suggested by jmorganca above (from your bash terminal in WSL2):

journalctl -u ollama

hardware specs output:

inxi -Fxz
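
To attach both outputs to the issue, redirecting them to files first is easiest (plain shell, nothing Ollama-specific):

# Capture the full service log and the hardware specs
journalctl -u ollama --no-pager > ollama.log
inxi -Fxz > specs.txt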

@nps798 commented on GitHub (Jan 15, 2024):

Having the same issue with a custom model I built from a GGUF file, while library models work without problems.

@akhercha commented on GitHub (Jan 15, 2024):

Hey, having the same problem running the mixtral model:

Jan 15 18:40:58 mori ollama[476938]: CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: out of memory
Jan 15 18:40:58 mori ollama[476938]: current device: 0
Jan 15 18:40:58 mori ollama[476938]: Lazy loading /tmp/ollama1417450100/cuda/libext_server.so library
Jan 15 18:40:58 mori ollama[476938]: GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:6600: !"CUDA error"
Jan 15 18:40:58 mori ollama[477424]: ptrace: Operation not permitted.
Jan 15 18:40:58 mori ollama[477424]: No stack.
Jan 15 18:40:58 mori ollama[477424]: The program is not being run.
Jan 15 18:41:02 mori systemd[1]: ollama.service: Main process exited, code=dumped, status=6/ABRT
Jan 15 18:41:02 mori systemd[1]: ollama.service: Failed with result 'core-dump'.
Jan 15 18:41:02 mori systemd[1]: ollama.service: Consumed 4min 52.168s CPU time.
Jan 15 18:41:05 mori systemd[1]: ollama.service: Scheduled restart job, restart counter is at 2.

Same behavior as observed above; working for small requests but crashing on multi-line ones.
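
One workaround that sometimes helps with this pattern (loads fine, OOMs once the prompt grows) is offloading fewer layers to the GPU, leaving VRAM headroom for the KV cache and scratch buffers. Ollama exposes this as the num_gpu parameter in a Modelfile; a sketch, with the layer count chosen arbitrarily:

# Build a lower-VRAM variant that offloads only 20 layers
cat > Modelfile <<'EOF'
FROM mixtral
PARAMETER num_gpu 20
EOF

ollama create mixtral-lowvram -f Modelfile
ollama run mixtral-lowvram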

@aseedb commented on GitHub (Jan 17, 2024):

Same issue.

Here is a part of the journal:

Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: using CUDA for GPU acceleration
Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: mem required  =   70.42 MiB
Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: offloading 32 repeating layers to GPU
Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: offloading non-repeating layers to GPU
Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: offloaded 33/33 layers to GPU
Jan 17 10:31:43 mifcom2 ollama[3774413]: llm_load_tensors: VRAM used: 3847.55 MiB
Jan 17 10:31:44 mifcom2 ollama[3774413]: .......................................................
Jan 17 10:31:44 mifcom2 ollama[3774413]: CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:9007: out of memory
Jan 17 10:31:44 mifcom2 ollama[3774413]: current device: 3
Jan 17 10:31:44 mifcom2 ollama[3774413]: Lazy loading /tmp/ollama418455061/cuda/libext_server.so library
Jan 17 10:31:44 mifcom2 ollama[3774413]: GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:9007: !"CUDA error"
Jan 17 10:31:44 mifcom2 ollama[3776988]: Could not attach to process.  If your uid matches the uid of the target
Jan 17 10:31:44 mifcom2 ollama[3776988]: process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
Jan 17 10:31:44 mifcom2 ollama[3776988]: again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
Jan 17 10:31:44 mifcom2 ollama[3776988]: ptrace: Operation not permitted.
Jan 17 10:31:44 mifcom2 ollama[3776988]: No stack.
Jan 17 10:31:44 mifcom2 ollama[3776988]: The program is not being run.
Jan 17 10:31:44 mifcom2 ollama[3774413]: SIGABRT: abort

This makes perfect sense: I have 4 GPUs, and some of them are used for other tasks and have their memory close to full.
nvidia-smi returns:

| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:01:00.0 Off |                  N/A |
|  0%   28C    P8              12W / 260W |   3318MiB / 11264MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:21:00.0 Off |                  N/A |
|  0%   29C    P8              10W / 260W |     13MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:4D:00.0 Off |                  N/A |
|  0%   29C    P8              17W / 260W |   9983MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:4E:00.0 Off |                  N/A |
|  0%   28C    P8              12W / 260W |   9983MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Under these conditions I can run 2.7B and 3B models, but anything larger crashes.

Is it possible to specify which GPU to use?
Setting CUDA_VISIBLE_DEVICES does not help.
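
One thing to rule out: when Ollama runs as a systemd service, a variable exported in your shell never reaches the daemon; it has to be set in the unit's environment. A sketch of the systemd side (whether this build of Ollama then honors CUDA_VISIBLE_DEVICES is exactly the open question):

# Create a drop-in override for the service...
sudo systemctl edit ollama
# ...and add these lines in the editor that opens:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=1"

sudo systemctl restart ollama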

@Deluxer commented on GitHub (Jan 17, 2024):

In my scenario, this is the error I encounter.

I understand that the issue relates to memory allocation, but even after restarting the service with sudo systemctl restart ollama, it remains non-functional.

ene 16 10:49:34 deluxer ollama[27135]: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: format           = GGUF V3 (latest)
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: arch             = llama
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: vocab type       = SPM
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_vocab          = 32000
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_merges         = 0
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_ctx_train      = 4096
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_embd           = 4096
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_head           = 32
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_head_kv        = 32
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_layer          = 32
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_rot            = 128
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_gqa            = 1
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: f_norm_eps       = 0.0e+00
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_ff             = 11008
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_expert         = 0
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_expert_used    = 0
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: rope scaling     = linear
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: freq_base_train  = 10000.0
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: freq_scale_train = 1
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: n_yarn_orig_ctx  = 4096
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: rope_finetuned   = unknown
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: model type       = 7B
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: model ftype      = Q4_0
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: model params     = 6.74 B
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: general.name     = LLaMA v2
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: BOS token        = 1 '<s>'
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: EOS token        = 2 '</s>'
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: UNK token        = 0 '<unk>'
ene 16 10:49:34 deluxer ollama[27135]: llm_load_print_meta: LF token         = 13 '<0x0A>'
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: ggml ctx size =    0.11 MiB
ene 16 10:49:34 deluxer ollama[27135]: WARNING: failed to allocate 0.11 MB of pinned memory: unknown error
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: using CUDA for GPU acceleration
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: mem required  =   70.42 MiB
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: offloading 32 repeating layers to GPU
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: offloading non-repeating layers to GPU
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: offloaded 33/33 layers to GPU
ene 16 10:49:34 deluxer ollama[27135]: llm_load_tensors: VRAM used: 3577.55 MiB
ene 16 10:49:34 deluxer ollama[27135]: .
ene 16 10:49:34 deluxer ollama[27135]: CUDA error 999 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:9007: unknown error
ene 16 10:49:34 deluxer ollama[27135]: current device: 0
ene 16 10:49:34 deluxer ollama[27135]: Lazy loading /tmp/ollama3866583403/cuda/libext_server.so library
ene 16 10:49:34 deluxer ollama[27135]: Lazy loading /tmp/ollama3866583403/cuda/libext_server.so library
ene 16 10:49:34 deluxer ollama[27135]: Lazy loading /tmp/ollama3866583403/cuda/libext_server.so library
ene 16 10:49:34 deluxer ollama[27135]: Lazy loading /tmp/ollama3866583403/cuda/libext_server.so library
ene 16 10:49:34 deluxer ollama[27135]: Lazy loading /tmp/ollama3866583403/cuda/libext_server.so library
ene 16 10:49:34 deluxer ollama[27135]: GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:9007: !"CUDA error"
ene 16 10:49:34 deluxer ollama[294553]: Could not attach to process.  If your uid matches the uid of the target
ene 16 10:49:34 deluxer ollama[294553]: process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
ene 16 10:49:34 deluxer ollama[294553]: again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ene 16 10:49:34 deluxer ollama[294553]: ptrace: Inappropriate ioctl for device.
ene 16 10:49:34 deluxer ollama[294553]: No stack.
ene 16 10:49:34 deluxer ollama[294553]: The program is not being run.
ene 16 10:49:34 deluxer ollama[27135]: SIGABRT: abort
ene 16 10:49:34 deluxer ollama[27135]: PC=0x7f97414969fc m=15 sigcode=18446744073709551610
ene 16 10:49:34 deluxer ollama[27135]: signal arrived during cgo execution
ene 16 10:49:34 deluxer ollama[27135]: goroutine 49 [syscall]:
ene 16 10:49:34 deluxer ollama[27135]: runtime.cgocall(0x9c3170, 0xc0001206a0)
ene 16 10:49:34 deluxer ollama[27135]:         /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000120678 sp=0xc000120640 pc=0x4291cb
ene 16 10:49:34 deluxer ollama[27135]: github.com/jmorganca/ollama/llm._Cfunc_dynamic_shim_llama_server_init({0x7f96a0001db0, 0x7f9680dfa410, 0x7f9680decab0, 0x7f9680df0400, 0x7f9680e02980, 0x7f9680df7a30, 0x7f9680df02a0, 0x7f9680decb30, 0x7f9680dfdc10, 0x7f9680dfd7c0, ...}, ...)
ene 16 10:49:34 deluxer ollama[27135]:         _cgo_gotypes.go:287 +0x45 fp=0xc0001206a0 sp=0xc000120678 pc=0x7cf965
ene 16 10:49:34 deluxer ollama[27135]: github.com/jmorganca/ollama/llm.(*shimExtServer).llama_server_init.func1(0x45973b?, 0x80?, 0x80?)
ene 16 10:49:34 deluxer ollama[27135]:         /go/src/github.com/jmorganca/ollama/llm/shim_ext_server.go:40 +0xec fp=0xc000120790 sp=0xc0001206a0 pc=0x7d4d2c
ene 16 10:49:34 deluxer ollama[27135]: github.com/jmorganca/ollama/llm.(*shimExtServer).llama_server_init(0xc0000a22d0?, 0x0?, 0x43a2e8?)

GPU status and specifications:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off | 00000000:09:00.0  On |                  N/A |
|  0%   28C    P8              14W / 165W |    668MiB / 16380MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2516      G   /usr/lib/xorg/Xorg                          332MiB |
|    0   N/A  N/A      2653      G   /usr/bin/gnome-shell                         84MiB |
|    0   N/A  N/A     29762      G   ...,262144 --variations-seed-version=1      167MiB |
|    0   N/A  N/A     53640      G   ...sion,SpareRendererForSitePerProcess       68MiB |
+---------------------------------------------------------------------------------------+

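Note: CUDA error 999 ("unknown error") during initialization, together with the failed pinned-memory allocation above, often points to a wedged driver state rather than genuine memory exhaustion. A workaround sketch, assuming nothing else is holding the GPU (this is a general CUDA recovery step, not something confirmed in this thread):

```shell
# Reload the UVM kernel module that CUDA uses for pinned/unified memory:
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

# Restart the service and retry:
sudo systemctl restart ollama
```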
Author
Owner

@simonnxren commented on GitHub (Jan 21, 2024):

![Screenshot 2024-01-21 173824](https://github.com/jmorganca/ollama/assets/20294218/38657c85-f5f2-4b25-9869-f3df26347336)
Same error here, but the journalctl printout shows "no CUDA-capable device".

Author
Owner

@Deluxer commented on GitHub (Jan 21, 2024):

I was able to solve the problem by installing the CUDA drivers that match my video card.

Please try installing the corresponding version of the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).

If you use Linux, follow the instructions from Ollama on Linux (https://github.com/jmorganca/ollama/blob/main/docs/linux.md).

For newer NVIDIA GPUs, use

sudo apt-get install -y cuda-drivers-545

instead of

sudo apt-get install -y cuda-drivers
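
After installing, it may help to confirm the driver actually loaded and that the versions line up; a quick check, assuming the CUDA toolkit is on your PATH:

```shell
# Driver and CUDA versions as reported by the management library:
nvidia-smi

# Toolkit version, if installed:
nvcc --version
```

A reboot is typically required after a driver upgrade.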
Author
Owner

@CaiZekun commented on GitHub (Jan 26, 2024):

Me too; I have been hitting this since I downloaded llama2 on WSL.
Below is my log. How can I solve this problem?

Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rbx 0x7fb0297fc640
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rcx 0x7fb09c4309fc
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rdx 0x6
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rdi 0x45f
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rsi 0x47b
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rbp 0x47b
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rsp 0x7fb0297fb3e0
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r8 0x7fb0297fb4b0
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r9 0x7fb0297fb450
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r10 0x8
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r11 0x246
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r12 0x6
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r13 0x16
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r14 0x245640490
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: r15 0x8
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rip 0x7fb09c4309fc
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: rflags 0x246
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: cs 0x33
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: fs 0x0
Jan 26 17:52:14 DESKTOP-0JQI779 ollama[1119]: gs 0x0
Jan 26 17:52:14 DESKTOP-0JQI779 systemd[1]: ollama.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 26 17:52:14 DESKTOP-0JQI779 systemd[1]: ollama.service: Failed with result 'exit-code'.
Jan 26 17:52:17 DESKTOP-0JQI779 systemd[1]: ollama.service: Scheduled restart job, restart counter is at 8.
Jan 26 17:52:17 DESKTOP-0JQI779 systemd[1]: Stopped Ollama Service.
Jan 26 17:52:17 DESKTOP-0JQI779 systemd[1]: Started Ollama Service.
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 images.go:808: total blobs: 6
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 images.go:815: total unused blobs removed: 0
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 gpu.go:88: Detecting GPU type
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 gpu.go:203: Searching for GPU management library libnvidia-ml.so
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.>
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 gpu.go:94: Nvidia GPU detected
Jan 26 17:52:17 DESKTOP-0JQI779 ollama[1154]: 2024/01/26 17:52:17 gpu.go:135: CUDA Compute Capability detected: 7.5

Author
Owner

@dhiltgen commented on GitHub (Jan 26, 2024):

@musiaht your issue is tracked in issue #2165 - please give 0.1.22 a try and see if it works for your setup, as we have fixed various ROCm-related defects recently.

@ryukyi @akhercha @aseedb you hit an out-of-memory error on your CUDA card. We've been making steady improvements on our memory estimates, so I'd encourage you all to give 0.1.22 a try and let us know if you still see the crashes.

@CaiZekun unfortunately that portion of the log doesn't contain what we need to understand why it crashed. I'd suggest upgrading to 0.1.22 and if you still see a crash, please share more of the log.
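
One way to capture a complete server log for sharing is to dump the unit's journal for the current boot:

```shell
# Full ollama service log since the last boot, without the pager:
journalctl -u ollama -b --no-pager > ollama-server.log
```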

Author
Owner

@CaiZekun commented on GitHub (Jan 27, 2024):

Thank you for your suggestion!

I updated my Ollama to 0.1.22, and now I can use ollama run normally.

But when I use ollama serve, the following situation occurs. How should I solve this problem?

![image](https://github.com/ollama/ollama/assets/135045336/712444c7-a6cb-43e2-99b2-cdb667824769)

Below is my log:

Jan 28 00:18:09 DESKTOP-0JQI779 systemd[1]: Stopping Ollama Service...
Jan 28 00:18:09 DESKTOP-0JQI779 systemd[1]: ollama.service: Deactivated successfully.
Jan 28 00:18:09 DESKTOP-0JQI779 systemd[1]: Stopped Ollama Service.
Jan 28 00:27:17 DESKTOP-0JQI779 systemd[1]: Started Ollama Service.
Jan 28 00:27:17 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:17 images.go:857: INFO total blobs: 6
Jan 28 00:27:17 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:17 images.go:864: INFO total unused blobs remov>
Jan 28 00:27:17 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:17 routes.go:950: INFO Listening on 127.0.0.1:1>
Jan 28 00:27:17 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:17 payload_common.go:106: INFO Extracting dynam>
Jan 28 00:27:20 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:20 payload_common.go:145: INFO Dynamic LLM libr>
Jan 28 00:27:20 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:20 gpu.go:94: INFO Detecting GPU type
Jan 28 00:27:20 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:20 gpu.go:236: INFO Searching for GPU managemen>
Jan 28 00:27:21 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:21 gpu.go:282: INFO Discovered GPU libraries: [>
Jan 28 00:27:21 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:21 gpu.go:99: INFO Nvidia GPU detected
Jan 28 00:27:21 DESKTOP-0JQI779 ollama[614]: 2024/01/28 00:27:21 gpu.go:140: INFO CUDA Compute Capability det>
Jan 28 00:27:32 DESKTOP-0JQI779 systemd[1]: Stopping Ollama Service...
Jan 28 00:27:32 DESKTOP-0JQI779 systemd[1]: ollama.service: Deactivated successfully.
Jan 28 00:27:32 DESKTOP-0JQI779 systemd[1]: Stopped Ollama Service.

GPU status and specifications:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.67       Driver Version: 517.00       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P8     3W /  N/A |      9MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Author
Owner

@dhiltgen commented on GitHub (Jan 27, 2024):

@CaiZekun from those logs I'm not seeing any crashes; it looks more like a normal shutdown. You're running in WSL2 from the looks of it, and it seems like all our discovery logic is working correctly and we find your NVIDIA GPU. What might be helpful to try: in one WSL terminal window, run sudo systemctl stop ollama; OLLAMA_DEBUG=1 ollama serve, then in another WSL terminal window, after that "serve" command has started, run ollama run orca-mini, then /set verbose, and give it some prompt. If it doesn't work, share the server log so we can see what failed.
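
For reference, that procedure as commands (orca-mini is just the small test model named above):

```shell
# Terminal 1: stop the service and run the server in the foreground with debug logging
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ollama serve

# Terminal 2: once the server is listening, run a small model with verbose timings
ollama run orca-mini
# then inside the REPL:
#   /set verbose
#   <enter any prompt>
```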

Author
Owner

@CaiZekun commented on GitHub (Jan 28, 2024):

Thanks for your attention! I followed your instructions.
Below is the output from the first WSL window:

(LLM_env) czk@DESKTOP-0JQI779:~$ ollama list
NAME            ID              SIZE    MODIFIED
llama2:latest   78e26419b446    3.8 GB  13 hours ago
(LLM_env) czk@DESKTOP-0JQI779:~$ sudo systemctl stop ollama
[sudo] password for czk:
(LLM_env) czk@DESKTOP-0JQI779:~$ OLLAMA_DEBUG=1 ollama serve
time=2024-01-28T10:49:30.912+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/server/routes.go:926 msg="Debug logging enabled"
time=2024-01-28T10:49:30.913+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/images.go:857 msg="total blobs: 0"
time=2024-01-28T10:49:30.913+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/images.go:864 msg="total unused blobs removed: 0"
time=2024-01-28T10:49:30.913+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/routes.go:950 msg="Listening on 127.0.0.1:11434 (version 0.1.22)"
time=2024-01-28T10:49:30.914+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-01-28T10:49:33.206+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/payload_common.go:145 msg="Dynamic LLM libraries [rocm_v6 cpu cpu_avx2 cpu_avx cuda_v11 rocm_v5]"
time=2024-01-28T10:49:33.206+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/llm/payload_common.go:146 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-01-28T10:49:33.206+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:94 msg="Detecting GPU type"
time=2024-01-28T10:49:33.206+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:236 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-01-28T10:49:33.206+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:254 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/local/cuda-11.7/lib64/libnvidia-ml.so*]"
time=2024-01-28T10:49:34.745+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:282 msg="Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1 /usr/lib/wsl/drivers/nvlt.inf_amd64_7947c31fc944635c/libnvidia-ml.so.1]"
wiring nvidia management library functions in /usr/lib/wsl/lib/libnvidia-ml.so.1
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
CUDA driver version: 517.00
time=2024-01-28T10:49:34.777+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:99 msg="Nvidia GPU detected"
[0] CUDA device name: NVIDIA GeForce GTX 1650 Ti
[0] CUDA part number:
nvmlDeviceGetSerial failed: 3
[0] CUDA vbios version: 90.17.42.00.49
[0] CUDA brand: 5
[0] CUDA totalMem 4294967296
[0] CUDA usedMem 4117594112
time=2024-01-28T10:49:34.788+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:140 msg="CUDA Compute Capability detected: 7.5"
time=2024-01-28T10:49:34.788+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:225 msg="cuda detected 1 devices with 2902M available memory"
[GIN] 2024/01/28 - 10:51:19 | 200 |        24.5µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/01/28 - 10:51:19 | 404 |       172.9µs |       127.0.0.1 | POST     "/api/show"
time=2024-01-28T10:51:37.632+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading 8934d96d3f08 in 39 100 MB part(s)"
time=2024-01-28T10:52:31.365+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:162 msg="8934d96d3f08 part 5 attempt 0 failed: unexpected EOF, retrying in 1s"
time=2024-01-28T10:53:55.721+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading 8c17c2ebb0ea in 1 7.0 KB part(s)"
time=2024-01-28T10:54:15.629+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading 7c23fb36d801 in 1 4.8 KB part(s)"
time=2024-01-28T10:54:35.674+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading 2e0493f67d0c in 1 59 B part(s)"
time=2024-01-28T10:54:55.608+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading fa304d675061 in 1 91 B part(s)"
time=2024-01-28T10:55:15.976+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/server/download.go:123 msg="downloading 42ba7f8a01dd in 1 557 B part(s)"
[GIN] 2024/01/28 - 10:55:35 | 200 |         4m16s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2024/01/28 - 10:55:35 | 200 |       377.9µs |       127.0.0.1 | POST     "/api/show"
[0] CUDA device name: NVIDIA GeForce GTX 1650 Ti
[0] CUDA part number:
nvmlDeviceGetSerial failed: 3
[0] CUDA vbios version: 90.17.42.00.49
[0] CUDA brand: 5
[0] CUDA totalMem 4294967296
[0] CUDA usedMem 4117594112
time=2024-01-28T10:55:35.431+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:140 msg="CUDA Compute Capability detected: 7.5"
time=2024-01-28T10:55:35.431+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:225 msg="cuda detected 1 devices with 2902M available memory"
[0] CUDA device name: NVIDIA GeForce GTX 1650 Ti
[0] CUDA part number:
nvmlDeviceGetSerial failed: 3
[0] CUDA vbios version: 90.17.42.00.49
[0] CUDA brand: 5
[0] CUDA totalMem 4294967296
[0] CUDA usedMem 4117594112
time=2024-01-28T10:55:35.431+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:140 msg="CUDA Compute Capability detected: 7.5"
time=2024-01-28T10:55:35.431+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama1176188984/cuda_v11/libext_server.so
time=2024-01-28T10:55:35.438+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1176188984/cuda_v11/libext_server.so"
time=2024-01-28T10:55:35.438+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:145 msg="Initializing llama server"
[1706410535] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
[1706410535] Performing pre-initialization of GPU
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 Ti, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /home/czk/.ollama/models/blobs/sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.22 MiB
llm_load_tensors: offloading 19 repeating layers to GPU
llm_load_tensors: offloaded 19/33 layers to GPU
llm_load_tensors:        CPU buffer size =  3647.87 MiB
llm_load_tensors:      CUDA0 buffer size =  2063.29 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =   416.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   608.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:  CUDA_Host input buffer size   =    12.01 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   156.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =   152.00 MiB
llama_new_context_with_model: graph splits (measure): 5
[1706410537] warming up the model with an empty run
[1706410537] Available slots:
[1706410537]  -> Slot 0 - max context: 2048
time=2024-01-28T10:55:37.689+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:156 msg="Starting llama main loop"
[1706410537] llama server main loop starting
[1706410537] all slots are idle and system prompt is empty, clear the KV cache
[GIN] 2024/01/28 - 10:55:37 | 200 |  2.386657505s |       127.0.0.1 | POST     "/api/chat"
time=2024-01-28T10:55:45.691+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:170 msg="loaded 0 images"
[1706410545] slot 0 is processing [task id: 0]
[1706410545] slot 0 : in cache: 0 tokens | to process: 100 tokens
[1706410545] slot 0 : kv cache rm - [0, end)
[1706410550] sampled token:    13: '
'
[1706410550] sampled token:  1576: 'The'
[1706410550] sampled token:  2643: ' message'
[1706410550] sampled token:   366: ' you'
[1706410550] sampled token:  4944: ' provided'
[1706410550] sampled token: 14088: ' indicates'
[1706410551] sampled token:   393: ' that'
[1706410551] sampled token:   278: ' the'
[1706410551] sampled token:   421: ' `'
[1706410551] sampled token: 29907: 'C'
[1706410551] sampled token: 29965: 'U'
[1706410551] sampled token:  7698: 'DA'
[1706410551] sampled token: 29952: '`'
[1706410551] sampled token: 15326: ' detection'
[1706410551] sampled token:   756: ' has'
[1706410551] sampled token:  1476: ' found'
[1706410552] sampled token: 29871: ' '
[1706410552] sampled token: 29896: '1'
[1706410552] sampled token:  4742: ' device'
[1706410552] sampled token:   411: ' with'
[1706410552] sampled token: 29871: ' '
[1706410552] sampled token: 29906: '2'
[1706410552] sampled token: 29929: '9'
[1706410552] sampled token: 29900: '0'
[1706410553] sampled token: 29906: '2'
[1706410553] sampled token:  4508: ' meg'
[1706410553] sampled token: 10798: 'aby'
[1706410553] sampled token:  2167: 'tes'
[1706410553] sampled token:   310: ' of'
[1706410553] sampled token:  3625: ' available'
[1706410553] sampled token:  3370: ' memory'
[1706410553] sampled token: 29889: '.'
[1706410554] sampled token:   910: ' This'
[1706410554] sampled token:  2472: ' information'
[1706410554] sampled token:   338: ' is'
[1706410554] sampled token:  1641: ' being'
[1706410554] sampled token: 13817: ' logged'
[1706410554] sampled token:   472: ' at'
[1706410554] sampled token:   278: ' the'
[1706410555] sampled token: 21681: ' DEBUG'
[1706410555] sampled token:  3233: ' level'
[1706410555] sampled token: 29892: ','
[1706410555] sampled token:   607: ' which'
[1706410555] sampled token:  2794: ' means'
[1706410555] sampled token:   372: ' it'
[1706410555] sampled token: 29915: '''
[1706410555] sampled token: 29879: 's'
[1706410556] sampled token:   385: ' an'
[1706410556] sampled token:  4100: ' important'
[1706410556] sampled token:  9493: ' detail'
[1706410556] sampled token:   393: ' that'
[1706410556] sampled token:   278: ' the'
[1706410556] sampled token:  1824: ' program'
[1706410556] sampled token: 10753: ' wants'
[1706410556] sampled token:   304: ' to'
[1706410557] sampled token: 23120: ' communicate'
[1706410557] sampled token:   304: ' to'
[1706410557] sampled token:   278: ' the'
[1706410557] sampled token:  1404: ' user'
[1706410557] sampled token:   470: ' or'
[1706410557] sampled token: 13897: ' developer'
[1706410557] sampled token: 29889: '.'
[1706410558] sampled token:    13: '
'
[1706410558] sampled token:    13: '
'
[1706410558] sampled token: 10605: 'Here'
[1706410558] sampled token: 29915: '''
[1706410558] sampled token: 29879: 's'
[1706410558] sampled token:   263: ' a'
[1706410558] sampled token:  2867: ' break'
[1706410558] sampled token:  3204: 'down'
[1706410559] sampled token:   310: ' of'
[1706410559] sampled token:   278: ' the'
[1706410559] sampled token:  2643: ' message'
[1706410559] sampled token: 29901: ':'
[1706410559] sampled token:    13: '
'
[1706410559] sampled token:    13: '
'
[1706410559] sampled token: 29930: '*'
[1706410560] sampled token:   421: ' `'
[1706410560] sampled token:  2230: 'time'
[1706410560] sampled token:  6998: '`:'
[1706410560] sampled token:   450: ' The'
[1706410560] sampled token: 14334: ' timestamp'
[1706410560] sampled token:   310: ' of'
[1706410560] sampled token:   746: ' when'
[1706410560] sampled token:   278: ' the'
[1706410561] sampled token:  2643: ' message'
[1706410561] sampled token:   471: ' was'
[1706410561] sampled token:  5759: ' generated'
[1706410561] sampled token: 29892: ','
[1706410561] sampled token:   297: ' in'
[1706410561] sampled token:   278: ' the'
[1706410561] sampled token:  3402: ' format'
[1706410561] sampled token:   421: ' `'
[1706410562] sampled token: 14995: 'YY'
[1706410562] sampled token: 14995: 'YY'
[1706410562] sampled token: 29899: '-'
[1706410562] sampled token:  7428: 'MM'
[1706410562] sampled token: 29899: '-'
[1706410562] sampled token:  7858: 'DD'
[1706410562] sampled token:  4690: 'TH'
[1706410563] sampled token: 29950: 'H'
[1706410563] sampled token: 29901: ':'
[1706410563] sampled token:  7428: 'MM'
[1706410563] sampled token: 29901: ':'
[1706410563] sampled token:  1799: 'SS'
[1706410563] sampled token: 29889: '.'
[1706410563] sampled token: 22791: 'XXX'
[1706410563] sampled token: 29974: '+'
[1706410564] sampled token: 29900: '0'
[1706410564] sampled token: 29900: '0'
[1706410564] sampled token: 29900: '0'
[1706410564] sampled token: 29900: '0'
[1706410564] sampled token:  1412: '`.'
[1706410564] sampled token:   512: ' In'
[1706410564] sampled token:   445: ' this'
[1706410565] sampled token:  1206: ' case'
[1706410565] sampled token: 29892: ','
[1706410565] sampled token:   372: ' it'
[1706410565] sampled token: 29915: '''
[1706410565] sampled token: 29879: 's'
[1706410565] sampled token:  5490: ' January'
[1706410565] sampled token: 29871: ' '
[1706410565] sampled token: 29906: '2'
[1706410566] sampled token: 29947: '8'
[1706410566] sampled token: 29892: ','
[1706410566] sampled token: 29871: ' '
[1706410566] sampled token: 29906: '2'
[1706410566] sampled token: 29900: '0'
[1706410566] sampled token: 29906: '2'
[1706410566] sampled token: 29946: '4'
[1706410567] sampled token: 29892: ','
[1706410567] sampled token:   472: ' at'
[1706410567] sampled token: 29871: ' '
[1706410567] sampled token: 29896: '1'
[1706410567] sampled token: 29900: '0'
[1706410567] sampled token: 29901: ':'
[1706410567] sampled token: 29946: '4'
[1706410567] sampled token: 29929: '9'
[1706410568] sampled token: 29901: ':'
[1706410568] sampled token: 29941: '3'
[1706410568] sampled token: 29946: '4'
[1706410568] sampled token: 13862: ' AM'
[1706410568] sampled token: 20532: ' (+'
[1706410568] sampled token: 29900: '0'
[1706410568] sampled token: 29947: '8'
[1706410569] sampled token: 29901: ':'
[1706410569] sampled token: 29900: '0'
[1706410569] sampled token: 29900: '0'
[1706410569] sampled token:   467: ').'
[1706410569] sampled token:    13: '
'
[1706410569] sampled token: 29930: '*'
[1706410569] sampled token:   421: ' `'
[1706410570] sampled token:  5563: 'level'
[1706410570] sampled token:  6998: '`:'
[1706410570] sampled token:   450: ' The'
[1706410570] sampled token:  1480: ' log'
[1706410570] sampled token:  3233: ' level'
[1706410570] sampled token:   310: ' of'
[1706410570] sampled token:   278: ' the'
[1706410570] sampled token:  2643: ' message'
[1706410571] sampled token: 29892: ','
[1706410571] sampled token:   607: ' which'
[1706410571] sampled token: 14088: ' indicates'
[1706410571] sampled token:   920: ' how'
[1706410571] sampled token:  4100: ' important'
[1706410571] sampled token:   372: ' it'
[1706410571] sampled token:   338: ' is'
[1706410572] sampled token: 29889: '.'
[1706410572] sampled token:   512: ' In'
[1706410572] sampled token:   445: ' this'
[1706410572] sampled token:  1206: ' case'
[1706410572] sampled token: 29892: ','
[1706410572] sampled token:   372: ' it'
[1706410572] sampled token: 29915: '''
[1706410572] sampled token: 29879: 's'
[1706410573] sampled token:   731: ' set'
[1706410573] sampled token:   304: ' to'
[1706410573] sampled token: 21681: ' DEBUG'
[1706410573] sampled token: 29892: ','
[1706410573] sampled token:   607: ' which'
[1706410573] sampled token:  2794: ' means'
[1706410573] sampled token:   372: ' it'
[1706410574] sampled token: 29915: '''
[1706410574] sampled token: 29879: 's'
[1706410574] sampled token:   263: ' a'
[1706410574] sampled token:  9493: ' detail'
[1706410574] sampled token:   393: ' that'
[1706410574] sampled token:   278: ' the'
[1706410574] sampled token:  1824: ' program'
[1706410575] sampled token: 10753: ' wants'
[1706410575] sampled token:   304: ' to'
[1706410575] sampled token: 23120: ' communicate'
[1706410575] sampled token: 29889: '.'
[1706410575] sampled token:    13: '
'
[1706410575] sampled token: 29930: '*'
[1706410575] sampled token:   421: ' `'
[1706410575] sampled token:  4993: 'source'
[1706410576] sampled token:  6998: '`:'
[1706410576] sampled token:   450: ' The'
[1706410576] sampled token:  4423: ' location'
[1706410576] sampled token:   988: ' where'
[1706410576] sampled token:   278: ' the'
[1706410576] sampled token:  2643: ' message'
[1706410576] sampled token:   471: ' was'
[1706410577] sampled token:  5759: ' generated'
[1706410577] sampled token: 29889: '.'
[1706410577] sampled token:   512: ' In'
[1706410577] sampled token:   445: ' this'
[1706410577] sampled token:  1206: ' case'
[1706410577] sampled token: 29892: ','
[1706410577] sampled token:   372: ' it'
[1706410577] sampled token: 29915: '''
[1706410578] sampled token: 29879: 's'
[1706410578] sampled token:  7034: ' `/'
[1706410578] sampled token:  1484: 'go'
[1706410578] sampled token: 29914: '/'
[1706410578] sampled token:  4351: 'src'
[1706410578] sampled token: 29914: '/'
[1706410578] sampled token:  3292: 'github'
[1706410579] sampled token: 29889: '.'
[1706410579] sampled token:   510: 'com'
[1706410579] sampled token: 29914: '/'
[1706410579] sampled token: 21231: 'jm'
[1706410579] sampled token:  6388: 'organ'
[1706410579] sampled token:  1113: 'ca'
[1706410579] sampled token: 29914: '/'
[1706410580] sampled token:  3028: 'oll'
[1706410580] sampled token:  3304: 'ama'
[1706410580] sampled token: 29914: '/'
[1706410580] sampled token: 29887: 'g'
[1706410580] sampled token:  3746: 'pu'
[1706410580] sampled token: 29914: '/'
[1706410580] sampled token: 29887: 'g'
[1706410581] sampled token:  3746: 'pu'
[1706410581] sampled token: 29889: '.'
[1706410581] sampled token:  1484: 'go'
[1706410581] sampled token:  1673: '`,'
[1706410581] sampled token:   607: ' which'
[1706410581] sampled token: 14661: ' suggests'
[1706410581] sampled token:   393: ' that'
[1706410581] sampled token:   278: ' the'
[1706410582] sampled token:  2643: ' message'
[1706410582] sampled token:   338: ' is'
[1706410582] sampled token:  4475: ' related'
[1706410582] sampled token:   304: ' to'
[1706410582] sampled token:   278: ' the'
[1706410582] sampled token: 22796: ' GPU'
[1706410582] sampled token: 15326: ' detection'
[1706410583] sampled token:   322: ' and'
[1706410583] sampled token:  5285: ' configuration'
[1706410583] sampled token: 29889: '.'
[1706410583] sampled token:    13: '
'
[1706410583] sampled token: 29930: '*'
[1706410583] sampled token:   421: ' `'
[1706410583] sampled token:  7645: 'msg'
[1706410584] sampled token:  6998: '`:'
[1706410584] sampled token:   450: ' The'
[1706410584] sampled token:  3935: ' actual'
[1706410584] sampled token:  2643: ' message'
[1706410584] sampled token:  1641: ' being'
[1706410584] sampled token: 13817: ' logged'
[1706410584] sampled token: 29892: ','
[1706410585] sampled token:   607: ' which'
[1706410585] sampled token:   338: ' is'
[1706410585] sampled token:   263: ' a'
[1706410585] sampled token: 11473: ' brief'
[1706410585] sampled token:  6139: ' description'
[1706410585] sampled token:   310: ' of'
[1706410585] sampled token:   825: ' what'
[1706410586] sampled token:   278: ' the'
[1706410586] sampled token:  1824: ' program'
[1706410586] sampled token:   756: ' has'
[1706410586] sampled token: 17809: ' detected'
[1706410586] sampled token: 29889: '.'
[1706410586] sampled token:   512: ' In'
[1706410586] sampled token:   445: ' this'
[1706410587] sampled token:  1206: ' case'
[1706410587] sampled token: 29892: ','
[1706410587] sampled token:   372: ' it'
[1706410587] sampled token: 29915: '''
[1706410587] sampled token: 29879: 's'
[1706410587] sampled token:   376: ' "'
[1706410588] sampled token: 29883: 'c'
[1706410588] sampled token:  6191: 'uda'
[1706410588] sampled token: 17809: ' detected'
[1706410588] sampled token: 29871: ' '
[1706410588] sampled token: 29896: '1'
[1706410588] sampled token:  9224: ' devices'
[1706410588] sampled token:   411: ' with'
[1706410589] sampled token: 29871: ' '
[1706410589] sampled token: 29906: '2'
[1706410589] sampled token: 29929: '9'
[1706410589] sampled token: 29900: '0'
[1706410589] sampled token: 29906: '2'
[1706410589] sampled token: 29924: 'M'
[1706410589] sampled token:  3625: ' available'
[1706410590] sampled token:  3370: ' memory'
[1706410590] sampled token:  1642: '".'
[1706410590] sampled token:   910: ' This'
[1706410590] sampled token:  2794: ' means'
[1706410590] sampled token:   393: ' that'
[1706410590] sampled token:   278: ' the'
[1706410590] sampled token:   421: ' `'
[1706410591] sampled token: 29907: 'C'
[1706410591] sampled token: 29965: 'U'
[1706410591] sampled token:  7698: 'DA'
[1706410591] sampled token: 29952: '`'
[1706410591] sampled token: 15326: ' detection'
[1706410591] sampled token:  5780: ' tool'
[1706410591] sampled token:   756: ' has'
[1706410592] sampled token: 15659: ' identified'
[1706410592] sampled token:   697: ' one'
[1706410592] sampled token: 22796: ' GPU'
[1706410592] sampled token:  4742: ' device'
[1706410592] sampled token:   373: ' on'
[1706410592] sampled token:   278: ' the'
[1706410592] sampled token:  1788: ' system'
[1706410592] sampled token:   322: ' and'
[1706410593] sampled token:  8967: ' reported'
[1706410593] sampled token:   967: ' its'
[1706410593] sampled token:  3625: ' available'
[1706410593] sampled token:  3370: ' memory'
[1706410593] sampled token: 13284: ' capacity'
[1706410593] sampled token: 29889: '.'
[1706410593] sampled token:    13: '
'
[1706410594] sampled token:    13: '
'
[1706410594] sampled token:  3563: 'Over'
[1706410594] sampled token:   497: 'all'
[1706410594] sampled token: 29892: ','
[1706410594] sampled token:   445: ' this'
[1706410594] sampled token:  2643: ' message'
[1706410594] sampled token: 14088: ' indicates'
[1706410595] sampled token:   393: ' that'
[1706410595] sampled token:   727: ' there'
[1706410595] sampled token:   338: ' is'
[1706410595] sampled token:   472: ' at'
[1706410595] sampled token:  3203: ' least'
[1706410595] sampled token:   697: ' one'
[1706410595] sampled token: 22796: ' GPU'
[1706410596] sampled token:  4742: ' device'
[1706410596] sampled token:  5130: ' installed'
[1706410596] sampled token:   373: ' on'
[1706410596] sampled token:   278: ' the'
[1706410596] sampled token:  1788: ' system'
[1706410596] sampled token:   411: ' with'
[1706410596] sampled token:   263: ' a'
[1706410597] sampled token:  3001: ' total'
[1706410597] sampled token:  3625: ' available'
[1706410597] sampled token:  3370: ' memory'
[1706410597] sampled token:   310: ' of'
[1706410597] sampled token:  2820: ' around'
[1706410597] sampled token: 29871: ' '
[1706410597] sampled token: 29906: '2'
[1706410598] sampled token: 29889: '.'
[1706410598] sampled token: 29929: '9'
[1706410598] sampled token: 19340: ' gig'
[1706410598] sampled token: 10798: 'aby'
[1706410598] sampled token:  2167: 'tes'
[1706410598] sampled token:   313: ' ('
[1706410598] sampled token: 29906: '2'
[1706410599] sampled token: 29929: '9'
[1706410599] sampled token: 29900: '0'
[1706410599] sampled token: 29906: '2'
[1706410599] sampled token:  4508: ' meg'
[1706410599] sampled token: 10798: 'aby'
[1706410599] sampled token:  2167: 'tes'
[1706410599] sampled token:   467: ').'
[1706410600] sampled token:     2: ''
[1706410600]
[1706410600] print_timings: prompt eval time =    4678.88 ms /   100 tokens (   46.79 ms per token,    21.37 tokens per second)
[1706410600] print_timings:        eval time =   49664.17 ms /   368 runs   (  134.96 ms per token,     7.41 tokens per second)
[1706410600] print_timings:       total time =   54343.05 ms
[1706410600] slot 0 released (468 tokens in cache)
[GIN] 2024/01/28 - 10:56:40 | 200 | 54.344133351s |       127.0.0.1 | POST     "/api/chat"
time=2024-01-28T10:58:03.122+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:170 msg="loaded 0 images"
[1706410683] slot 0 released (468 tokens in cache)
[1706410683] slot 0 is processing [task id: 2]
[1706410683] slot 0 : in cache: 467 tokens | to process: 23 tokens
[1706410683] slot 0 : kv cache rm - [467, end)
[1706410685] sampled token:    13: '
'
[1706410685] sampled token: 18420: 'Good'
[1706410685] sampled token:  7250: ' morning'
[1706410685] sampled token:   304: ' to'
[1706410686] sampled token:   366: ' you'
[1706410686] sampled token:   408: ' as'
[1706410686] sampled token:  1532: ' well'
[1706410686] sampled token: 29991: '!'
[1706410686] sampled token:   739: ' It'
[1706410686] sampled token: 29915: '''
[1706410686] sampled token: 29879: 's'
[1706410686] sampled token:  2337: ' always'
[1706410687] sampled token:   263: ' a'
[1706410687] sampled token: 15377: ' pleasure'
[1706410687] sampled token:   304: ' to'
[1706410687] sampled token:  1371: ' help'
[1706410687] sampled token:   411: ' with'
[1706410687] sampled token:   738: ' any'
[1706410687] sampled token:  5155: ' questions'
[1706410688] sampled token:   470: ' or'
[1706410688] sampled token: 21838: ' concerns'
[1706410688] sampled token:   366: ' you'
[1706410688] sampled token:  1122: ' may'
[1706410688] sampled token:   505: ' have'
[1706410688] sampled token: 29889: '.'
[1706410688] sampled token:  1128: ' How'
[1706410689] sampled token:   508: ' can'
[1706410689] sampled token:   306: ' I'
[1706410689] sampled token:  6985: ' assist'
[1706410689] sampled token:   366: ' you'
[1706410689] sampled token:  9826: ' today'
[1706410689] sampled token: 29973: '?'
[1706410689] sampled token:  1938: ' Do'
[1706410690] sampled token:   366: ' you'
[1706410690] sampled token:   505: ' have'
[1706410690] sampled token:   738: ' any'
[1706410690] sampled token:  2702: ' specific'
[1706410690] sampled token: 23820: ' topics'
[1706410690] sampled token:   470: ' or'
[1706410690] sampled token: 10161: ' areas'
[1706410691] sampled token:   310: ' of'
[1706410691] sampled token:  4066: ' interest'
[1706410691] sampled token:   366: ' you'
[1706410691] sampled token: 29915: '''
[1706410691] sampled token: 29881: 'd'
[1706410691] sampled token:   763: ' like'
[1706410691] sampled token:   304: ' to'
[1706410692] sampled token:  5353: ' discuss'
[1706410692] sampled token: 29973: '?'
[1706410692] sampled token:     2: ''
[1706410692]
[1706410692] print_timings: prompt eval time =    2409.06 ms /    23 tokens (  104.74 ms per token,     9.55 tokens per second)
[1706410692] print_timings:        eval time =    6855.89 ms /    50 runs   (  137.12 ms per token,     7.29 tokens per second)
[1706410692] print_timings:       total time =    9264.95 ms
[1706410692] slot 0 released (540 tokens in cache)
[GIN] 2024/01/28 - 10:58:12 | 200 |  9.265924488s |       127.0.0.1 | POST     "/api/chat"
time=2024-01-28T10:59:04.393+08:00 level=INFO source=/go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:170 msg="loaded 0 images"
[1706410744] slot 0 released (540 tokens in cache)
[1706410744] slot 0 is processing [task id: 4]
[1706410744] slot 0 : in cache: 539 tokens | to process: 25 tokens
[1706410744] slot 0 : kv cache rm - [539, end)
[1706410747] sampled token:    13: '
'
[1706410747] sampled token:  9048: 'Oh'
[1706410747] sampled token:   694: ' no'
[1706410747] sampled token: 29991: '!'
[1706410747] sampled token:  8221: ' Sorry'
[1706410747] sampled token:   304: ' to'
[1706410747] sampled token:  8293: ' hear'
[1706410748] sampled token:   393: ' that'
[1706410748] sampled token:   366: ' you'
[1706410748] sampled token: 29915: '''
[1706410748] sampled token:   345: 've'
[1706410748] sampled token: 18169: ' encountered'
[1706410748] sampled token:   263: ' a'
[1706410748] sampled token:  6494: ' bug'
[1706410749] sampled token: 29889: '.'
[1706410749] sampled token:  1815: ' Can'
[1706410749] sampled token:   366: ' you'
[1706410749] sampled token:  2649: ' tell'
[1706410749] sampled token:   592: ' me'
[1706410749] sampled token:   901: ' more'
[1706410749] sampled token:  1048: ' about'
[1706410750] sampled token:   372: ' it'
[1706410750] sampled token: 29973: '?'
[1706410750] sampled token:  1724: ' What'
[1706410750] sampled token:  9559: ' happened'
[1706410750] sampled token:   746: ' when'
[1706410750] sampled token:   366: ' you'
[1706410750] sampled token:  1898: ' tried'
[1706410751] sampled token:   304: ' to'
[1706410751] sampled token:   671: ' use'
[1706410751] sampled token:   278: ' the'
[1706410751] sampled token:  4682: ' feature'
[1706410751] sampled token:   470: ' or'
[1706410751] sampled token:  6222: ' execute'
[1706410751] sampled token:   278: ' the'
[1706410752] sampled token:   775: ' code'
[1706410752] sampled token: 29973: '?'
[1706410752] sampled token:  3139: ' Any'
[1706410752] sampled token:  1059: ' error'
[1706410752] sampled token:  7191: ' messages'
[1706410752] sampled token:   470: ' or'
[1706410752] sampled token:  5096: ' stack'
[1706410753] sampled token: 26695: ' traces'
[1706410753] sampled token:   366: ' you'
[1706410753] sampled token:   508: ' can'
[1706410753] sampled token:  3867: ' provide'
[1706410753] sampled token:   723: ' would'
[1706410753] sampled token:   367: ' be'
[1706410753] sampled token:  8444: ' helpful'
[1706410754] sampled token:   297: ' in'
[1706410754] sampled token: 19912: ' helping'
[1706410754] sampled token:   592: ' me'
[1706410754] sampled token:  2274: ' understand'
[1706410754] sampled token:   278: ' the'
[1706410754] sampled token:  2228: ' issue'
[1706410754] sampled token:  2253: ' better'
[1706410755] sampled token: 29973: '?'
[1706410755] sampled token:     2: ''
[1706410755]
[1706410755] print_timings: prompt eval time =    2705.29 ms /    25 tokens (  108.21 ms per token,     9.24 tokens per second)
[1706410755] print_timings:        eval time =    8168.37 ms /    58 runs   (  140.83 ms per token,     7.10 tokens per second)
[1706410755] print_timings:       total time =   10873.66 ms
[1706410755] slot 0 released (622 tokens in cache)
[GIN] 2024/01/28 - 10:59:15 | 200 | 10.874557842s |       127.0.0.1 | POST     "/api/chat"

Below is the second wsl window:

(LLM_env) czk@DESKTOP-0JQI779:~$ ollama run llama2
pulling manifest
pulling 8934d96d3f08... 100% ▕█████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling manifest
pulling 8934d96d3f08... 100% ▕█████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕█████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕█████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕█████████████████████████████████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕█████████████████████████████████████████████████████████████████▏   91 B
pulling 42ba7f8a01dd... 100% ▕█████████████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> time=2024-01-28T10:49:34.788+08:00 level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:225 msg="cuda detected 1
...  devices with 2902M available memory"

The message you provided indicates that the `CUDA` detection has found 1 device with 2902 megabytes of available memory.
This information is being logged at the DEBUG level, which means it's an important detail that the program wants to
communicate to the user or developer.

Here's a breakdown of the message:

* `time`: The timestamp of when the message was generated, in the format `YYYY-MM-DDTHH:MM:SS.XXX+0000`. In this case, it's
January 28, 2024, at 10:49:34 AM (+08:00).
* `level`: The log level of the message, which indicates how important it is. In this case, it's set to DEBUG, which means
it's a detail that the program wants to communicate.
* `source`: The location where the message was generated. In this case, it's
`/go/src/github.com/jmorganca/ollama/gpu/gpu.go`, which suggests that the message is related to the GPU detection and
configuration.
* `msg`: The actual message being logged, which is a brief description of what the program has detected. In this case, it's
"cuda detected 1 devices with 2902M available memory". This means that the `CUDA` detection tool has identified one GPU
device on the system and reported its available memory capacity.

Overall, this message indicates that there is at least one GPU device installed on the system with a total available memory
of around 2.9 gigabytes (2902 megabytes).

>>> /set verbose
Set 'verbose' mode.
>>> Goodmoring!

Good morning to you as well! It's always a pleasure to help with any questions or concerns you may have. How can I assist
you today? Do you have any specific topics or areas of interest you'd like to discuss?

total duration:       9.265797588s
load duration:        224.3µs
prompt eval count:    23 token(s)
prompt eval duration: 2.409061s
prompt eval rate:     9.55 tokens/s
eval count:           50 token(s)
eval duration:        6.855886s
eval rate:            7.29 tokens/s
>>> ok, i encountered a bug

Oh no! Sorry to hear that you've encountered a bug. Can you tell me more about it? What happened when you tried to use the
feature or execute the code? Any error messages or stack traces you can provide would be helpful in helping me understand
the issue better?

total duration:       10.874482742s
load duration:        202.2µs
prompt eval count:    25 token(s)
prompt eval duration: 2.705293s
prompt eval rate:     9.24 tokens/s
eval count:           58 token(s)
eval duration:        8.168366s
eval rate:            7.10 tokens/s

It looks like `ollama run llama2` works fine for me. Could my problem be that my memory is too small?
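
For what it's worth, the buffer sizes in the log above give a rough answer to the memory question. Adding up only the GPU figures the server itself printed:

```
weights offloaded (19/33 layers):  2063.29 MiB
CUDA0 KV cache (n_ctx = 2048):      608.00 MiB
CUDA0 compute buffer:               156.00 MiB
-----------------------------------------------
total on GPU                      ≈ 2827 MiB   (vs. 2902 MiB reported free)
```

So llama2 fits only by offloading 19 of the 33 layers and keeping the rest on the CPU, which is expected behavior on a 4 GB card rather than a bug in itself; a model whose minimum GPU footprint exceeds the free 2902 MiB could plausibly fail instead of partially offloading.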

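Since the transcript above walks through the `time=`, `level=`, `source=`, and `msg=` fields one by one, here is a minimal, hypothetical Go sketch of how such a structured log line can be split into fields. The `parseLogLine` helper and its regex are illustrative only and are not part of ollama:

```go
package main

import (
	"fmt"
	"regexp"
)

// keyValRe matches key=value pairs, where the value is either a
// double-quoted string (possibly containing spaces) or a bare token.
var keyValRe = regexp.MustCompile(`(\w+)=(?:"([^"]*)"|(\S+))`)

// parseLogLine splits a structured log line like
//   time=... level=DEBUG source=... msg="cuda detected ..."
// into a field map. Illustrative helper, not part of ollama.
func parseLogLine(line string) map[string]string {
	fields := map[string]string{}
	for _, m := range keyValRe.FindAllStringSubmatch(line, -1) {
		if m[2] != "" { // quoted value: keep the inner text
			fields[m[1]] = m[2]
		} else { // bare value
			fields[m[1]] = m[3]
		}
	}
	return fields
}

func main() {
	line := `time=2024-01-28T10:49:34.788+08:00 level=DEBUG source=gpu.go:225 msg="cuda detected 1 devices with 2902M available memory"`
	f := parseLogLine(line)
	fmt.Printf("%s | %s | %s\n", f["level"], f["source"], f["msg"])
}
```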
Any error messages or stack traces you can provide would be helpful in helping me understand the issue better? total duration: 10.874482742s load duration: 202.2µs prompt eval count: 25 token(s) prompt eval duration: 2.705293s prompt eval rate: 9.24 tokens/s eval count: 58 token(s) eval duration: 8.168366s eval rate: 7.10 tokens/s ``` It looks like my `ollama run llama2` works fine. Is it because my memory is too small?
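
For anyone reading along with similar symptoms, the two log lines that matter here can be pulled out directly; a minimal check, assuming the default Linux install where ollama runs as a systemd service:

```
# How much VRAM the server detected at startup
journalctl -u ollama --no-pager | grep -i "available memory"

# How many layers were offloaded to the GPU vs. kept on the CPU
journalctl -u ollama --no-pager | grep -i "offloaded"
```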
Author
Owner

@ryukyi commented on GitHub (Jan 28, 2024):

> @ryukyi @akhercha @aseedb you hit an out-of-memory error on your CUDA card. We've been making steady improvements on our memory estimates, so I'd encourage you all to give 0.1.22 a try and let us know if you still see the crashes.

I reinstalled and everything works fine for mistral, thanks @dhiltgen

<!-- gh-comment-id:1913531095 --> @ryukyi commented on GitHub (Jan 28, 2024): > @ryukyi @akhercha @aseedb you hit an out-of-memory error on your CUDA card. We've been making steady improvements on our memory estimates, so I'd encourage you all to give 0.1.22 a try and let us know if you still see the crashes. I reinstalled and everything works fine for mistral thanks @dhiltgen
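
If you suspect an out-of-memory crash like the one described above, watching VRAM while the model loads makes it easy to confirm; a minimal check, assuming an NVIDIA card with the standard driver tools installed:

```
# Snapshot of total/used/free VRAM
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv

# Refresh once per second while the model loads
watch -n 1 nvidia-smi
```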
Author
Owner

@akhercha commented on GitHub (Jan 28, 2024):

Working for me too - thanks 🫶

<!-- gh-comment-id:1913531284 --> @akhercha commented on GitHub (Jan 28, 2024): Working for me too - thanks 🫶
Author
Owner

@dhiltgen commented on GitHub (Jan 28, 2024):

@CaiZekun your output looks good! Yes, it seems to be working properly. In particular, `offloaded 19/33 layers to GPU` in the log shows almost half of the model is loaded on the CPU, so slower performance is to be expected. Using a smaller model that entirely or mostly fits in your GPU's VRAM will yield much better performance.

It sounds like most people on this issue now have a working setup with the latest release. @joesalvati68 if you're still having problems with 0.1.22 please add a comment and I'll re-open the issue and we'll work through it with you.

<!-- gh-comment-id:1913705676 --> @dhiltgen commented on GitHub (Jan 28, 2024): @CaiZekun your output looks good! Yes, it seems to be working properly. In particular, `offloaded 19/33 layers to GPU` in the log shows almost half of the model is loaded on the CPU, so slower performance is to be expected. Using a smaller model that entirely or mostly fits on your GPU's VRAM will yield much better performance. It sounds like most people on this issue now have a working setup with the latest release. @joesalvati68 if you're still having problems with 0.1.22 please add a comment and I'll re-open the issue and we'll work through it with you.
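
To illustrate the advice above, a model that fits entirely in VRAM avoids the CPU fallback; the tags below are examples from the model library at the time and may change:

```
# The 3.8 GB llama2 pull above cannot fit in ~2.9 GB of VRAM, so layers spill
# over to the CPU; smaller models can be fully offloaded to the GPU
ollama run phi        # ~2.7B parameter model
ollama run tinyllama  # ~1.1B parameter model
```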
Author
Owner

@vsndev3 commented on GitHub (Feb 11, 2024):

For anyone getting the EOF error when using an AMD 8700G iGPU on Ubuntu, the following will solve it:

The error in the log was "rocBLAS warning: No paths matched `/opt/rocm/lib/rocblas/library/*gfx1103*co`. Make sure that ROCBLAS_TENSILE_LIBPATH is set correctly."

To fix it, override the GFX version environment variable, e.g. `HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/local/bin/ollama serve`. The same override can be made permanent by adding `Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"` as a new line in the [Service] section of /etc/systemd/system/ollama.service; this resolves the crash.

<!-- gh-comment-id:1937514317 --> @vsndev3 commented on GitHub (Feb 11, 2024): For anyone getting the EOF error when using AMD 8700G iGPU with Ubuntu, below will help to solve: The error got in the log was _"rocBLAS warning: No paths matched /opt/rocm/lib/rocblas/library/*gfx1103*co. Make sure that ROCBLAS_TENSILE_LIBPATH is set correctly."_ To fix we have to override the GFX environment variable like `"HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/local/bin/ollama serve" `Same can be added in _/etc/systemd/system/ollama.service_ as a new line in [Service] section with `Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"` will solve the crash.
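
A minimal sequence for applying the override described above, assuming the standard Linux install that registers an `ollama` systemd service:

```
# One-off test from a shell
HSA_OVERRIDE_GFX_VERSION=11.0.0 /usr/local/bin/ollama serve

# Persistent fix: add the variable to the [Service] section, then restart
sudo systemctl edit ollama    # add: Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
sudo systemctl restart ollama
```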
Reference: github-starred/ollama#1145