[GH-ISSUE #668] Client only displays Unexpected EOF when error happens during /generate #62336

Closed
opened 2026-05-03 08:16:37 -05:00 by GiteaMirror · 12 comments

Originally created by @ratnadeep007 on GitHub (Oct 1, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/668

Issue:

codellama 13b runs, while codellama 7b fails with the following error:

`Error: error reading llm response: unexpected EOF`

I can run codellama 13b with the same prompt.

I have 16 GB of RAM.

GiteaMirror added the bug label 2026-05-03 08:16:37 -05:00

@mchiang0610 commented on GitHub (Oct 1, 2023):

It looks like ollama crashed. Would it be possible to get the logs from you?

Log location:
~/.ollama/logs/server.log


@ratnadeep007 commented on GitHub (Oct 1, 2023):

Sure, but the log location you provided doesn't exist.


@ratnadeep007 commented on GitHub (Oct 1, 2023):

Found it in `journalctl`.

[logs.txt](https://github.com/jmorganca/ollama/files/12778364/logs.txt)
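
For Linux installs where Ollama runs as a systemd service (as set up by the default install script), the server log can be pulled from the journal; the unit name `ollama` is assumed here:

```sh
# Follow the Ollama server log via systemd's journal
journalctl -u ollama -f

# Or dump the most recent entries to a file to attach to an issue
journalctl -u ollama --no-pager -n 500 > logs.txt
```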


@BruceMacD commented on GitHub (Oct 2, 2023):

This was due to Ollama trying to load more layers into VRAM than was possible:

ollama[8017]: CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml/ggml-cuda.cu:4856: out of memory
ollama[1821]: [GIN] 2023/10/02 - 03:30:54 | 200 |  3.712295143s |       127.0.0.1 | POST     "/api/generate"
ollama[1821]: 2023/10/02 03:30:54 llama.go:320: llama runner exited with error: exit status 1

There is another open issue for refining how many layers we load into memory (#618); there is also a workaround in the last comment there.

We should be returning a better error message in the client; I'll tweak this issue to reflect that.
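
For reference, a hedged sketch of the workaround: the `/api/generate` request accepts an `options` object, and lowering `num_gpu` (the number of layers offloaded to the GPU) should keep the runner within the available VRAM. The value below is purely illustrative and would need tuning for a given card:

```sh
# Request fewer GPU-offloaded layers for a single generation (num_gpu value is a guess)
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "codellama:7b",
  "prompt": "Write a hello world program in Go",
  "options": { "num_gpu": 20 }
}'
```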


@ratnadeep007 commented on GitHub (Oct 2, 2023):

Thanks @BruceMacD for the workaround, but I have one question: if there isn't enough VRAM for the 7b model, how is it working for the 13b model?


@BruceMacD commented on GitHub (Oct 2, 2023):

Not 100% certain of the root cause, but there are a few possibilities. The calculation for the number of layers to load into VRAM changes between 7B and 13B; it's probably a bit more aggressive for the 7B. It could also be that something else on your system was consuming more VRAM while the 7B model was running.
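
If it helps, a quick way to check whether another process is holding VRAM (assuming the NVIDIA driver utilities are installed):

```sh
# Show overall VRAM use and the per-process breakdown
nvidia-smi

# Or watch just the compute processes and their memory use, refreshed each second
watch -n 1 nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```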


@mxyng commented on GitHub (Oct 2, 2023):

@ratnadeep007 what kind of GPU and how much VRAM do you have? As @BruceMacD mentioned, the calculations are slightly different between models. It's possible that for 13B models it's not trying to offload all the layers to the GPU; instead, some layers are offloaded to the GPU while the remainder is processed by the CPU.


@ratnadeep007 commented on GitHub (Oct 2, 2023):

@mxyng Running on RTX 2060 with 6 GB VRAM


@jerzydziewierz commented on GitHub (Oct 9, 2023):

May I suggest adding a switch to disable GPU offloading, or at the very least, a way to specify how much VRAM to use?

I have an RTX 2060 with 12 GB of VRAM and I get `llama.cpp/ggml/ggml-cuda.cu:4856: out of memory`.


@BruceMacD commented on GitHub (Oct 27, 2023):

@jerzydziewierz thanks for the suggestion. You can disable GPU offloading right now by setting `PARAM num_gpu 0` in the Modelfile; we will be adding this to the CLI soon as well. This also allows you to configure less GPU offloading in general.
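
A minimal sketch of that workaround, creating a CPU-only variant of the model; the name `codellama-cpu` is arbitrary, and note that the Modelfile documentation spells the keyword `PARAMETER`:

```sh
# Build a variant of codellama:7b with GPU offloading disabled
cat > Modelfile <<'EOF'
FROM codellama:7b
PARAMETER num_gpu 0
EOF

ollama create codellama-cpu -f Modelfile
ollama run codellama-cpu
```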


@BruceMacD commented on GitHub (Oct 27, 2023):

The root issue here was resolved in #825; let me know if anyone else sees this.


@jferments commented on GitHub (Mar 5, 2024):

> The root issue here was resolved in #825, let me know if anyone else sees this.

I am experiencing this right now, when I try to run these in a terminal:

`ollama run mistral`
`ollama run orca-mini`

or when I call them from my llama-index code.

They fail with the only message being:

`Error: Post "http://127.0.0.1:11434/api/generate": EOF`

There is nothing in the terminal output regarding CUDA errors.

Here is output from journalctl for ollama:

Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: ggml ctx size =    0.11 MiB
Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: mem required  = 3917.98 MiB
Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: offloading 32 repeating layers to GPU
Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: offloading non-repeating layers to GPU
Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: offloaded 33/33 layers to GPU
Mar 05 11:00:25 jesse-MS-7C02 ollama[74384]: llm_load_tensors: VRAM used: 0.00 MiB
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: ...................................................................................................
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: llama_new_context_with_model: n_ctx      = 2048
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: llama_new_context_with_model: freq_base  = 1000000.0
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: llama_new_context_with_model: freq_scale = 1
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: CUDA error 999 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:495: unknown error
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: current device: -1809317920
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: Lazy loading /tmp/ollama801692426/cuda/libext_server.so library
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: GGML_ASSERT: /go/src/github.com/jmorganca/ollama/llm/llama.cpp/ggml-cuda.cu:495: !"CUDA error"
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: Could not attach to process.  If your uid matches the uid of the target
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: ptrace: Inappropriate ioctl for device.
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: No stack.
Mar 05 11:00:26 jesse-MS-7C02 ollama[74962]: The program is not being run.
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: SIGABRT: abort
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: PC=0x7fc01a899a1b m=14 sigcode=18446744073709551610
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: signal arrived during cgo execution
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: goroutine 41 [syscall]:
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: runtime.cgocall(0x9c3170, 0xc00033a608)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00033a5e0 sp=0xc00033a5a8 pc=0x4291cb
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm._Cfunc_dynamic_shim_llama_server_init({0x7fbf94001d40, 0x7fbf70dfa410, 0x7fbf70d>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         _cgo_gotypes.go:287 +0x45 fp=0xc00033a608 sp=0xc00033a5e0 pc=0x7cf965
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.(*shimExtServer).llama_server_init.func1(0x45973b?, 0x80?, 0x80?)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/shim_ext_server.go:40 +0xec fp=0xc00033a6f8 sp=0xc00033a608 pc=0>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.(*shimExtServer).llama_server_init(0xc00010a2d0?, 0x0?, 0x43a2e8?)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/shim_ext_server.go:40 +0x13 fp=0xc00033a720 sp=0xc00033a6f8 pc=0>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.newExtServer({0x17845038, 0xc0004327e0}, {0xc000190af0, _}, {_, _, _}, {0x0, 0x0>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/ext_server_common.go:139 +0x70e fp=0xc00033a8e0 sp=0xc00033a720 >
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.newDynamicShimExtServer({0xc0000be000, 0x2a}, {0xc000190af0, _}, {_, _, _}, {0x0>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/shim_ext_server.go:93 +0x547 fp=0xc00033aaf8 sp=0xc00033a8e0 pc=>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.newLlmServer({0xc3fc44, 0x4}, {0xc000190af0, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/llm.go:125 +0x149 fp=0xc00033ac78 sp=0xc00033aaf8 pc=0x7ceac9
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/llm.New({0xc00048e240?, 0x0?}, {0xc000190af0, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/llm/llm.go:115 +0x628 fp=0xc00033aef0 sp=0xc00033ac78 pc=0x7ce608
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/server.load(0xc000002f00?, 0xc000002f00, {{0x0, 0x800, 0x200, 0x1, 0xfffffffffffffff>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/server/routes.go:84 +0x425 fp=0xc00033b0a0 sp=0xc00033aef0 pc=0x99ef>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/server.GenerateHandler(0xc000466600)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/server/routes.go:191 +0x8c8 fp=0xc00033b748 sp=0xc00033b0a0 pc=0x99f>
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/gin-gonic/gin.(*Context).Next(...)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]: github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1(0xc000466600)
Mar 05 11:00:26 jesse-MS-7C02 ollama[74384]:         /go/src/github.com/jmorganca/ollama/server/routes.go:877 +0x68 fp=0xc00033b780 sp=0xc00033b748 pc=0x9a91>

You can see that the CUDA error is occurring in llama.cpp.
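
For what it's worth, CUDA error 999 (`unknown error`) usually points at a wedged driver/runtime state rather than running out of memory, and is commonly reported after suspend/resume. A hedged check, assuming an NVIDIA driver with the `nvidia_uvm` module loaded:

```sh
# Confirm the driver still responds
nvidia-smi

# If the GPU is wedged (e.g. after suspend), reloading the UVM module sometimes clears error 999
sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
```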

Reference: github-starred/ollama#62336