[GH-ISSUE #3636] New Gemma:7B crashes ollama server when using open-webui #28001

Closed
opened 2026-04-22 05:43:04 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @dewrama on GitHub (Apr 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3636

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Ollama works well with all other models. After the recent gemma:7b update, running a query through open-webui crashes the Ollama server (other models are fine), and I have to restart Ollama after each crash. I verified that the previous version of gemma:7b still works; only the recently updated version crashes. If inference hits an error, the server should catch it, return an error code, and keep running. A single failing inference should not bring down the entire server.

Ollama log
time=2024-04-12T21:31:40.000-07:00 level=WARN source=server.go:113 msg="server crash 12 - exit code 3221226505 - respawning"

{"function":"initialize","level":"INFO","line":444,"msg":"initializing slots","n_slots":1,"tid":"37388","timestamp":1712982626}
{"function":"initialize","level":"INFO","line":456,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"37388","timestamp":1712982626}
time=2024-04-12T21:30:26.332-07:00 level=INFO source=dyn_ext_server.go:159 msg="Starting llama main loop"
{"function":"update_slots","level":"INFO","line":1574,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"42304","timestamp":1712982626}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"42304","timestamp":1712982626}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1812,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":44,"slot_id":0,"task_id":0,"tid":"42304","timestamp":1712982626}
{"function":"update_slots","level":"INFO","line":1836,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"42304","timestamp":1712982626}
CUDA error: out of memory
current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:532
cuMemSetAccess(pool_addr + pool_size, reserve_size, &access, 1)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"
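For reference, the exit code 3221226505 in the respawn log above is a Windows NTSTATUS value. Decoding it (plain Python, nothing project-specific assumed) shows it is 0xC0000409, STATUS_STACK_BUFFER_OVERRUN, which the Windows C runtime also reports when abort() fires, consistent with the failed GGML_ASSERT:

```python
# Decode the Windows process exit code from the ollama respawn log.
# Exit codes at or above 0xC0000000 are NTSTATUS failure codes.
exit_code = 3221226505
print(hex(exit_code))  # 0xc0000409
# 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN, which the Windows CRT also
# raises for abort() -- so the crash matches the GGML_ASSERT above, not a
# plain non-zero exit.
```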

What did you expect to see?

No response

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Windows

Architecture

amd64

Platform

No response

Ollama version

0.1.31

GPU

Nvidia

GPU info

GeForce RTX 4060, 8 GB VRAM

CPU

No response

Other software

No response

GiteaMirror added the nvidia and bug labels 2026-04-22 05:43:04 -05:00
Author
Owner

@ycyy commented on GitHub (Apr 18, 2024):

#3232 Update to version 0.1.32 for testing.

Author
Owner

@dewrama commented on GitHub (Apr 18, 2024):

Thanks. I updated yesterday and verified it is version 0.1.32. Still crashing with same issue only for gemma:7b latest.

Author
Owner

@ycyy commented on GitHub (Apr 19, 2024):

> Thanks. I updated yesterday and verified it is version 0.1.32. Still crashing with same issue only for gemma:7b latest.

Perhaps you could upload the complete logs and have the developers investigate further.

Author
Owner

@dhiltgen commented on GitHub (Jun 1, 2024):

I believe this should be resolved in the latest release.

If you're still seeing the crash, please share the following and I'll re-open.

Quit the Ollama tray application, then run the following in one PowerShell terminal:

```
$env:OLLAMA_DEBUG="1"
ollama serve 2>&1 | % ToString | Tee-Object server.log
```

Then in another terminal run `ollama run gemma:7b` and share the server.log so we can see why it's getting the memory prediction wrong.
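
If the crash turns out to be a genuine VRAM shortfall on the 8 GB card rather than a prediction bug (an assumption, not confirmed in this thread), one workaround sketch using documented Modelfile parameters is to shrink the context window and offload fewer layers to the GPU:

```
FROM gemma:7b
PARAMETER num_ctx 1024
PARAMETER num_gpu 20
```

This could be built with `ollama create gemma7b-small -f Modelfile`; the layer count of 20 is a placeholder to be tuned against the memory figures in the debug log.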


Reference: github-starred/ollama#28001