[GH-ISSUE #7802] minimum viable GGUF crashes server on run #4989

Open
opened 2026-04-12 16:03:06 -05:00 by GiteaMirror · 0 comments

Originally created by @bmizerany on GitHub (Nov 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7802

### What is the issue?

I ran `ollama run bmizerany/smol`, and saw the server crash violently.

I expected ollama to tell me, from the terminal session running `ollama run`, that it could not run the model for `<reasons>`, and for the server to remain running and unaffected.

```
# Client
; ollama run bmizerany/smol
```

```
# Server

[GIN] 2024/11/22 - 11:32:09 | 200 |     305.583µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/11/22 - 11:32:09 | 200 |    2.020416ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/11/22 - 11:32:09 | 200 |     550.916µs |       127.0.0.1 | POST     "/api/show"
time=2024-11-22T11:32:09.588-08:00 level=WARN source=memory.go:115 msg="model missing blk.0 layer size"
panic: runtime error: integer divide by zero

goroutine 27 [running]:
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _},
{{0x0, 0x800, 0x200, ...}, ...})
	github.com/ollama/ollama/llm/memory.go:122 +0x13f0
github.com/ollama/ollama/llm.PredictServerFit({0x14000495cb8?, 0x104ae52b4?, 0x1400001a090?}, 0x1400059a920, {0x199?, 0x105681bc0?, _}, {_, _, _}, ...)
	github.com/ollama/ollama/llm/memory.go:20 +0xa8
github.com/ollama/ollama/server.pickBestFitGPUs(0x140001d0900, 0x1400059a920, {0x140004aa780?, 0xfffffffffffffffc?, 0x105286653?})
	github.com/ollama/ollama/server/sched.go:627 +0x2a0
github.com/ollama/ollama/server.(*Scheduler).processPending(0x140000c39e0, {0x10575b8d0, 0x140000c5ea0})
	github.com/ollama/ollama/server/sched.go:170 +0xac0
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
	github.com/ollama/ollama/server/sched.go:96 +0x28
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
	github.com/ollama/ollama/server/sched.go:95 +0xc4
2024/11/22 11:32:10 routes.go:1060: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/Users/bmizerany/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR:]"
time=2024-11-22T11:32:10.640-08:00 level=INFO source=images.go:725 msg="total blobs: 6"
time=2024-11-22T11:32:10.641-08:00 level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-11-22T11:32:10.642-08:00 level=INFO source=routes.go:1106 msg="Listening on 127.0.0.1:11434 (version 0.1.45)"
time=2024-11-22T11:32:10.652-08:00 level=WARN source=assets.go:100 msg="unable to cleanup stale tmpdir" path=/var/folders/db/svmm3t1x3yn4d1skpbq3ddv00000gn/T/ollama2998818457 error="remove /var/folders/db/svmm3t1x3yn4d1skpbq3ddv00000gn/T/ollama2998818457: directory not empty"
time=2024-11-22T11:32:10.652-08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/var/folders/db/svmm3t1x3yn4d1skpbq3ddv00000gn/T/ollama827611131/runners
time=2024-11-22T11:32:10.679-08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [metal]"
time=2024-11-22T11:32:10.740-08:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=metal compute="" driver=0.0 name="" total="96.0 GiB" available="96.0 GiB"
```
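
The `model missing blk.0 layer size` warning immediately before the panic suggests the layer-size bookkeeping in `llm/memory.go` ends up dividing by a count that is zero for a model with no tensors. A minimal sketch of the kind of guard that would turn the panic into a client-facing error — the helper name and signature here are hypothetical, and the real structure of `EstimateGPULayers` may differ:

```go
package llm

import "fmt"

// estimatePerLayerSize is a hypothetical helper, not the actual ollama code:
// it refuses to estimate a per-layer size when the model reports zero blocks,
// returning an error the scheduler could relay to the client instead of
// dividing by zero and panicking the server.
func estimatePerLayerSize(totalSize, blockCount uint64) (uint64, error) {
	if blockCount == 0 {
		// A zero-tensor GGUF (like the 24-byte file below) has no
		// blk.* layers, so there is nothing to divide by.
		return 0, fmt.Errorf("model reports no layers; refusing to load")
	}
	return totalSize / blockCount, nil
}
```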

The GGUF (xxd):

```
00000000: 4747 5546 0300 0000 0000 0000 0000 0000  GGUF............
00000010: 0000 0000 0000 0000                      ........
```
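
Those 24 bytes decode as the GGUF magic, version 3 (uint32), a tensor count of 0 (uint64), and a metadata key/value count of 0 (uint64), all little-endian. For anyone reproducing this, a sketch that writes the same file (the `smol.gguf` filename is illustrative):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"log"
	"os"
)

func main() {
	// Build the 24-byte minimal GGUF header shown in the hexdump.
	// binary.Write to a bytes.Buffer cannot fail for fixed-size values,
	// so the errors are ignored here for brevity.
	var buf bytes.Buffer
	buf.WriteString("GGUF")                            // magic
	binary.Write(&buf, binary.LittleEndian, uint32(3)) // version
	binary.Write(&buf, binary.LittleEndian, uint64(0)) // tensor count
	binary.Write(&buf, binary.LittleEndian, uint64(0)) // metadata KV count

	if err := os.WriteFile("smol.gguf", buf.Bytes(), 0o644); err != nil {
		log.Fatal(err)
	}
}
```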

### OS

Darwin MacBook-Pro-3.attlocal.net 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:37 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6031 arm64

### GPU

local

### CPU

see above

### Ollama version

ollama version is 0.4.3

GiteaMirror added the bug label 2026-04-12 16:03:06 -05:00