[GH-ISSUE #9911] panic: interface conversion: interface {} is *ggml.array, not uint32 #6490

Closed
opened 2026-04-12 18:03:58 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @dpk-it on GitHub (Mar 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9911

What is the issue?

Crashes while loading model with error "panic: interface conversion: interface {} is *ggml.array, not uint32"
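For context, this class of panic comes from Go's unchecked type assertion: asserting an `interface{}` value to `uint32` panics at runtime when the stored value is actually a pointer type such as `*ggml.array`. A minimal sketch of the pattern and the crash-proof comma-ok alternative (hypothetical names, not ollama's actual code):

```go
package main

import "fmt"

// array stands in for ggml's array-valued metadata entry (hypothetical).
type array struct{ values []int32 }

// uncheckedUint mirrors the failing pattern: v.(uint32) panics with
// "interface conversion" if v holds anything other than a uint32.
func uncheckedUint(v interface{}) uint32 {
	return v.(uint32) // panics if v is e.g. *array
}

// checkedUint uses the comma-ok form and falls back to a default,
// which turns the panic into a recoverable lookup miss.
func checkedUint(v interface{}, def uint32) uint32 {
	if u, ok := v.(uint32); ok {
		return u
	}
	return def
}

func main() {
	kvEntry := interface{}(&array{values: []int32{1, 2, 3}})
	fmt.Println(checkedUint(kvEntry, 0))    // prints 0: falls back safely
	fmt.Println(checkedUint(uint32(42), 0)) // prints 42: normal scalar case
}
```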

ENV

  • WSL2
  • docker
  • model: hf.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF:Q5_K_L

Relevant log output

2025/03/20 15:28:55 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-20T15:28:55.339Z level=INFO source=images.go:432 msg="total blobs: 178"
time=2025-03-20T15:28:55.342Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-20T15:28:55.343Z level=INFO source=routes.go:1297 msg="Listening on [::]:11434 (version 0.6.2)"
time=2025-03-20T15:28:55.343Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-20T15:28:56.013Z level=INFO source=types.go:130 msg="inference compute" id=GPU-e8a01a94-7d0f-f68d-f1b9-6d652b29b486 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.8 GiB" available="30.1 GiB"
time=2025-03-20T15:28:56.013Z level=INFO source=types.go:130 msg="inference compute" id=GPU-f5e9aa1d-8aae-882d-3c1a-8439274b917e library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4070 Ti SUPER" total="16.0 GiB" available="14.7 GiB"
time=2025-03-20T15:28:56.013Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a3276781-03b0-19e3-9aff-f476adf829ef library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060 Ti" total="16.0 GiB" available="14.9 GiB"
[GIN] 2025/03/20 - 15:34:45 | 200 |     100.781µs |      172.18.0.3 | GET      "/api/version"
[GIN] 2025/03/20 - 15:34:47 | 200 |      50.633µs |      172.18.0.3 | GET      "/api/version"
[GIN] 2025/03/20 - 15:35:03 | 200 |   64.828297ms |      172.18.0.3 | GET      "/api/tags"
[GIN] 2025/03/20 - 15:35:06 | 200 |    6.305021ms |      172.18.0.3 | GET      "/api/tags"
[GIN] 2025/03/20 - 15:35:30 | 200 |  3.041023547s |      172.18.0.3 | DELETE   "/api/delete"
[GIN] 2025/03/20 - 15:35:30 | 200 |     6.30305ms |      172.18.0.3 | GET      "/api/tags"
[GIN] 2025/03/20 - 15:37:57 | 200 |    5.024087ms |      172.18.0.3 | GET      "/api/tags"
time=2025-03-20T15:38:11.231Z level=INFO source=download.go:176 msg="downloading 092076b88a67 in 37 1 GB part(s)"
time=2025-03-20T15:59:30.820Z level=INFO source=download.go:176 msg="downloading b78301c0df4d in 1 38 B part(s)"
time=2025-03-20T15:59:31.920Z level=INFO source=download.go:176 msg="downloading ad4b6174552e in 1 191 B part(s)"
[GIN] 2025/03/20 - 15:59:57 | 200 |         20m1s |      172.18.0.3 | POST     "/api/pull"
[GIN] 2025/03/20 - 15:59:57 | 200 |    5.849528ms |      172.18.0.3 | GET      "/api/tags"
[GIN] 2025/03/20 - 16:00:04 | 200 |       49.06µs |      172.18.0.3 | GET      "/api/version"
time=2025-03-20T16:00:14.761Z level=WARN source=ggml.go:149 msg="key not found" key=deci.vision.block_count default=0
panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 29 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc000e80540, {0x5598eeaabd4d, 0x14}, {0xc00070fcf0, 0x1, 0x0})
        github.com/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCount(0xc000e80540)
        github.com/ollama/ollama/fs/ggml/ggml.go:64 +0x5e
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCountK(0xc000e80540)
        github.com/ollama/ollama/fs/ggml/ggml.go:72 +0x18
github.com/ollama/ollama/fs/ggml.GGML.SupportsFlashAttention({{0x5598eef1f0e8?, 0xc000142fa0?}, {0x5598eef1f098?, 0xc0001c3808?}})
        github.com/ollama/ollama/fs/ggml/ggml.go:648 +0x159
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...})
        github.com/ollama/ollama/llm/memory.go:133 +0x568
github.com/ollama/ollama/llm.PredictServerFit({0xc0006c3ba8?, 0x5598edc6bde5?, 0xc0006c38c0?}, 0xc000560b20, {0xc0006c3918?, _, _}, {0x0, 0x0, 0x0}, ...)
        github.com/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc000150000, 0xc000560b20, {0xc0004fd508?, 0x3?, 0x4?}, 0xc000f27cf8)
        github.com/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc000690060, {0x5598eef23020, 0xc0001b0af0})
        github.com/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
        github.com/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
        github.com/ollama/ollama/server/sched.go:107 +0xb1
2025/03/20 18:20:03 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-20T18:20:03.341Z level=INFO source=images.go:432 msg="total blobs: 176"
time=2025-03-20T18:20:03.344Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-20T18:20:03.346Z level=INFO source=routes.go:1297 msg="Listening on [::]:11434 (version 0.6.2)"
time=2025-03-20T18:20:03.346Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-20T18:20:04.001Z level=INFO source=types.go:130 msg="inference compute" id=GPU-e8a01a94-7d0f-f68d-f1b9-6d652b29b486 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.8 GiB" available="30.1 GiB"
time=2025-03-20T18:20:04.001Z level=INFO source=types.go:130 msg="inference compute" id=GPU-f5e9aa1d-8aae-882d-3c1a-8439274b917e library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4070 Ti SUPER" total="16.0 GiB" available="14.7 GiB"
time=2025-03-20T18:20:04.001Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a3276781-03b0-19e3-9aff-f476adf829ef library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060 Ti" total="16.0 GiB" available="14.9 GiB"
time=2025-03-20T18:20:26.207Z level=WARN source=ggml.go:149 msg="key not found" key=deci.vision.block_count default=0
panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 53 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc0004e8600, {0x555833920d4d, 0x14}, {0xc000657890, 0x1, 0x555833cb4760})
        github.com/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.GGML.GraphSize({{0x555833d940e8?, 0xc000122be0?}, {0x555833d94098?, 0xc0001c3808?}}, 0x2000, 0x200, {0x0, 0x0})
        github.com/ollama/ollama/fs/ggml/ggml.go:418 +0x137
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...})
        github.com/ollama/ollama/llm/memory.go:140 +0x659
github.com/ollama/ollama/llm.PredictServerFit({0xc00061dba8?, 0x555832ae0de5?, 0xc00061d8c0?}, 0xc0002b6be0, {0xc00061d918?, _, _}, {0x0, 0x0, 0x0}, ...)
        github.com/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc000134000, 0xc0002b6be0, {0xc0004cb508?, 0x3?, 0x4?}, 0xc000047cf8)
        github.com/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc000422060, {0x555833d98020, 0xc00013e140})
        github.com/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
        github.com/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
        github.com/ollama/ollama/server/sched.go:107 +0xb1

OS

Docker

GPU

Nvidia

CPU

AMD

Ollama version

0.6.2


GiteaMirror added the bug label 2026-04-12 18:03:58 -05:00
@nickheyer commented on GitHub (Mar 21, 2025):

yep, same: interface conversion: interface {} is *ggml.array, not uint32

@dpk-it commented on GitHub (Mar 21, 2025):

@pdevine I think it's better to leave the bug open until the issue is fixed

@pdevine commented on GitHub (Mar 21, 2025):

Hey @dpk-it, thanks for posting the issue. There were a bunch of Nemotron-related issues I was trying to consolidate into one bug. Having multiple issues for the same model makes the issue tracker super difficult to manage.

That said, looking through the trace it looks like it failed looking for key=deci.vision.block_count which is not something we support. I'm not sure how bartowski is converting these images.
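The traces above all bottom out in a generic `keyValue` helper doing a bare type assertion on the metadata value. A tolerant variant would log and return the caller's default when the stored value has an unexpected type, instead of panicking. A sketch under assumed types, not ollama's actual implementation:

```go
package main

import (
	"fmt"
	"log"
)

// KV models the GGUF key/value metadata map (assumed shape).
type KV map[string]interface{}

// keyValue returns kv[key] as T when the stored value has exactly that
// type; otherwise it logs and returns the supplied default rather than
// panicking the way a bare v.(T) assertion would.
func keyValue[T any](kv KV, key string, def T) T {
	v, ok := kv[key]
	if !ok {
		log.Printf("key not found: %s, using default", key)
		return def
	}
	t, ok := v.(T)
	if !ok {
		log.Printf("key %s has type %T, not %T; using default", key, v, def)
		return def
	}
	return t
}

func main() {
	kv := KV{
		"attention.head_count": []int32{8, 8, 4}, // array where a scalar is expected
		"block_count":          uint32(80),
	}
	fmt.Println(keyValue(kv, "block_count", uint32(0)))          // 80
	fmt.Println(keyValue(kv, "attention.head_count", uint32(0))) // 0: type mismatch, default returned
}
```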

@JeroenAdam commented on GitHub (Mar 23, 2025):

@pdevine It is not related to bartowski's conversion process.
Also these quants: https://ollama.com/MHKetbi/nvidia_Llama-3.3-Nemotron-Super-49B-v1
As it is an option for users to download, isn't it supposed to work?

@ZaneITRI commented on GitHub (May 28, 2025):

Yes, I also got an ollama crash with the same error message in the log.
Running MHKetbi/nvidia_Llama-3.3-Nemotron-Super-49B-v1 with ollama 0.7.1, ollama crashed with the following error message in the ollama log:
time=2025-05-28T16:13:37.689+08:00 level=DEBUG source=ggml.go:155 msg="key not found" key=deci.vision.block_count default=0
panic: interface conversion: interface {} is *ggml.array[int32], not uint32

goroutine 101 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc0006932f0, {0x55b5996d926c, 0x14}, {0xc00073e320, 0x1, 0xc0005a55d0})
github.com/ollama/ollama/fs/ggml/ggml.go:152 +0x2e5
github.com/ollama/ollama/fs/ggml.KV.Uint(...)

@cruzanstx commented on GitHub (Jun 14, 2025):

Also having the same crash.

initializing C:\WINDOWS\system32\nvcuda.dll
dlsym: cuInit - 00007FF91E1D5F80
dlsym: cuDriverGetVersion - 00007FF91E1D6020
dlsym: cuDeviceGetCount - 00007FF91E1D6816
dlsym: cuDeviceGet - 00007FF91E1D6810
dlsym: cuDeviceGetAttribute - 00007FF91E1D6170
dlsym: cuDeviceGetUuid - 00007FF91E1D6822
dlsym: cuDeviceGetName - 00007FF91E1D681C
dlsym: cuCtxCreate_v3 - 00007FF91E1D6894
dlsym: cuMemGetInfo_v2 - 00007FF91E1D6996
dlsym: cuCtxDestroy - 00007FF91E1D68A6
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-06-14T10:02:01.132-04:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=C:\WINDOWS\system32\nvcuda.dll
[GPU-1e0127fb-e471-3db2-af2d-a73a494a252f] CUDA totalMem 97886mb
[GPU-1e0127fb-e471-3db2-af2d-a73a494a252f] CUDA freeMem 96050mb
[GPU-1e0127fb-e471-3db2-af2d-a73a494a252f] Compute Capability 12.0
time=2025-06-14T10:02:01.224-04:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
releasing cuda driver library
releasing nvml library
time=2025-06-14T10:02:01.225-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-1e0127fb-e471-3db2-af2d-a73a494a252f library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA RTX PRO 6000 Blackwell Workstation Edition" total="95.6 GiB" available="93.8 GiB"
[GIN] 2025/06/14 - 10:02:23 | 200 |       580.2µs |       10.0.0.19 | HEAD     "/"
time=2025-06-14T10:02:24.170-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T10:02:24.187-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
[GIN] 2025/06/14 - 10:02:24 | 200 |    199.9333ms |       10.0.0.19 | POST     "/api/show"
time=2025-06-14T10:02:24.213-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T10:02:24.214-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="63.9 GiB" before.free="43.1 GiB" before.free_swap="92.7 GiB" now.total="63.9 GiB" now.free="43.5 GiB" now.free_swap="93.2 GiB"
time=2025-06-14T10:02:24.232-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-1e0127fb-e471-3db2-af2d-a73a494a252f name="NVIDIA RTX PRO 6000 Blackwell Workstation Edition" overhead="0 B" before.total="95.6 GiB" before.free="93.8 GiB" now.total="95.6 GiB" now.free="92.3 GiB" now.used="3.3 GiB"
releasing nvml library
time=2025-06-14T10:02:24.233-04:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-06-14T10:02:24.250-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T10:02:24.266-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T10:02:24.267-04:00 level=DEBUG source=sched.go:228 msg="loading first model" model=C:\Users\ankar\.ollama\models\blobs\sha256-c6cb8a7ede1b00b5fff8cb512e5402aedf8191e42743d2dd29b8a705d7e3af3c
time=2025-06-14T10:02:24.267-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[92.3 GiB]"
time=2025-06-14T10:02:24.267-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=deci.vision.block_count default=0
panic: interface conversion: interface {} is *ggml.array[int32], not uint32

goroutine 54 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc00069aa50, {0x7ff7e12c2eee, 0x14}, {0xc000466d90, 0x1, 0x0})
	C:/a/ollama/ollama/fs/ggml/ggml.go:152 +0x2e5
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
	C:/a/ollama/ollama/fs/ggml/ggml.go:97
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
	C:/a/ollama/ollama/fs/ggml/ggml.go:57
github.com/ollama/ollama/fs/ggml.GGML.GraphSize({{0x7ff7e147e310, 0xc0006c0b40}, {0x7ff7e147e2c0, 0xc00019d808}, 0x73777dbc0}, 0x2000, 0x200, 0x2, {0x0, 0x0})
	C:/a/ollama/ollama/fs/ggml/ggml.go:428 +0x131
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...}, ...)
	C:/a/ollama/ollama/llm/memory.go:142 +0x725
github.com/ollama/ollama/llm.PredictServerFit({0xc000049b70?, 0x0?, 0x0?}, 0xc000531650, {0x0?, 0x0?, 0x0?}, {0x0, 0x0, 0x0}, ...)
	C:/a/ollama/ollama/llm/memory.go:23 +0xe5
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc0005a3790, 0xc000531650, {0xc0000fe240?, 0x1?, 0x1?}, 0xc000049cc8)
	C:/a/ollama/ollama/server/sched.go:787 +0x6fb
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc000576540, {0x7ff7e1482540, 0xc00018cfa0})
	C:/a/ollama/ollama/server/sched.go:229 +0xf6e
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
	C:/a/ollama/ollama/server/sched.go:110 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
	C:/a/ollama/ollama/server/sched.go:109 +0xb1
Reference: github-starred/ollama#6490