[GH-ISSUE #8460] Llama-3_1-Nemotron-51B-Instruct #67497

Closed
opened 2026-05-04 10:33:06 -05:00 by GiteaMirror · 7 comments

Originally created by @Tanote650 on GitHub (Jan 16, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8460

Please add this NVIDIA model:

https://huggingface.co/bartowski/Llama-3_1-Nemotron-51B-Instruct-GGUF

GiteaMirror added the model label 2026-05-04 10:33:06 -05:00

@Tanote650 commented on GitHub (Jan 25, 2025):

Thank you very much


@dpk-it commented on GitHub (Mar 21, 2025):

Tested `hf.co/bartowski/Llama-3_1-Nemotron-51B-Instruct-GGUF:Q4_K_L` and `hf.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF:Q4_K_M`; neither works in version 0.6.2, as reported in [panic: interface conversion: interface {} is *ggml.array, not uint32 #9911](https://github.com/ollama/ollama/issues/9911).

```
2025/03/21 19:25:27 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-21T19:25:27.235Z level=INFO source=images.go:432 msg="total blobs: 179"
time=2025-03-21T19:25:27.238Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-21T19:25:27.239Z level=INFO source=routes.go:1297 msg="Listening on [::]:11434 (version 0.6.2)"
time=2025-03-21T19:25:27.240Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-21T19:25:28.071Z level=INFO source=types.go:130 msg="inference compute" id=GPU-e8a01a94-7d0f-f68d-f1b9-6d652b29b486 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.8 GiB" available="30.1 GiB"
time=2025-03-21T19:25:28.071Z level=INFO source=types.go:130 msg="inference compute" id=GPU-f5e9aa1d-8aae-882d-3c1a-8439274b917e library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4070 Ti SUPER" total="16.0 GiB" available="14.7 GiB"
time=2025-03-21T19:25:28.071Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a3276781-03b0-19e3-9aff-f476adf829ef library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060 Ti" total="16.0 GiB" available="14.9 GiB"
time=2025-03-21T19:25:42.566Z level=WARN source=ggml.go:149 msg="key not found" key=deci.vision.block_count default=0
panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 82 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc0003c7e60, {0x556f679e0d4d, 0x14}, {0xc0003b02b0, 0x1, 0x0})
        github.com/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCount(0xc0003c7e60)
        github.com/ollama/ollama/fs/ggml/ggml.go:64 +0x5e
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCountK(0xc0003c7e60)
        github.com/ollama/ollama/fs/ggml/ggml.go:72 +0x18
github.com/ollama/ollama/fs/ggml.GGML.SupportsFlashAttention({{0x556f67e540e8?, 0xc0005b74a0?}, {0x556f67e54098?, 0xc0004e9008?}})
        github.com/ollama/ollama/fs/ggml/ggml.go:648 +0x159
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...})
        github.com/ollama/ollama/llm/memory.go:133 +0x568
github.com/ollama/ollama/llm.PredictServerFit({0xc000399ba8?, 0x556f66ba0de5?, 0xc0003998c0?}, 0xc0002b2ee0, {0xc000399918?, _, _}, {0x0, 0x0, 0x0}, ...)
        github.com/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc0006d64b0, 0xc0002b2ee0, {0xc0002b7808?, 0x3?, 0x4?}, 0xc0003a3cf8)
        github.com/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc000590180, {0x556f67e58020, 0xc0001299a0})
        github.com/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
        github.com/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
        github.com/ollama/ollama/server/sched.go:107 +0xb1
```
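For context on the crash itself: the trace shows `KV.Uint` performing a Go type assertion on a metadata value that turns out to be a `*ggml.array` rather than a scalar, and an unchecked assertion panics on a type mismatch. Below is a minimal sketch of that failure mode and of the comma-ok form that falls back to a default instead of crashing; all names here are hypothetical, not ollama's actual code.

```go
package main

import "fmt"

// kv mimics a GGUF key/value table: values are decoded as interface{}.
type kv map[string]any

// uintOr returns the value for key if it really is a uint32, otherwise a
// default. The comma-ok assertion cannot panic on a type mismatch.
func (m kv) uintOr(key string, def uint32) uint32 {
	if v, ok := m[key].(uint32); ok {
		return v
	}
	return def
}

func main() {
	// Hypothetical model metadata that stores a per-layer array where a
	// scalar is expected.
	m := kv{"llama.attention.head_count": []int32{8, 8, 4}}

	fmt.Println(m.uintOr("llama.attention.head_count", 0)) // prints 0

	// The unchecked form is what produces the crash reported above:
	// panic: interface conversion: interface {} is []int32, not uint32
	_ = m["llama.attention.head_count"].(uint32)
}
```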

@JeroenAdam commented on GitHub (Mar 23, 2025):

Same issue with the newer Nemotron: https://ollama.com/MHKetbi/nvidia_Llama-3.3-Nemotron-Super-49B-v1

```
panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 16 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc0002f2ff0, {0x55e8beea7d4d, 0x14}, {0xc0005d6b98, 0x1, 0x0})
        github.com/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
        github.com/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCount(0xc0002f2ff0)
        github.com/ollama/ollama/fs/ggml/ggml.go:64 +0x5e
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCountK(0xc0002f2ff0)
        github.com/ollama/ollama/fs/ggml/ggml.go:72 +0x18
github.com/ollama/ollama/fs/ggml.GGML.SupportsFlashAttention({{0x55e8bf31b0e8?, 0xc0006f0f50?}, {0x55e8bf31b098?, 0xc000489008?}})
        github.com/ollama/ollama/fs/ggml/ggml.go:648 +0x159
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x20000, 0x200, 0xffffffffffffffff, ...}, ...})
        github.com/ollama/ollama/llm/memory.go:133 +0x568
github.com/ollama/ollama/llm.PredictServerFit({0xc000785ba8?, 0x55e8be067de5?, 0xc0007858c0?}, 0xc00015a020, {0xc000785918?, _, _}, {0x0, 0x0, 0x0}, ...)
        github.com/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc00004c0f0, 0xc00015a020, {0xc0001d0240?, 0x1?, 0x1?}, 0xc000045cf8)
        github.com/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc0001a91a0, {0x55e8bf31f020, 0xc0003b2550})
        github.com/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
        github.com/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
        github.com/ollama/ollama/server/sched.go:107 +0xb1
```


@yt-koike commented on GitHub (Jul 8, 2025):

Same issue with https://huggingface.co/yt-koike/plamo-2-1b-gguf as well.

```
ollama  | [GIN] 2025/07/08 - 08:28:29 | 200 |      47.649µs |       127.0.0.1 | HEAD     "/"
ollama  | time=2025-07-08T08:28:29.309Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama  | time=2025-07-08T08:28:29.320Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama  | [GIN] 2025/07/08 - 08:28:29 | 200 |   24.060833ms |       127.0.0.1 | POST     "/api/show"
ollama  | time=2025-07-08T08:28:29.334Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama  | time=2025-07-08T08:28:29.509Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama  | time=2025-07-08T08:28:29.526Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
ollama  | time=2025-07-08T08:28:29.526Z level=WARN source=ggml.go:152 msg="key not found" key=plamo2.vision.block_count default=0
ollama  | panic: interface conversion: interface {} is *ggml.array[int32], not uint32
ollama  |
ollama  | goroutine 52 [running]:
ollama  | github.com/ollama/ollama/fs/ggml.keyValue[...](0xc000565b30, {0x5836071a4ced, 0x17}, {0xc000439a30, 0x2, 0x0})
ollama  |       github.com/ollama/ollama/fs/ggml/ggml.go:149 +0x2de
ollama  | github.com/ollama/ollama/fs/ggml.KV.Uint(...)
ollama  |       github.com/ollama/ollama/fs/ggml/ggml.go:96
ollama  | github.com/ollama/ollama/fs/ggml.KV.HeadCountKV(...)
ollama  |       github.com/ollama/ollama/fs/ggml/ggml.go:60
ollama  | github.com/ollama/ollama/fs/ggml.GGML.GraphSize({{0x583607628b68?, 0xc00052c730?}, {0x583607628b18?, 0xc00018d808?}}, 0x2000, 0x200, 0x2, {0x0, 0x0})
ollama  |       github.com/ollama/ollama/fs/ggml/ggml.go:417 +0x1b8
ollama  | github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x2000, 0x200, 0xffffffffffffffff, ...}, ...}, ...)
ollama  |       github.com/ollama/ollama/llm/memory.go:140 +0x665
ollama  | github.com/ollama/ollama/llm.PredictServerFit({0xc000305b88?, 0x0?, 0x0?}, 0xc000482100, {0x58360667e8ed?, _, _}, {0x0, 0x0, 0x0}, ...)
ollama  |       github.com/ollama/ollama/llm/memory.go:23 +0xe5
ollama  | github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc0004b00f0, 0xc000482100, {0xc0000d8240?, 0x1?, 0x1?}, 0xc0004adcd8)
ollama  |       github.com/ollama/ollama/server/sched.go:753 +0x6fb
ollama  | github.com/ollama/ollama/server.(*Scheduler).processPending(0xc000111380, {0x58360762cb20, 0xc0006a71d0})
ollama  |       github.com/ollama/ollama/server/sched.go:228 +0xf0b
ollama  | github.com/ollama/ollama/server.(*Scheduler).Run.func1()
ollama  |       github.com/ollama/ollama/server/sched.go:109 +0x1f
ollama  | created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
ollama  |       github.com/ollama/ollama/server/sched.go:108 +0xb1
ollama  | 2025/07/08 08:28:29 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
ollama  | time=2025-07-08T08:28:29.970Z level=INFO source=images.go:463 msg="total blobs: 68"
ollama  | time=2025-07-08T08:28:29.971Z level=INFO source=images.go:470 msg="total unused blobs removed: 0"
ollama  | time=2025-07-08T08:28:29.972Z level=INFO source=routes.go:1300 msg="Listening on [::]:11434 (version 0.6.8)"
ollama  | time=2025-07-08T08:28:29.972Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
ollama  | time=2025-07-08T08:28:30.140Z level=INFO source=types.go:130 msg="inference compute" id=GPU-743bd432-c954-e10f-b04d-db1b4f5ed5ea library=cuda variant=v12 compute=8.9 driver=12.7 name="NVIDIA GeForce RTX 4070 Ti SUPER" total="15.6 GiB" available="9.4 GiB"
```

@yt-koike commented on GitHub (Jul 8, 2025):

Never mind. After upgrading my ollama to 0.9.5, it says `error loading model: error loading model architecture: unknown model architecture: 'plamo2'`. This could be fixed with https://github.com/ggml-org/llama.cpp/pull/14560.
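If you want to check which architecture a GGUF declares before pulling it, the string sits near the start of the file. The following is a rough sketch, assuming the GGUF v2/v3 header layout (64-bit counts) and that `general.architecture` is the first metadata key, as it is in most converter output; a robust tool would scan all metadata entries instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open(os.Args[1]) // path to a .gguf file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// GGUF v2/v3 header: magic, version, tensor count, metadata KV count.
	var hdr struct {
		Magic, Version       uint32
		TensorCount, KVCount uint64
	}
	if err := binary.Read(f, binary.LittleEndian, &hdr); err != nil {
		panic(err)
	}
	if hdr.Magic != 0x46554747 { // "GGUF" read as a little-endian uint32
		panic("not a GGUF file")
	}

	// GGUF strings are a uint64 length followed by that many bytes.
	readString := func() string {
		var n uint64
		if err := binary.Read(f, binary.LittleEndian, &n); err != nil {
			panic(err)
		}
		b := make([]byte, n)
		if _, err := io.ReadFull(f, b); err != nil {
			panic(err)
		}
		return string(b)
	}

	key := readString()
	var valType uint32
	if err := binary.Read(f, binary.LittleEndian, &valType); err != nil {
		panic(err)
	}
	if valType != 8 { // 8 = string in the GGUF value-type enum
		panic("first metadata value is not a string")
	}
	fmt.Printf("%s = %s\n", key, readString())
}
```

On a plamo2 conversion this would print something like `general.architecture = plamo2`, which is the string the runner rejects until it supports that architecture.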


@mitmul commented on GitHub (Oct 24, 2025):

@yt-koike While I understand this issue is about a different error, I'll mention it anyway: if you're trying to run a plamo2 model in Ollama, applying the one-line change from https://github.com/ollama/ollama/pull/12761 might make it work.


@rick-github commented on GitHub (Jan 5, 2026):

```console
$ ollama -v
ollama version is 0.13.5

$ ollama run hf.co/bartowski/Llama-3_1-Nemotron-51B-Instruct-GGUF:Q4_K_M hello
pulling manifest 
pulling fd11e8991106: 100% ▕██████████████████▏  31 GB                         
pulling 948af2743fc7: 100% ▕██████████████████▏ 1.5 KB                         
pulling 6c0b08d96525: 100% ▕██████████████████▏   65 B                         
pulling f9f3562e8a50: 100% ▕██████████████████▏  547 B                         
verifying sha256 digest 
writing manifest 
success 
Hello! How can I help you today?
```
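For anyone scripting the same sanity check, here is a minimal sketch against ollama's HTTP API, assuming the default endpoint on localhost:11434; `/api/generate` with `"stream": false` returns a single JSON object whose `response` field holds the completion.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Build the request body for /api/generate.
	body, err := json.Marshal(map[string]any{
		"model":  "hf.co/bartowski/Llama-3_1-Nemotron-51B-Instruct-GGUF:Q4_K_M",
		"prompt": "hello",
		"stream": false,
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With stream=false the server replies with a single JSON object.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```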
Reference: github-starred/ollama#67497