[GH-ISSUE #9915] nvidia/Llama-3_3-Nemotron-Super-49B-v1 #32252

Closed
opened 2026-04-22 13:20:35 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @dpk-it on GitHub (Mar 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9915

https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1

GiteaMirror added the model label 2026-04-22 13:20:35 -05:00

@mmb78 commented on GitHub (Mar 20, 2025):

just tried this (does not work):

ollama run hf.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF:Q4_K_M
Error: Post "http://127.0.0.1:11434/api/generate": EOF

ollama show hf.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF:Q4_K_M
Model
architecture deci
parameters 49.9B
context length 131072
embedding length 8192
quantization unknown

Parameters
stop "<|im_start|>"
stop "<|im_end|>"


@dpk-it commented on GitHub (Mar 20, 2025):

@mmb78 I tried several quantization options; each one crashes Ollama on my system during model initialization (https://github.com/ollama/ollama/issues/9911). Other models work fine.


@nickheyer commented on GitHub (Mar 21, 2025):

interface conversion: interface {} is *ggml.array, not uint32


@charliboy commented on GitHub (Mar 21, 2025):

Ollama v0.6.0 has the same issue.


@Tanote650 commented on GitHub (Mar 21, 2025):

An Ollama update adding support for the Nemotron models was already worked out and made available two months ago, see #8460. I cannot say why it has not yet been included in one of the Ollama releases.


@JeroenAdam commented on GitHub (Mar 21, 2025):

I tried this and Ollama v0.6.2 crashes, although it works fine with llama.cpp b4856 from two weeks ago.

panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 16 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc0002f2ff0, {0x55e8beea7d4d, 0x14}, {0xc0005d6b98, 0x1, 0x0})
github.com/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
github.com/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
github.com/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCount(0xc0002f2ff0)
github.com/ollama/ollama/fs/ggml/ggml.go:64 +0x5e
github.com/ollama/ollama/fs/ggml.KV.EmbeddingHeadCountK(0xc0002f2ff0)
github.com/ollama/ollama/fs/ggml/ggml.go:72 +0x18
github.com/ollama/ollama/fs/ggml.GGML.SupportsFlashAttention({{0x55e8bf31b0e8?, 0xc0006f0f50?}, {0x55e8bf31b098?, 0xc000489008?}})
github.com/ollama/ollama/fs/ggml/ggml.go:648 +0x159
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x20000, 0x200, 0xffffffffffffffff, ...}, ...})
github.com/ollama/ollama/llm/memory.go:133 +0x568
github.com/ollama/ollama/llm.PredictServerFit({0xc000785ba8?, 0x55e8be067de5?, 0xc0007858c0?}, 0xc00015a020, {0xc000785918?, _, _}, {0x0, 0x0, 0x0}, ...)
github.com/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc00004c0f0, 0xc00015a020, {0xc0001d0240?, 0x1?, 0x1?}, 0xc000045cf8)
github.com/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc0001a91a0, {0x55e8bf31f020, 0xc0003b2550})
github.com/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
github.com/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
github.com/ollama/ollama/server/sched.go:107 +0xb1

Reference: github-starred/ollama#32252