[GH-ISSUE #14882] "panic: failed to sample token" with "nemotron-3-nano" #71653

Open
opened 2026-05-05 02:17:16 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @marianopeck on GitHub (Mar 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14882

What is the issue?

Hi everyone,

I am trying to call the Responses API on the Ollama server with the model "nemotron-3-nano", but I am getting a crash. I am running Ollama in debug mode and can attach the whole `server.log` if useful.

Important to note that the same Responses API request (and the same code) works great with many other models.

Also interesting is that `ollama run nemotron-3-nano` does work; it seems to fail only with my Responses API call.

The response I get back in my client is:

'{"error":{"message":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details","type":"api_error","param":null,"code":null}}
'

Relevant log output

panic: failed to sample token

goroutine 567 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0x140002330e0, {0x5, {0x105bce850, 0x14000855100}, {0x105bdbdb0, 0x14005ed92d8}, {0x14005e88c08, 0x1de, 0x25f}, {{0x105bdbdb0, ...}, ...}, ...})
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:762 +0x1668
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 10
	/Users/runner/work/ollama/ollama/runner/ollamarunner/runner.go:459 +0x22c
time=2026-03-16T16:50:07.497-03:00 level=ERROR source=server.go:1610 msg="post predict" error="Post \"http://127.0.0.1:51411/completion\": EOF"
[GIN] 2026/03/16 - 16:50:07 | 500 | 12.603315458s |     10.211.55.3 | POST     "/v1/responses"
time=2026-03-16T16:50:07.497-03:00 level=DEBUG source=sched.go:585 msg="context for request finished"
time=2026-03-16T16:50:07.498-03:00 level=DEBUG source=sched.go:336 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/nemotron-3-nano:30b runner.inference="[{ID:0 Library:Metal}]" runner.size="26.0 GiB" runner.vram="26.0 GiB" runner.parallel=1 runner.pid=66967 runner.model=/Users/mariano/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce runner.num_ctx=262144 duration=5m0s
time=2026-03-16T16:50:07.498-03:00 level=DEBUG source=sched.go:354 msg="after processing request finished event" runner.name=registry.ollama.ai/library/nemotron-3-nano:30b runner.inference="[{ID:0 Library:Metal}]" runner.size="26.0 GiB" runner.vram="26.0 GiB" runner.parallel=1 runner.pid=66967 runner.model=/Users/mariano/.ollama/models/blobs/sha256-a70437c41b3b0b768c48737e15f8160c90f13dc963f5226aabb3a160f708d1ce runner.num_ctx=262144 refCount=0

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.18.0

GiteaMirror added the bug label 2026-05-05 02:17:16 -05:00