[GH-ISSUE #11172] Long prompt causes generate API to misbehave #7368

Open
opened 2026-04-12 19:25:42 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @dhiltgen on GitHub (Jun 23, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11172

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

If you send an extremely long prompt that exceeds the context size, the generate API misbehaves. With `stream=false`, the generate response callback never fires. With streaming enabled, you do receive a series of response callbacks, but the final response has `Done == false` and carries no timing information.

This can be reproduced via the CLI with something like

% ollama run --verbose llama3.2:latest < ~/Documents/shakespeare.txt

which produces some output but never reports the verbose timing information.
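The same symptom is visible at the API level: `/api/generate` streams newline-delimited JSON chunks, and a well-behaved stream ends with a chunk carrying `"done": true` plus timing fields such as `total_duration`. A minimal client-side check (a sketch assuming that standard NDJSON stream shape; the sample payloads below are illustrative, not captured traffic) makes the failure easy to spot:

```python
import json

def stream_completed(ndjson_lines):
    """Return True if the final streamed chunk signals completion.

    /api/generate streams newline-delimited JSON objects; the last
    one should have "done": true. In the bug described above, the
    final chunk arrives with "done": false and no timing fields.
    """
    last = None
    for line in ndjson_lines:
        if line.strip():
            last = json.loads(line)
    return bool(last and last.get("done"))

# A healthy stream ends with done=true and timing information:
healthy = [
    '{"response": "To be", "done": false}',
    '{"response": "", "done": true, "total_duration": 123456789}',
]

# The truncated-prompt case reported here never flips done:
buggy = [
    '{"response": "To be", "done": false}',
    '{"response": " or not", "done": false}',
]
```

Running this check against a capture of the buggy stream returns `False`, matching the missing verbose timing seen in the CLI.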

Relevant log output

time=2025-06-23T11:20:47.693-07:00 level=WARN source=runner.go:128 msg="truncating input prompt" limit=4096 prompt=1474469 keep=5 new=4096
time=2025-06-23T11:20:47.694-07:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=4096 used=0 remaining=4096
time=2025-06-23T11:20:50.738-07:00 level=DEBUG source=cache.go:240 msg="context limit hit - shifting" id=0 limit=4096 input=4096 keep=5 discard=2045
update: applying K-shift
time=2025-06-23T11:20:57.402-07:00 level=DEBUG source=sched.go:503 msg="context for request finished"
time=2025-06-23T11:20:57.402-07:00 level=INFO source=routes.go:1377 msg="streamResponse: w.Write failed with write tcp 127.0.0.1:11434->127.0.0.1:60644: write: connection reset by peer"
time=2025-06-23T11:20:57.402-07:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/llama3.2:latest runner.inference=metal runner.devices=1 runner.size="3.8 GiB" runner.vram="3.8 GiB" runner.parallel=2 runner.pid=20549 runner.model=/Users/daniel/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff runner.num_ctx=8192 duration=5m0s
[GIN] 2025/06/23 - 11:20:57 | 200 | 23.619543916s |       127.0.0.1 | POST     "/api/generate"
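The numbers in the "context limit hit - shifting" line are self-consistent if the runner discards half of the non-kept cache on a shift (an assumption inferred from the log values, not verified against the runner source):

```python
# Hypothetical reconstruction of the shift arithmetic in the log line
#   limit=4096 keep=5 discard=2045
# assuming the runner discards half of the shiftable (non-keep) cache.
limit, keep = 4096, 5
discard = (limit - keep) // 2
print(discard)
```

If that assumption holds, `discard=2045` follows directly from `limit=4096` and `keep=5`.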

OS

No response

GPU

No response

CPU

No response

Ollama version

0.9.2

GiteaMirror added the bug label 2026-04-12 19:25:42 -05:00