[GH-ISSUE #14058] rnj-1:8b early stop/message truncation #9185

Open
opened 2026-04-12 22:02:07 -05:00 by GiteaMirror · 4 comments

Originally created by @brendensoares on GitHub (Feb 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14058

What is the issue?

  • specific to rnj-1 model
  • works as expected when run directly with llama.cpp v7923 (CPU) on the test machine
  • the issue appears to lie between the ollama server and the underlying llama.cpp

test prompt:

Output exactly this (including quotes): "http://localhost:5000\" then continue with 30 sentences.

Observed result:

"http://localhost:5

Oddly, if you change the prompt from "30" to "3" the result changes, though it is still incorrect.

prompt:

Output exactly this (including quotes): "http://localhost:5000\" then continue with 3 sentences.

result:

"http://localhost:5 then continue with 3 sentences.

I couldn't find any existing reports of this kind of issue; hopefully the context here is enough for a dev to dig deeper into Ollama's inference logic.
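A scripted version of the reproduction above can make it easier to compare builds or quants. This is a minimal sketch, assuming a local Ollama server on the default port 11434 with the rnj-1:8b tag already pulled. It calls the /api/generate REST endpoint with streaming turned off. The helper names (build_payload, is_truncated, generate) are hypothetical, and the check only tests whether the full URL ever appears in the reply.

```python
import json
import urllib.request

# Assumptions: default local Ollama endpoint and the model tag from this issue.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "rnj-1:8b"
# Same prompt as the report; '\\"' yields the literal backslash-quote it contains.
PROMPT = ('Output exactly this (including quotes): '
          '"http://localhost:5000\\" then continue with 30 sentences.')
EXPECTED = "http://localhost:5000"


def build_payload(prompt: str, model: str = MODEL) -> bytes:
    """Encode a non-streaming /api/generate request body as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def is_truncated(text: str, expected: str = EXPECTED) -> bool:
    """True if the full URL never made it into the model's reply."""
    return expected not in text


def generate(prompt: str) -> dict:
    """POST the prompt to the local server and return the decoded JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = generate(PROMPT)
    text = reply.get("response", "")
    print(repr(text))
    print("done_reason:", reply.get("done_reason"))
    print("truncated:", is_truncated(text))
```

On an affected setup the printed reply would be expected to end at "http://localhost:5 with truncated: True. The done_reason field helps separate a stop-token ending ("stop") from hitting a length limit ("length").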

Relevant log output

No relevant logs

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.15.2

GiteaMirror added the bug label 2026-04-12 22:02:07 -05:00

@rick-github commented on GitHub (Feb 3, 2026):

rnj-1 is implemented in the ollama engine, not the llama.cpp engine. Confirmed that this affects all quants on Nvidia and ROCm platforms:

$ ollama run rnj-1:8b-instruct-q4_K_M 'Output exactly this (including quotes): "http://localhost:5000\" then continue with 30 sentences.'
"http://localhost:5

The logs show that a stop token (<|eot_id|>, id 128009) was sampled immediately after the "5", which ended generation:

ollama  | time=2026-02-03T23:23:08.891Z level=TRACE source=runner.go:759 msg="computeBatch: vocab details" batchID=10 seqIdx=0 len(logits)=128256 len(activeBatch.batch.Outputs)=1 vocabSize=128256 iBatches=[0]
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=bytepairencoding.go:270 msg=decoded string=: from=[25]
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:657 msg="computeBatch: outputs are ready" batchID=10
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:652 msg="computeBatch: inputs are ready" batchID=11
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:725 msg="computeBatch: signaling computeStartedCh" batchID=11
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:476 msg="forwardBatch compute started, setting up next batch" pendingBatch.id=11 id=12
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:598 msg="forwardBatch iBatch" batchID=12 seqIdx=0 seq.iBatch=0 i+1=1 len(seq.inputs)=1
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:474 msg="forwardBatch waiting for compute to start" pendingBatch.id=12
ollama  | time=2026-02-03T23:23:08.892Z level=TRACE source=runner.go:650 msg="computeBatch: waiting for inputs to be ready" batchID=12
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:733 msg="computeBatch: logits ready" batchID=11
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:738 msg="computeBatch: decoding" batchID=11
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:759 msg="computeBatch: vocab details" batchID=11 seqIdx=0 len(logits)=128256 len(activeBatch.batch.Outputs)=1 vocabSize=128256 iBatches=[0]
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=bytepairencoding.go:270 msg=decoded string=5 from=[20]
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:657 msg="computeBatch: outputs are ready" batchID=11
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:652 msg="computeBatch: inputs are ready" batchID=12
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:725 msg="computeBatch: signaling computeStartedCh" batchID=12
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:476 msg="forwardBatch compute started, setting up next batch" pendingBatch.id=12 id=13
ollama  | time=2026-02-03T23:23:08.900Z level=TRACE source=runner.go:598 msg="forwardBatch iBatch" batchID=13 seqIdx=0 seq.iBatch=0 i+1=1 len(seq.inputs)=1
ollama  | time=2026-02-03T23:23:08.901Z level=TRACE source=runner.go:474 msg="forwardBatch waiting for compute to start" pendingBatch.id=13
ollama  | time=2026-02-03T23:23:08.901Z level=TRACE source=runner.go:650 msg="computeBatch: waiting for inputs to be ready" batchID=13
ollama  | time=2026-02-03T23:23:08.909Z level=TRACE source=runner.go:733 msg="computeBatch: logits ready" batchID=12
ollama  | time=2026-02-03T23:23:08.909Z level=TRACE source=runner.go:738 msg="computeBatch: decoding" batchID=12
ollama  | time=2026-02-03T23:23:08.909Z level=TRACE source=runner.go:759 msg="computeBatch: vocab details" batchID=12 seqIdx=0 len(logits)=128256 len(activeBatch.batch.Outputs)=1 vocabSize=128256 iBatches=[0]
ollama  | time=2026-02-03T23:23:08.909Z level=TRACE source=bytepairencoding.go:270 msg=decoded string=<|eot_id|> from=[128009]
ollama  | time=2026-02-03T23:23:08.909Z level=DEBUG source=runner.go:793 msg="hit stop token" pending=[<|eot_id|>] stop=<|eot_id|>
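When reading TRACE output like this, it can help to extract only the decoded pieces and replay them. This is a minimal sketch, assuming the bytepairencoding.go line format shown above (msg=decoded string=... from=[...]) and that slog quotes values containing spaces. The helpers decoded_tokens and replay_until_stop are hypothetical. The exact-match stop check is a simplification and does not reproduce the runner's actual stop-sequence handling.

```python
import ast
import re

# Matches lines such as: msg=decoded string=5 from=[20]
# Assumption: values with spaces are Go-quoted, e.g. string=" the".
DECODED_RE = re.compile(
    r'msg=decoded string=("(?:[^"\\]|\\.)*"|\S*) from=\[([\d ]*)\]'
)


def decoded_tokens(log_text: str) -> list[tuple[str, list[int]]]:
    """Return (piece, token_ids) for every 'decoded' TRACE line, in order."""
    out = []
    for m in DECODED_RE.finditer(log_text):
        piece, ids = m.group(1), m.group(2)
        if piece.startswith('"'):
            # Go's quoted-string escapes mostly coincide with Python literals.
            piece = ast.literal_eval(piece)
        out.append((piece, [int(t) for t in ids.split()]))
    return out


def replay_until_stop(tokens, stop: str = "<|eot_id|>") -> tuple[str, bool]:
    """Concatenate decoded pieces until one equals the stop string.

    Returns the text produced before the stop and whether the stop was hit.
    """
    text = []
    for piece, _ids in tokens:
        if piece == stop:
            return "".join(text), True
        text.append(piece)
    return "".join(text), False
```

Replaying the three decoded lines in the excerpt above yields ":5" and reports that the stop token was hit. That matches the point where the reply was cut off.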

@brendensoares commented on GitHub (Feb 4, 2026):

Thanks for verifying.

Just for clarity, I thought ollama used llama.cpp for inference. What is the "ollama engine" in comparison to llama.cpp?


@rick-github commented on GitHub (Feb 4, 2026):

Ollama is a framework for managing low-level inference engines. The original engine was llama.cpp; later an inference engine written in Go was developed, and it is now used mostly for newer models. The developers are also working on integrating an MLX-based inference engine to support a broader range of model types (e.g., diffusion).


@brendensoares commented on GitHub (Feb 4, 2026):

Where can I learn more about this Go-based engine? Why does it matter which engine a model runs on?

FYI, when testing I took the model file downloaded by ollama and ran it with the llama.cpp CLI, so it seems both still use the same GGUF format.
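One quick way to confirm that a blob downloaded by Ollama is an ordinary GGUF file is to read its fixed header. This is a minimal sketch, assuming the GGUF v2/v3 layout: a 4-byte magic "GGUF", then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. The blob location under ~/.ollama/models/blobs/ is an assumption about the default model store.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"
# magic (4s), version (uint32), tensor_count (uint64), metadata_kv_count (uint64),
# little-endian with no padding: 24 bytes in total.
HEADER = struct.Struct("<4sIQQ")


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file."""
    if len(data) < HEADER.size:
        raise ValueError("too short to be a GGUF file")
    magic, version, n_tensors, n_kv = HEADER.unpack_from(data)
    if magic != GGUF_MAGIC:
        raise ValueError(f"bad magic {magic!r}, not GGUF")
    return {"version": version, "tensor_count": n_tensors,
            "metadata_kv_count": n_kv}


if __name__ == "__main__":
    # Usage (path is illustrative): python gguf_header.py ~/.ollama/models/blobs/sha256-<digest>
    with open(sys.argv[1], "rb") as f:
        print(read_gguf_header(f.read(HEADER.size)))
```

A shared file format only shows that both tools can load the same weights. It does not mean they run the same inference code, which is the distinction drawn earlier in the thread.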

Reference: github-starred/ollama#9185