[GH-ISSUE #13509] nemotron-3-nano losing chat history #8905

Closed
opened 2026-04-12 21:42:24 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @Digit-al on GitHub (Dec 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13509

What is the issue?

When using nemotron, it sometimes (often) loses the chat history.
I am using it through open-webui, but the DEBUG output in open-webui shows the history is passed to ollama, so I suspect there is a problem with the parser.
See the example below.

[chat_paris_weather.txt](https://github.com/user-attachments/files/24211786/chat_paris_weather.txt)

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.13.4

GiteaMirror added the bug label 2026-04-12 21:42:24 -05:00

@rick-github commented on GitHub (Dec 17, 2025):

Increase the size of the [context window](https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-can-i-specify-the-context-window-size).
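
As an editor's note, one way to apply this advice is to persist a larger `num_ctx` in a derived model via a Modelfile. This is a sketch, not part of the thread: the 32768 value and the `nemotron-3-nano-32k` name are illustrative assumptions.

```shell
# Sketch: bake a larger context window into a derived model.
# The 32768 value and the model names are illustrative assumptions.
printf 'FROM nemotron-3-nano\nPARAMETER num_ctx 32768\n' > Modelfile
# Then create the derived model from that Modelfile
# (assumes the base model is already pulled):
#   ollama create nemotron-3-nano-32k -f Modelfile
```

A larger window can also be requested per call through the API's `options` field (e.g. `"options": {"num_ctx": 32768}` in a `/api/chat` request) without creating a new model.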


@Digit-al commented on GitHub (Dec 17, 2025):

That's odd, num_ctx was 16384, so I thought that was enough for such a "small" exchange, wasn't it?
I can try a bigger context size, but with Q4_K_M it doesn't fit in my VRAM (1% on CPU with 16384 gives approx. 57 t/s, while 5% on CPU with 32768 drops it to 23 t/s!).
32768 works, but would you consider offering a Q4_K_XL quant like the unsloth ones, so everything fits into 24 GB of VRAM?


@rick-github commented on GitHub (Dec 17, 2025):

Just the response from the `search_web` tool call is ~21000 tokens.

If you want to use a Q4_K_XL quant:

```
ollama pull hf.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF:Q4_K_XL
echo FROM hf.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF:Q4_K_XL > Modelfile
ollama show --modelfile nemotron-3-nano:latest | grep -v "^FROM " >> Modelfile
ollama create nemotron-3-nano:30b-a3b-q4_K_XL
```

@ParthSareen commented on GitHub (Dec 18, 2025):

@Digit-al I'd recommend doing some context engineering for your tools, so that only the most useful information (you decide what that means in your setting) is passed to the model. The simplest approach is just to truncate the tool output.
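
As an editor's note, the simplest version of that truncation can be sketched in shell; the 4000-byte budget and the file names are illustrative assumptions, not part of the thread.

```shell
# Sketch: cap tool output at a fixed byte budget before it goes back to the model.
# The 4000-byte limit and the file names are illustrative assumptions.
printf 'pretend this is a very long search_web result...' > search_result.txt
head -c 4000 search_result.txt > truncated_result.txt
```

In a real setup the truncation would happen inside the tool wrapper (for example, in an Open WebUI tool function) rather than via files on disk; a smarter variant would summarize or filter the result instead of cutting it at a byte boundary.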


@Digit-al commented on GitHub (Dec 18, 2025):

Thanks a lot for your help and suggestions!

Reference: github-starred/ollama#8905