[GH-ISSUE #7341] Ollama forgets previous information in conversation if a prompt sent by the user is very large #51175

Closed
opened 2026-04-28 18:52:34 -05:00 by GiteaMirror · 4 comments

Originally created by @robotom on GitHub (Oct 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7341

What is the issue?

I am trying to figure out how to pass large bodies of text (or documents) to Ollama. I thought this was just an issue with my front end implementation, but it seems present in the command line interface as well.

I can have a basic conversation with the model (llama 3.2 in this case) about the weather, but then if I give it something relatively long (though not really that long), like a chapter of a book, it completely loses track of everything that came earlier. In fact, it doesn't even remember the prompt I just sent, only its own response to my latest prompt.

Is there a fix for this? Some setting? Or is it a RAM/Hardware issue or something? Seems like I won't be able to pass any document to it at all if this is a limitation.

Cheers.

Specs:

CPU: Intel 24 core
RAM: 64GB
GPU: 4060Ti (16GB)
Ollama version: 0.3.12

GiteaMirror added the bug label 2026-04-28 18:52:34 -05:00
@rick-github commented on GitHub (Oct 24, 2024):

https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size
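For reference, the linked FAQ describes two ways to raise the context window. A minimal sketch follows (the 8192 value and the prompt text are illustrative choices, not recommendations from this thread):

```shell
# Interactively, inside an `ollama run` session:
ollama run llama3.2
>>> /set parameter num_ctx 8192

# Or per request via the REST API, passing num_ctx in "options":
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize the chapter pasted below: ...",
  "options": { "num_ctx": 8192 }
}'
```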

@pdevine commented on GitHub (Oct 24, 2024):

@robotom please check out @rick-github's answer above. You'll need to adjust the context size.


@robotom commented on GitHub (Oct 24, 2024):

> @robotom please check out @rick-github's answer above. You'll need to adjust the context size.
> https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size

Thanks for your replies. The default seems to be 2048. Is 4096 the recommended maximum, or can it go larger? Is there any data on how this relates to CPU/RAM, etc.?

Documentation on this is very clear (thank you for pointing me to it), just not extensive as far as I can tell.


@rick-github commented on GitHub (Oct 24, 2024):

RAM usage will increase as the context window size increases (https://github.com/ollama/ollama/issues/6852#issuecomment-2382909355). If you want to run only on the GPU, that means your context size is limited by the amount of VRAM you have. If you are running on CPU, you can go pretty large, depending on your ability to add swap. If you run on GPU and exceed the size of the VRAM, the model will partly spill to system RAM - it will still work, just slower. A model also has a maximum context size; if you exceed that, the model's responses will lose coherence. You can see this maximum value with `ollama show <model>`; look for `context length`.
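To make this concrete, here is a quick way to check both limits on a setup like the one above (both are standard Ollama CLI subcommands; the exact output format varies by version):

```shell
# Print the model's maximum context size - look for "context length":
ollama show llama3.2

# After loading the model with a larger num_ctx, show how much memory it
# occupies and how it is split between GPU and CPU:
ollama ps
```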
