[GH-ISSUE #6638] Llama 3.1 8b not generating answers since past few days #4179

Closed
opened 2026-04-12 15:06:31 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @ToshiKBhat on GitHub (Sep 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6638

Originally assigned to: @jmorganca on GitHub.

What is the issue?

The llama 3.1 8b model was generating answers in my RAG app until a few days back. Now it says i cannot help with that even when i use a simple system prompt - you are a helpful assistant , use the context provided to you to answer the user questions.
The 70b model seems to work fine, I also noticed the 8b model was updated recently.

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 15:06:31 -05:00

@rick-github commented on GitHub (Sep 4, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.

@ea167 commented on GitHub (Sep 5, 2024):

Same here. I'll make a separate issue with a test case.

@jmorganca commented on GitHub (Sep 5, 2024):

Hi folks, what size of a prompt are you providing? Would it be possible to provide an example prompt that reproduces this? Thanks so much, and sorry this happened.


@ToshiKBhat commented on GitHub (Sep 5, 2024):

Example prompt: you are a helpful assistant, you have to answer the user’s questions using the context provided to you and nothing else.

I have removed the "nothing else" part and tried as well.
I thought maybe I was providing too much context and reduced it, but it is just equivalent to 1-2 pages of a PDF, well under the 128k context limit.


@ToshiKBhat commented on GitHub (Sep 5, 2024):

The response seems to be consistent - I can’t assist with that request. I suggest reaching out to <the support team mentioned in RAG documents>

@ToshiKBhat commented on GitHub (Sep 5, 2024):

I set the num_ctx option to a high value based on the solution you provided in another issue and can see answer generation now.

Was the default num_ctx changed recently? If so, how can I find out what the recent change was?

Thanks for providing the solution @jmorganca, really appreciate it.

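For context on the fix above: `num_ctx` can be set per request through the `options` field of Ollama's `/api/generate` request body. A minimal sketch of building such a request body follows; the helper name and the default value of 8192 are illustrative choices, not part of Ollama's API.

```python
# Build a request body for Ollama's /api/generate endpoint that pins num_ctx.
# If num_ctx is left at the server default (2048 tokens at the time of this
# issue), a long RAG prompt is silently truncated and the model may only see
# a fragment of the context, which can produce refusals like the one above.
def build_generate_request(model, prompt, system=None, num_ctx=8192):
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window size, in tokens
    }
    if system is not None:
        body["system"] = system
    return body

req = build_generate_request(
    "llama3.1:8b",
    "Answer using only the provided context: ...",
    system="You are a helpful assistant.",
)
```

POSTing a body like this to `http://localhost:11434/api/generate` with any HTTP client applies the larger window for that request only, without changing the server default.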

@JonathanGuo01 commented on GitHub (Sep 6, 2024):

I had the same problem yesterday on my RAG system, which led me to think that I shouldn't have updated to the latest version of ollama.


@antonkoenig commented on GitHub (Sep 10, 2024):

I see frequent crashes for ollama (pulled 9 days ago) in general. The webui sometimes reports server error 500. Most of the time it just appears stuck. I don't have a log of the error.


@jmorganca commented on GitHub (Nov 17, 2024):

I will close this as I believe this is from having too small of a `num_ctx` (which is captured in other issues). Thanks again for the issue.

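For anyone hitting the same truncation, a persistent alternative to passing `options` on every request is to bake a larger context window into a derived model with a Modelfile, per Ollama's Modelfile reference. The model name and the value 8192 below are illustrative:

```
FROM llama3.1:8b
PARAMETER num_ctx 8192
```

Then `ollama create llama3.1-8k -f Modelfile` and point the RAG app at `llama3.1-8k`.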
Reference: github-starred/ollama#4179