[GH-ISSUE #6648] Llama 3.1 8B giving bad answers while Llama.cpp works well with the same model (Ollama on MacOS) #4184

Closed
opened 2026-04-12 15:06:54 -05:00 by GiteaMirror · 2 comments

Originally created by @ea167 on GitHub (Sep 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6648

### What is the issue?

Llama 3.1 8B gives bad answers to a simple information-extraction task when running "out of the box" on Ollama for macOS.
The same model running on Llama.cpp with seemingly the same parameters works well.

Attached is the markdown content of a website, which is provided to the Ollama prompt along with:
`List all the associations mentioned in the markdown document above.`

This test case is easy to reproduce. The two screenshots show the settings and the results.

If you run the same test with the same model on Llama.cpp, you get the list of 25 associations right.
Same on deepinfra.com.

I guess there must be some Ollama bug in the default settings or templates.

Hope it helps!

![2024-09-04 at 6 52 PM](https://github.com/user-attachments/assets/b438bf0f-21af-4a70-bec1-e77daceb8ace)

![2024-09-04 at 6 56 PM](https://github.com/user-attachments/assets/6aeaa9e8-b4b3-4d5a-83d1-12965d6a8b24)

[crawl_1_1.md](https://github.com/user-attachments/files/16882934/crawl_1_1.md)

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

ollama version is 0.3.9

GiteaMirror added the bug label 2026-04-12 15:06:54 -05:00

@jmorganca commented on GitHub (Sep 5, 2024):

Hi @ea167. Thank you for the issue. I believe the issue in this case might be the context window. To extend it to, say, 16K, use `/set num_ctx 16384` (or the `num_ctx` option in the API). In doing this I get the following results:

```
Here are the associations mentioned in the markdown document:

1. American Council of Engineering Companies of Arizona - ACEC Arizona
2. Apex GTS Advisors, Inc.
3. Arizona Bioindustry Association (AZBio)
4. Arizona Chamber of Commerce & Industry
5. Arizona Council of Human Service Providers
6. Arizona Hispanic Chamber of Commerce
7. Arizona Lodging & Tourism Association
8. Arizona Masonry Council
9. Arizona Municipal Water Users Association
10. Arizona Private School Association
11. Arizona Small Business Association
12. Arizona Tooling & Machining Association
13. AZ Impact for Good
14. Beer & Wine Distributors of Arizona
15. Better Business Bureau
16. Cambridge Investment Research Inc
17. Data Center Coalition
18. Greater Phoenix Chamber (Custom Premier)
19. One Community Media, LLC
20. Pharmaceutical Care Management Association
21. Phoenix AZ Ad Agency (Associate)
22. Phoenix REALTORS
23. The Thunderbirds
24. Arizona Bankers Association
25. Home Builders Association of Central Arizona
```

Sorry the context window is small by default. Extending it to a larger default is a work in progress. PRs such as @sammcj's for KV quantization are a step towards this: https://github.com/ollama/ollama/pull/6279
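
For anyone using the HTTP API rather than the interactive REPL, here is a minimal sketch of passing `num_ctx` through the `options` field of the `/api/generate` endpoint (the model tag and prompt are placeholders taken from this thread):

```sh
# Raise the context window for a single request via the Ollama REST API.
# This mirrors the REPL's `/set num_ctx 16384`; the model and prompt
# below are placeholders for illustration.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "List all the associations mentioned in the markdown document above.",
  "options": { "num_ctx": 16384 },
  "stream": false
}'
```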


@ea167 commented on GitHub (Sep 5, 2024):

Oh wow, thank you Jeffrey @jmorganca!

Indeed, it works on my end too. I didn't know that the default context was different from the model's.

What's unintuitive is that `/show info` displays the model context length as 128K.
I would advocate for `num_ctx` to be **set to the model context length** by default.

```
>>> /show info
  Model
    arch                llama
    parameters          8.0B
    quantization        Q4_K_M
    context length      131072
    embedding length    4096
```
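
Until the default changes, one possible workaround (a sketch; the derived model name `llama3.1-16k` is hypothetical) is to bake a larger `num_ctx` into a derived model with a Modelfile, so clients don't have to set it on every request:

```sh
# Hypothetical workaround: derive a model whose default context window
# is 16K, instead of setting num_ctx per request or per session.
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 16384
EOF
ollama create llama3.1-16k -f Modelfile
ollama run llama3.1-16k
```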