[GH-ISSUE #6883] Problem Executing 'ollama create' Multiple Times with Different GGUF Files #30113

Closed
opened 2026-04-22 09:35:12 -05:00 by GiteaMirror · 1 comment

Originally created by @michaelc2005 on GitHub (Sep 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6883

What is the issue?

(I did some searching and have not yet found any mention of this issue, but I may have missed it.)

When creating models from GGUF files downloaded from Hugging Face, I observed that two different models, when tested with an identical prompt (copied and pasted), produced nearly identical responses, though not word-for-word. Then, while switching between models and watching the System Monitor and amdgpu_top, I noticed that system memory usage remained unchanged and the new model loaded almost instantly. This swift loading initially sparked excitement, until closer scrutiny revealed the underlying issue.

I suspected something was wrong. Running 'ollama list' showed that both models had identical IDs. Digging deeper, loading each model and running '/show modelfile' indicated that the same blob file was being used, despite the distinct GGUF files named in their respective Modelfiles. A peculiar observation: while the Modelfiles were nearly identical (aside from sloppy formatting), including the 'seed' parameter, the responses still differed slightly. The only significant difference between the files was the 'FROM' parameter.
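To make that check reproducible, here is a minimal sketch of the comparison. The model names, file names, and digest below are made up; in practice each dump file would come from `ollama show <model> --modelfile`:

```shell
# Stand-ins for the output of 'ollama show storm --modelfile' and
# 'ollama show nemo --modelfile'; 'storm', 'nemo', and the digest are
# hypothetical placeholders, not names from the original report.
printf 'FROM /usr/share/ollama/.ollama/models/blobs/sha256-aaa111\n' > storm-modelfile.txt
printf 'FROM /usr/share/ollama/.ollama/models/blobs/sha256-aaa111\n' > nemo-modelfile.txt

# Identical FROM digests mean both models resolve to the same blob -- the
# bug described above. Distinct digests would mean truly separate models.
if cmp -s storm-modelfile.txt nemo-modelfile.txt; then
  echo "same blob: both models share one GGUF"
else
  echo "distinct blobs"
fi
```

Diffing the two dumps side by side also surfaces any other drift between the Modelfiles (templates, parameters) at the same time.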

I acknowledge that my investigation may be incomplete or flawed due to a lack of diligence and evidence. Moreover, I apologize if any terminology used is incorrect or confusing. That said, I am merely an AI hobbyist. My experience dates back to 1979, when I learned of the Eliza 'Therapist' chatbot, and I have been tinkering with AI on and off ever since. Although I have programming skills in various languages, including Python, my expertise is limited, and my reluctance to document processes has also hindered me a few times over the years. Sorry.

The two GGUF files were:

  1. Llama-3.1-Storm-8B.Q8_0.gguf
  2. Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf
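
A quick way to rule out a bad download is to checksum the source files themselves: distinct digests on disk would confirm the shared blob is introduced during 'ollama create', not by the downloads. A sketch, with tiny stand-in files taking the place of the real multi-gigabyte GGUFs:

```shell
# Tiny stand-ins for the two downloaded GGUF files; the real files are
# multi-GB, but the checksum comparison works the same way.
printf 'llama-storm-weights'  > Llama-3.1-Storm-8B.Q8_0.gguf
printf 'mistral-nemo-weights' > Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf

# Different checksums here mean the source files differ on disk, so any
# shared blob must be created by the 'ollama create' step.
sha256sum Llama-3.1-Storm-8B.Q8_0.gguf Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf
```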

Other important info:

Laptop: Lenovo Flex 5-14ARE05 Laptop (AMD Ryzen 5 4500U with integrated Radeon Graphics)
OS: Ubuntu 24.04.1 LTS fully updated as of this morning (9/19/2024)
Ollama version is 0.3.11

My current workaround is to reboot my laptop before creating each new model with Ollama, but a more reliable solution would be most appreciated.

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.3.11

GiteaMirror added the bug label 2026-04-22 09:35:12 -05:00

@rick-github commented on GitHub (Sep 20, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging. Also helpful: the commands you used to create the models, the output of `ollama list` when the models are loaded, the full content of the Modelfiles, and the paths to the Hugging Face models.
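
When attaching the Modelfiles, the line that matters most is `FROM`, since it records which blob the model resolves to. A sketch of pulling it out of a saved dump; the file contents and digest below are made-up placeholders, not values from this report:

```shell
# Stand-in for a saved 'ollama show <model> --modelfile' dump; the path
# and digest are hypothetical.
cat > storm-modelfile.txt <<'EOF'
# Modelfile generated by "ollama show"
FROM /usr/share/ollama/.ollama/models/blobs/sha256-deadbeef
PARAMETER seed 42
EOF

# The FROM line identifies the blob; two models showing the same digest
# here are sharing one set of weights.
grep '^FROM' storm-modelfile.txt
```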


Reference: github-starred/ollama#30113