[GH-ISSUE #6982] Mistral-NeMo-Minitron-8B-Base/Chat #4421

Closed
opened 2026-04-12 15:21:33 -05:00 by GiteaMirror · 4 comments

Originally created by @Axenide on GitHub (Sep 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6982

Mistral-NeMo-12B has great capabilities, but it doesn't fit in my GPU, so I have to offload part of it to the CPU and RAM, which makes it really slow. 8B models work great though, so I think it would be a great addition to have this model.

Here is the base model:
https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base

And here is a fine-tuned chat version:
https://huggingface.co/rasyosef/Mistral-NeMo-Minitron-8B-Chat

Here is the GGUF version of the fine-tune mentioned:
https://huggingface.co/mradermacher/Mistral-NeMo-Minitron-8B-Chat-GGUF

GiteaMirror added the model label 2026-04-12 15:21:33 -05:00

@semidark commented on GitHub (Oct 10, 2024):

I am currently running `Mistral-NeMo-Minitron-8B` on [llama.cpp](https://github.com/ggerganov/llama.cpp), so I think getting it to run on Ollama should be quite straightforward. Maybe you could follow the [import guide](https://github.com/ollama/ollama/blob/main/docs/import.md#Importing-a-GGUF-based-model-or-adapter) and get it running?
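
For reference, the import guide's GGUF flow boils down to a one-line Modelfile pointing at the downloaded file. This is only a sketch; the filename (quantization) and model name below are illustrative, not taken from the thread:

```
# Modelfile -- the GGUF filename is illustrative; use whichever
# quantization you actually downloaded from the repository linked above.
FROM ./Mistral-NeMo-Minitron-8B-Chat.Q4_K_M.gguf
```

followed by `ollama create minitron-8b-chat -f Modelfile` and `ollama run minitron-8b-chat`.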


@Axenide commented on GitHub (Oct 10, 2024):

@semidark Hi, I tried importing the models into Ollama myself, but both the `Chat` and `Base` versions act as plain completion models. Maybe that's a problem with the models I used. Could you please share where you got yours?


@semidark commented on GitHub (Oct 11, 2024):

I did not document which model I downloaded, sorry. I have always thought about keeping a list of the models I download, but the hope that I could switch over to Ollama instead of using llama.cpp directly kept me from doing so.

I was using it with Open WebUI and the OpenAI-compatible API of llama.cpp. I had to configure the chat template, since Open WebUI just kept chatting with itself. Configuring the template fixed this for me. Maybe this is also needed for your setup?
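
In Ollama, the equivalent fix lives in the Modelfile's `TEMPLATE` (and `PARAMETER stop`) directives. The exact special tokens depend on what the fine-tune was trained with; the ChatML-style `<|im_start|>`/`<|im_end|>` markers below are placeholders, not confirmed for this model:

```
# Sketch of a chat template in a Modelfile. The <|im_start|>/<|im_end|>
# markers are placeholders -- substitute the tokens from the fine-tune's
# tokenizer config.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop <|im_end|>
```

Without a matching template and stop token, an imported GGUF chat model will behave like a raw completion model, which matches the symptom described above.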


@Axenide commented on GitHub (Oct 11, 2024):

@semidark It's okay, thanks. Yeah, I configured the template correctly.
I might just look for another fine-tune, heh.

Reference: github-starred/ollama#4421