[GH-ISSUE #14266] Model request: Nanbeige4.1-3B #35049

Open
opened 2026-04-22 19:13:42 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @section1 on GitHub (Feb 15, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14266

Looks good for its size, and it runs with low hardware resources.

https://huggingface.co/Nanbeige/Nanbeige4.1-3B

Thanks

GiteaMirror added the model label 2026-04-22 19:13:42 -05:00

@rick-github commented on GitHub (Feb 15, 2026):

This is llama architecture and so should be [importable](https://github.com/ollama/ollama/blob/main/docs/import.mdx#Importing-a-model-from-Safetensors-weights).
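
For reference, a minimal sketch of that import flow, assuming the Safetensors weights from the Hugging Face repo have been downloaded locally (the path and model name below are placeholders):

```
# Modelfile contents (FROM points at the downloaded weights directory):
#   FROM /path/to/Nanbeige4.1-3B
ollama create nanbeige4.1-3b -f Modelfile
ollama run nanbeige4.1-3b
```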


@SurealCereal commented on GitHub (Feb 15, 2026):

I tried [Q6_K](https://huggingface.co/mradermacher/Nanbeige4.1-3B-GGUF?show_file_info=Nanbeige4.1-3B.Q6_K.gguf) and [Q8_0](https://huggingface.co/DevQuasar/Nanbeige.Nanbeige4.1-3B-GGUF?show_file_info=Nanbeige.Nanbeige4.1-3B.Q8_0.gguf) quants on an RTX 3080 12GB with `v0.16.2-rc0`, and it was very slow compared to larger dense models, partly because the "thinking" traces were extremely long and strange. It argues with itself a lot for basic prompts like "Hello. Please tell me about yourself." I tried the recommended parameters:

```
temperature 0.6
top_p 0.95
repeat_penalty 1.0
```

I also tried `top_k` 0, 40, and the default.
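
For anyone reproducing this, those sampling settings can be baked into a derived model with a Modelfile (the model name here is a placeholder):

```
FROM nanbeige4.1-3b
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.0
```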


@rick-github commented on GitHub (Feb 15, 2026):

It's only a 3B model, so I would take the claims with a pretty big grain of salt. The template in the mradermacher quant doesn't fully match the one from the original model, so it's possible the responses could be improved, but again, it's only 3B.
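
One way to check for that kind of mismatch is to print the template the quant actually ships and compare it against the chat template in the upstream repo's `tokenizer_config.json`, overriding it in a derived Modelfile if needed (model names below are placeholders; the template body is elided):

```
# Print the chat template the local model uses (model name is a placeholder)
ollama show nanbeige4.1-3b --template

# If it differs from the upstream template, override it in a Modelfile:
#   FROM nanbeige4.1-3b
#   TEMPLATE """..."""
ollama create nanbeige4.1-3b-fixed -f Modelfile
```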


@section1 commented on GitHub (Feb 15, 2026):

Yeah, that's what I read: that it's slow or very verbose when thinking. This is supposed to improve in version 4.2, along with better agentic "skills"; there's a post on Reddit about this. I haven't tested it yet, but thanks for the links, guys. I'll try importing it first and give it a try.


Reference: github-starred/ollama#35049