[GH-ISSUE #11560] [Model Request] Support new SmallThinker series models #54142

Open
opened 2026-04-29 05:16:30 -05:00 by GiteaMirror · 1 comment

Originally created by @wdl339 on GitHub (Jul 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11560

Ollama previously added support for [PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview), which has 68.8K downloads. The PowerInfer team has recently released new and improved models in the SmallThinker series.

> SmallThinker is a family of on-device native Mixture-of-Experts (MoE) language models specially designed for local deployment, co-developed by the IPADS and School of AI at Shanghai Jiao Tong University and Zenergize AI. Designed from the ground up for resource-constrained environments, SmallThinker brings powerful, private, and low-latency AI directly to your personal devices, without relying on the cloud.

The new models consist of two sizes:

  1. [PowerInfer/SmallThinker-21BA3B-Instruct-GGUF](https://huggingface.co/PowerInfer/SmallThinker-21BA3B-Instruct-GGUF): a 21B parameter MoE model with only 3B active parameters
  2. [PowerInfer/SmallThinker-4BA0.6B-Instruct-GGUF](https://huggingface.co/PowerInfer/SmallThinker-4BA0.6B-Instruct-GGUF): a 4B parameter MoE model with just 0.6B active parameters

We would appreciate it if the Ollama team could consider supporting them. Thank you for your time and consideration!

GiteaMirror added the model label 2026-04-29 05:16:30 -05:00

@rick-github commented on GitHub (Jul 28, 2025):

https://github.com/ggml-org/llama.cpp/pull/14898


Reference: github-starred/ollama#54142