[GH-ISSUE #14245] MiniMax M2.5 (local model request) #35035

Open
opened 2026-04-22 19:09:11 -05:00 by GiteaMirror · 5 comments

Originally created by @asitwere on GitHub (Feb 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14245

HF: https://huggingface.co/MiniMaxAI/MiniMax-M2.5

GH: https://github.com/MiniMax-AI/MiniMax-M2.5

230B A10B

GiteaMirror added the model label 2026-04-22 19:09:11 -05:00

@rick-github commented on GitHub (Feb 14, 2026):

https://ollama.com/frob/minimax-m2.5
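
A minimal way to try that build, assuming the default tag suits your hardware:

```sh
# Pull and run the community build linked above.
ollama run frob/minimax-m2.5
```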


@rjmalagon commented on GitHub (Feb 14, 2026):

Thanks, @rick-github. This is really welcome.
There is a feeling, perhaps ill-founded, that the core Ollama team is shifting away from easy local-inference tooling and toward the business of selling cloud inference.

I understand that hosting massive models costs a lot in bandwidth and storage, and I recognize that you have already helped us a great deal here. Still, why does the public MiniMax M2.5 Ollama "repo" offer only the cloud tag?

Maybe it would be worth promoting the idea of the Ollama binary importing (and converting) supported models directly from the HuggingFace servers (and the like), if that could relieve the massive model-hosting costs (setting aside the massive local resources users would need to do it).
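
For what it's worth, a version of this already works if you download the tensors yourself first. A rough sketch, assuming Ollama's Safetensors importer supports this architecture and you have the disk and RAM for it:

```sh
# Fetch the raw tensor repo from Hugging Face (a token is needed for gated repos).
huggingface-cli download MiniMaxAI/MiniMax-M2.5 --local-dir ./MiniMax-M2.5

# Point a Modelfile at the Safetensors directory; ollama create converts it,
# and --quantize produces a quantized build during import.
cat > Modelfile <<'EOF'
FROM ./MiniMax-M2.5
EOF
ollama create minimax-m2.5:q8_0 --quantize q8_0 -f Modelfile
```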


@rick-github commented on GitHub (Feb 14, 2026):

> There is a feeling, perhaps ill-founded, that the core Ollama team is shifting away from easy local-inference tooling and toward the business of selling cloud inference.

These concerns have been raised before, for example with qwen3-vl. The local version of the model was eventually made available, but it did lag the release of the cloud version. I'm not a member of the core Ollama team, so it's not my place to speculate on model selection. I will just say that, in my experience, building a quantized local model requires effort that the core team perhaps can't spare at the moment.

[Importing from HuggingFace](https://huggingface.co/docs/hub/ollama) or other model repos is an option. There are many model request tickets that are resolved by a comment showing how to import from HF. There are some drawbacks to this, though.

  • The template provided with the HF pull doesn't always make full use of model capabilities within ollama. Many model request tickets are resolved by showing how to pull the base model and then add the ollama template required for exposing thinking, tool use, etc. (see the sketch after this list).
  • An issue with pulling large models from HF is that they are sharded, i.e. broken into smaller pieces. Again, tickets show how to assemble these shards into a full model (the sketch below includes the merge step), but this starts to move into "too hard" or "too expensive" territory for some hobby AI practitioners.
  • Some models are not yet in GGUF format and will not load into ollama. MiniMax-M2.5 was such a model when it was pushed to the user library.
  • Some models don't yet have the support in ollama to be run locally.
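
To make the first two bullets concrete, here is the usual workaround as a rough sketch; the repo name and template line are illustrative, not a specific recommendation:

```sh
# Pull a GGUF build straight from Hugging Face (hf.co/<user>/<repo>:<quant>).
ollama pull hf.co/SomeUser/SomeModel-GGUF:Q4_K_M

# Layer an Ollama template on top of the pulled base model.
cat > Modelfile <<'EOF'
FROM hf.co/SomeUser/SomeModel-GGUF:Q4_K_M
# TEMPLATE/PARAMETER lines for thinking, tool use, etc. would go here.
EOF
ollama create somemodel-local -f Modelfile

# For sharded GGUFs, llama.cpp's gguf-split tool can merge the pieces first:
#   llama-gguf-split --merge model-00001-of-00005.gguf model.gguf
```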

Users are [encouraged](https://github.com/ollama/ollama/blob/main/docs/import.mdx#sharing-your-model-on-ollamacom) to share models. If a user feels there is a particular model not in the main library that they would like to run but doesn't have the ability or resources to build it themselves, posting a request here for community support is always an option.
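
The sharing flow itself is short, assuming an ollama.com account with your public key added (namespace illustrative):

```sh
# Copy the local model into your namespace, then push it to ollama.com.
ollama cp somemodel-local yourname/somemodel
ollama push yourname/somemodel
```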


@asitwere commented on GitHub (Feb 14, 2026):

Thanks @rick-github -- yes, the HF pulls are appreciated in a pinch, but the supported models have always been the most reliable options overall, particularly in cases like GPT OSS 120B & Qwen 235B, which are both excellent via Ollama.
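
(For anyone searching, those two are in the main library; assuming the current tag names:)

```sh
ollama pull gpt-oss:120b
ollama pull qwen3:235b
```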

Meanwhile, I don't get the pitch for Ollama Cloud over options like OpenRouter, where I can easily configure ZDR-only models. Ollama is suspiciously unclear about this: the docs say "Ollama does not log your data," but is Ollama the only provider involved here? Happy to hear that Ollama isn't logging my prompts to Gemini Flash, but is Google?

Hoping to see continued optimization and support of major releases, especially large & competitive models. More and more of us are working on relatively high-compute consumer hardware. A model like MiniMax is right in the perfect parameter range for the M3 Ultra, which can run 230B at full context in 8-bit.
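
Back-of-envelope on that claim, using my own arithmetic rather than anything from a spec sheet: at 8 bits per parameter the weights alone come to roughly 214 GiB, leaving headroom for KV cache and overhead on a 512 GB M3 Ultra.

```sh
# 230e9 params * 1 byte per param (8-bit quant), expressed in GiB:
echo "$(( 230 * 10**9 / 2**30 )) GiB"   # prints "214 GiB", weights only
```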

Appreciate your support.


@rjmalagon commented on GitHub (Feb 16, 2026):

> > There is a feeling, perhaps ill-founded, that the core Ollama team is shifting away from easy local-inference tooling and toward the business of selling cloud inference.
>
> These concerns have been raised before, for example with qwen3-vl. The local version of the model was eventually made available, but it did lag the release of the cloud version. I'm not a member of the core Ollama team, so it's not my place to speculate on model selection. I will just say that, in my experience, building a quantized local model requires effort that the core team perhaps can't spare at the moment.
>
> [Importing from HuggingFace](https://huggingface.co/docs/hub/ollama) or other model repos is an option. There are many model request tickets that are resolved by a comment showing how to import from HF. There are some drawbacks to this, though.
>
> * The template provided with the HF pull doesn't always make full use of model capabilities within ollama. Many model request tickets are resolved by showing how to pull the base model and then add the ollama template required for exposing thinking, tool use, etc.
>
> * An issue with pulling large models from HF is that they are sharded, i.e. broken into smaller pieces. Again, tickets show how to assemble these shards into a full model, but this starts to move into "too hard" or "too expensive" territory for some hobby AI practitioners.
>
> * Some models are not yet in GGUF format and will not load into ollama. MiniMax-M2.5 was such a model when it was pushed to the user library.
>
> * Some models don't yet have the support in ollama to be run locally.
>
> Users are [encouraged](https://github.com/ollama/ollama/blob/main/docs/import.mdx#sharing-your-model-on-ollamacom) to share models. If a user feels there is a particular model not in the main library that they would like to run but doesn't have the ability or resources to build it themselves, posting a request here for community support is always an option.

To the HuggingFace comment: I am referring to importing (and converting) directly from the tensor repositories, sidestepping GGUF files. Given an HF/repo access token, you would put the tensor repo URL directly in the Modelfile and let Ollama download, convert/quantize, and import it.
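
To be explicit about what I am proposing, the remote-repo FROM syntax below is hypothetical, not something Ollama supports today; only the --quantize flag on ollama create is real:

```sh
# PROPOSED SYNTAX, NOT CURRENT BEHAVIOR: a remote tensor repo URL in FROM,
# with Ollama handling download, conversion, and quantization on create.
cat > Modelfile <<'EOF'
FROM https://huggingface.co/MiniMaxAI/MiniMax-M2.5
EOF
HF_TOKEN=<token> ollama create minimax-m2.5:q8_0 --quantize q8_0 -f Modelfile
```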

I acknowledge that converting and importing is time-consuming; I have been there: https://ollama.com/rjmalagon (llama.cpp GGUF conversion tools, Ollama import and convert tools).
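
(The llama.cpp route mentioned, roughly; paths are illustrative, and the convert script only handles architectures it already knows about:)

```sh
# Convert a local Safetensors checkout to GGUF with llama.cpp's converter.
git clone https://github.com/ggml-org/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./MiniMax-M2.5 \
    --outtype f16 --outfile minimax-m2.5-f16.gguf
```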
