[GH-ISSUE #8598] Error Running Mistral Nemo Imported from .safetensors #67617

Closed
opened 2026-05-04 11:02:43 -05:00 by GiteaMirror · 3 comments

Originally created by @aallgeier on GitHub (Jan 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8598

What is the issue?

I encountered an error when attempting to run the Mistral Nemo model imported from .safetensors. I intend to run the model on CPU only, even though I have a GPU (see the Modelfile below).

  • I am able to run the model converted to .gguf.
  • However, I would like to import and run directly from .safetensors if possible.

Steps to Reproduce

  1. Download model files from mistralai/Mistral-Nemo-Instruct-2407 (https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/tree/main).
  2. Create a Modelfile with the following content:
    FROM <PATH TO .SAFETENSOR FILES>
    PARAMETER num_gpu 0
    
  3. Start the ollama server: ollama serve
  4. Create the model: ollama create nemo -f Modelfile
  5. Run the model: ollama run nemo
    • Error message: Error: llama runner process has terminated: error loading model: error loading model hyperparameters: invalid n_rot: 160, expected 128 llama_load_model_from_file: failed to load model

OS, GPU, CPU

  • OS: Linux fedora 6.12.6
  • GPU: Radeon RX 7600 XT
  • CPU: AMD Ryzen 7 7700X
  • RAM: 64GB

Thank you in advance for the help!

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.5.4

GiteaMirror added the bug label 2026-05-04 11:02:43 -05:00

@rick-github commented on GitHub (Feb 10, 2025):

You cannot run a model directly from safetensors in ollama. The import process converts the model to a GGUF quantized to FP16. The converter is based on code from llama.cpp but lags the main repo, so it only supports a limited number of architectures, and not all variants of those architectures. llama.cpp can be used to convert models that ollama's converter doesn't support.
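
Something like the following should work as a workaround (untested here; paths and file names are hypothetical, and convert_hf_to_gguf.py is llama.cpp's converter script):

    # 1. Convert the safetensors weights to an FP16 GGUF with llama.cpp:
    git clone https://github.com/ggerganov/llama.cpp
    pip install -r llama.cpp/requirements.txt
    python llama.cpp/convert_hf_to_gguf.py /path/to/Mistral-Nemo-Instruct-2407 \
        --outtype f16 --outfile nemo-f16.gguf

    # 2. Point the Modelfile at the resulting GGUF instead of the safetensors
    #    directory, i.e. FROM ./nemo-f16.gguf (PARAMETER num_gpu 0 still applies),
    #    then create and run as before:
    ollama create nemo -f Modelfile
    ollama run nemo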


@rick-github commented on GitHub (Feb 11, 2025):

> > You cannot run a model directly from safetensors in ollama.
>
> I can; you already see multiple users doing just that. Just because ollama fails to print a meaningful message, it doesn't mean people won't try to experiment with this under-documented and perhaps unimplemented functionality.

Importing a safetensors model is clearly documented at https://github.com/ollama/ollama/blob/main/docs/import.md#importing-a-model-from-safetensors-weights. The process of importing a safetensors model converts it into GGUF format with FP16 quantization.

If this is wrong, I would love to see your step-by-step explanation of how to run a safetensors model directly in ollama.


@rick-github commented on GitHub (Feb 13, 2025):

Mistral Nemo is supported; importing the safetensors version of the model via ollama is not.

A Modelfile allows setting extra parameters, like num_gpu in the original post. The create command also has a short form: it assumes FROM . if no Modelfile is found (see the sketch below).
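
For example (hypothetical path), the short form looks like this:

    # With no -f flag and no Modelfile in the current directory,
    # create behaves as if the Modelfile were just "FROM ."
    cd /path/to/Mistral-Nemo-Instruct-2407
    ollama create nemo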
