[GH-ISSUE #6513] magnum-v2.5-12b-kto and magnum-v2-12b not running on ollama #4101

Closed
opened 2026-04-12 15:00:07 -05:00 by GiteaMirror · 8 comments

Originally created by @Tuxaios on GitHub (Aug 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6513

What is the issue?

```
[root@ubuntu]$ ollama run magnum-v2:12b-fp16
Error: exception error loading model hyperparameters: invalid n_rot: 160, expected 128
[root@ubuntu]$ ollama run magnum-v2.5:12b-kto-fp16
Error: exception error loading model hyperparameters: invalid n_rot: 160, expected 128
```

Can someone explain why this is happening?
This is the first time models have failed to run on my Ollama v0.3.5.
The GGUF quantized versions of these two models run properly; it is only the FP16 versions that Ollama cannot load.
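One plausible way these fp16 tags came about (an assumption; the report doesn't say how they were created) is a direct safetensors import, which is the path discussed later in this thread. A minimal sketch, with placeholder paths and tag names:

```
# Hypothetical reproduction (paths and tag names are assumptions, not from the report)
huggingface-cli download anthracite-org/magnum-v2-12b --local-dir ./magnum-v2-12b
echo "FROM ./magnum-v2-12b" > Modelfile       # import the safetensors directory
ollama create magnum-v2:12b-fp16 -f Modelfile
ollama run magnum-v2:12b-fp16                 # fails: invalid n_rot: 160, expected 128
```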

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.3.5

GiteaMirror added the bug label 2026-04-12 15:00:07 -05:00

@Tuxaios commented on GitHub (Aug 26, 2024):

Here are the addresses for these two models:
https://huggingface.co/anthracite-org/magnum-v2.5-12b-kto
https://huggingface.co/anthracite-org/magnum-v2-12b


@Tuxaios commented on GitHub (Aug 26, 2024):

https://huggingface.co/anthracite-org/magnum-v2-12b/discussions/6
The author said "they merged / fixed nemo inferencing." I don't understand what that means.
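For context (an editorial aside, not part of the original thread): magnum-v2-12b is, as far as I know, a Mistral-Nemo-12B finetune, and the 160-vs-128 mismatch is consistent with hidden_size / num_attention_heads differing from the config's explicit head_dim. A quick way to check, assuming `jq` is installed:

```
# Inspect the model config on Hugging Face (URL from the thread; jq is an assumption)
curl -s https://huggingface.co/anthracite-org/magnum-v2-12b/raw/main/config.json \
  | jq '{hidden_size, num_attention_heads, head_dim}'
# If the model follows the Mistral-Nemo-12B layout: 5120 / 32 = 160, while head_dim is 128,
# which matches the "invalid n_rot: 160, expected 128" error.
```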


@igorschlum commented on GitHub (Aug 26, 2024):

Hi @Tuxaios, have you imported the GGUF files as described here:
https://github.com/ollama/ollama
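For reference, the GGUF import that the README describes boils down to a few commands (the file name below is a placeholder; use whichever GGUF you downloaded):

```
# Minimal GGUF import (file name is a placeholder)
echo "FROM ./magnum-v2-12b-Q4_K_M.gguf" > Modelfile
ollama create magnum-v2-12b-q4 -f Modelfile
ollama run magnum-v2-12b-q4
```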


@Tuxaios commented on GitHub (Aug 26, 2024):

> Error: exception error loading model hyperparameters: invalid n_rot: 160, expected 128
> [root@ubuntu]$ ollama run magnum-v2.5:12b-kto

Hello, sir. I'm glad to see your reply. I've already tried the GGUF quantized models provided by the author, and they are great. Now I want to load the FP16 model, but I get the error above: `Error: exception error loading model hyperparameters: invalid n_rot: 160, expected 128`. Does anyone know how to resolve this? Otherwise I will have to try TabbyAPI.


@igorschlum commented on GitHub (Aug 26, 2024):

Hello. The FP16 model requires more RAM (or VRAM). My suggestion is to ask the author of this LLM to upload it to the Ollama library. With over 1 million downloads of Llama 3.1 on Ollama.com alone, the author will see the potential audience for their LLM and might be encouraged to make any necessary adjustments to the model.
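As a rough check on that claim (an editorial estimate, not from the thread): a ~12B-parameter model at FP16 needs about 12 × 2 bytes ≈ 24 GB for the weights alone, before KV cache and runtime overhead, which lines up with the ~25 GB that `ollama ps` reports in the next comment.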


@rick-github commented on GitHub (Aug 26, 2024):

The safetensor import method in ollama works for a limited set of models. You can import models that ollama can't by using [llama.cpp](https://github.com/ggerganov/llama.cpp) to convert the safetensors to GGUF format.

I have Docker installed, so the way I convert models is:

```
docker run --rm -it -v .:/app/models ghcr.io/ggerganov/llama.cpp:full-cuda -c --outtype f16 /app/models
```

This creates a file (Models-12B-F16.gguf) in the current directory, which I then pass to ollama:

```
echo FROM Models-12B-F16.gguf > Modelfile
ollama create magnum-v2-12b
```

ollama detects the chat template and fills in the parameters:

```
$ ollama show --modelfile magnum-v2-12b
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM magnum-v2-12b:latest

FROM /root/.ollama/models/blobs/sha256-587d03f008224912b27034e98665dfbb8347f9b9eaa01d2e9968bb0299d5a72e
TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
```

Be aware that FP16 is a large model and takes a fair amount of resources to run:

```
$ ollama run magnum-v2-12b
>>> hello
Hello! How can I help you today?

>>> /bye
$ ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL
magnum-v2-12b:latest    0daea775ee7d    25 GB   36%/64% CPU/GPU Forever
```
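If Docker isn't an option, the same conversion can be done with a local llama.cpp checkout. A sketch, assuming the safetensors were downloaded to ./magnum-v2-12b (paths and the output file name are placeholders):

```
# Non-Docker alternative (sketch): convert the safetensors with llama.cpp's
# conversion script, then import the resulting GGUF into ollama.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./magnum-v2-12b \
    --outtype f16 --outfile magnum-v2-12b-F16.gguf
echo FROM magnum-v2-12b-F16.gguf > Modelfile
ollama create magnum-v2-12b -f Modelfile
```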

@Tuxaios commented on GitHub (Aug 27, 2024):

> The safetensor import method in ollama works for a limited set of models. You can import models that ollama can't by using [llama.cpp](https://github.com/ggerganov/llama.cpp) to convert the safetensors to GGUF format. […]

Thank you very much for your reply. Does using GGUF affect the model's accuracy? My hardware supports using FP16.


@rick-github commented on GitHub (Aug 27, 2024):

The GGUF file is FP16.
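A quick way to confirm that on the imported model (a sketch; the tag name follows the example above): `ollama show` reports the quantization of the underlying GGUF.

```
ollama show magnum-v2-12b
# The model details should list the quantization as F16, not a Q4/Q5 variant,
# since the conversion above used --outtype f16.
```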
