[GH-ISSUE #5777] Mistral Nemo Please! #29359

Closed
opened 2026-04-22 08:08:49 -05:00 by GiteaMirror · 28 comments

Originally created by @stevengans on GitHub (Jul 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5777

https://mistral.ai/news/mistral-nemo/

GiteaMirror added the model label 2026-04-22 08:08:49 -05:00

@abenmrad commented on GitHub (Jul 18, 2024):

I assume only Ollama team members can publish models from the big players? Or is there a way for community members to contribute to that effort?


@stevengans commented on GitHub (Jul 18, 2024):

There's licensing involved, I believe?


@abenmrad commented on GitHub (Jul 18, 2024):

According to the HF repo, it's Apache 2.0. But they require you to disclose your username and email (to Mistral?) before downloading from HF.


@SeanKnight commented on GitHub (Jul 18, 2024):

https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct


@abenmrad commented on GitHub (Jul 18, 2024):

There are indeed both an Nvidia repo and a Mistral repo (and only the latter requires the "disclosure").

Only the Mistral repo has safetensors files, which I'm now trying to quantize with ollama.
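For anyone attempting the same, here is a minimal sketch of that flow, assuming a local safetensors checkout and an ollama build whose `ollama create` supports the `--quantize` flag (paths and model name are placeholders):

```
# Modelfile: point FROM at the downloaded safetensors directory (placeholder path)
FROM /path/to/Mistral-Nemo-Instruct-2407

# Then, from a shell, import and quantize in one step:
#   ollama create mistral-nemo-local --quantize q4_K_M -f Modelfile
```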


@rick-github commented on GitHub (Jul 18, 2024):

In case you are running up against `BPE pre-tokenizer was not recognized` errors, there are quantized versions at https://huggingface.co/second-state/Mistral-Nemo-Instruct-2407-GGUF/tree/main
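For anyone who wants to fetch one of those files directly, a hypothetical download step (the exact filename depends on which quant you pick from the repo):

```
# Download a single GGUF from the repo (filename is illustrative)
huggingface-cli download second-state/Mistral-Nemo-Instruct-2407-GGUF \
  Mistral-Nemo-Instruct-2407-Q4_K_M.gguf --local-dir .
```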


@kralor-guldan commented on GitHub (Jul 18, 2024):

When trying to convert the GGUF linked above, I get the following error:

`Error: llama runner process has terminated: signal: aborted (core dumped) error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1`


@rick-github commented on GitHub (Jul 18, 2024):

Yeah, I get the same. I guess that we need Nemo support in the llama.cpp GGUF converter.


@UmutAlihan commented on GitHub (Jul 18, 2024):

Let's be patient. Given its popularity, this new model will definitely be added to the Ollama library, but it has only been a couple of hours, so let's give the developers a reasonable amount of time :)


@rick-github commented on GitHub (Jul 18, 2024):

https://github.com/ggerganov/llama.cpp/issues/8577


@enryteam commented on GitHub (Jul 20, 2024):

I look forward to trying it soon. Come on, ollama!


@phalexo commented on GitHub (Jul 20, 2024):

> When trying to convert the GGUF linked above, I get the following error:
>
> `Error: llama runner process has terminated: signal: aborted (core dumped) error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1`

I did NOT get this error while converting; conversion went through without any error. The error happened when I tried to run the model, while it was loading.


@muhammadyusuf-kurbonov commented on GitHub (Jul 20, 2024):

https://github.com/ggerganov/llama.cpp/pull/8579 is merged!


@rick-github commented on GitHub (Jul 20, 2024):

@phalexo evannorstrand-mp means that when he tries to load the converted GGUF model, llama.cpp dumps core. We are all experiencing that issue and are awaiting support in llama.cpp before we can load the model in ollama.


@droza123 commented on GitHub (Jul 21, 2024):

I think release b3426 of llama.cpp from yesterday adds support: https://github.com/ggerganov/llama.cpp/releases/tag/b3426


@rick-github commented on GitHub (Jul 21, 2024):

b3426 adds the Tekken tokenizer, which Jeffrey merged into ollama in https://github.com/ollama/ollama/pull/5807. However, that doesn't fix the dimensioning issue that causes the core dump when loading the Nemo model. That's being addressed in https://github.com/ggerganov/llama.cpp/pull/8604 which looks close to being merged.


@stevengans commented on GitHub (Jul 22, 2024):

An interesting quote on Hugging Face:

> Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend to use a temperature of 0.3
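For ollama users, that recommendation maps to a `PARAMETER` line in a Modelfile (the base tag below assumes the name the model eventually shipped under in the library):

```
FROM mistral-nemo
PARAMETER temperature 0.3
```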


@LordFonDragon commented on GitHub (Jul 22, 2024):

https://github.com/ggerganov/llama.cpp/pull/8604 IT'S MERGED!!!


@Qualzz commented on GitHub (Jul 22, 2024):

MERGED YOUHOU


@UmutAlihan commented on GitHub (Jul 22, 2024):

![image](https://github.com/user-attachments/assets/fdbc8b95-ed70-4eae-85e1-dcbc2b8b5f05)

@danilofalcao commented on GitHub (Jul 22, 2024):

```
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'tekken'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "/root/Mistral-Nemo-Instruct-2407/chat.py", line 21, in <module>
    generated_text = run_inference(model_path, prompt)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Mistral-Nemo-Instruct-2407/chat.py", line 5, in run_inference
    llm = Llama(model_path=model_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/env/lib/python3.11/site-packages/llama_cpp/llama.py", line 358, in __init__
    self._model = self._stack.enter_context(contextlib.closing(_LlamaModel(
                                                               ^^^^^^^^^^^^
  File "/root/env/lib/python3.11/site-packages/llama_cpp/_internals.py", line 54, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: ggml-model-Q4_K_M.gguf
```

Conversion succeeds, but llama-cpp-python still has issues with the 'tekken' pre-tokenizer.
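The usual remedy in that situation, assuming a llama-cpp-python release has vendored a new enough llama.cpp with the Tekken support, is to force a clean rebuild of the package:

```
# Rebuild llama-cpp-python so it links against a newer vendored llama.cpp
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```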

@phalexo commented on GitHub (Jul 22, 2024):

I rebuilt Ollama from source this morning. The wrong-dimension error on loading is still there. Maybe it's because Ollama is using an older version of llama.cpp.


@rick-github commented on GitHub (Jul 22, 2024):

Yes, it still needs to be integrated into ollama. Either via a patch like https://github.com/ollama/ollama/pull/5807 or a rebase with llama.cpp. In either case it will take a bit of time.


@ProjectMoon commented on GitHub (Jul 22, 2024):

Looks like it's working as of 0.2.8-rc2! I imported https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF via the Modelfile below. Pretty sure I need a better template, though.

```
FROM ./Mistral-Nemo-Instruct-2407-Q4_K_M.gguf
TEMPLATE """[INST] {{ if .System }}{{ .System }}{{ end }}{{ .Prompt }}[/INST] </s>"""
```
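For reference, a Modelfile like that one is registered and run with commands along these lines (the model tag is arbitrary):

```
ollama create mistral-nemo-import -f Modelfile
ollama run mistral-nemo-import
```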

@stevengans commented on GitHub (Jul 22, 2024):

Released!


@asanchez75 commented on GitHub (Jul 23, 2024):

Thanks for the release! But I get this error, `Error: llama runner process has terminated: signal: aborted (core dumped)`, when I run `ollama run mistral-nemo:12b` (after pulling the model).

For your information, I have an NVIDIA GeForce RTX 4090 with 16G of GPU RAM. Am I missing something?


@asanchez75 commented on GitHub (Jul 23, 2024):

I updated Ollama and it works! Thanks!


@phalexo commented on GitHub (Jul 23, 2024):

I have tried Bartowski's Q8_0 quant and it works fine. However, he does not have the 16-bit version, which was the one giving me the dimension error before.
