[GH-ISSUE #1978] Error "unknown architecture MistralModel" during quantization #63179

Closed
opened 2026-05-03 12:24:24 -05:00 by GiteaMirror · 2 comments

Originally created by @philippgille on GitHub (Jan 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1978

Hello 👋 , First of all thank you very much for creating and maintaining ollama! It's so simple to use 👍

Now I wanted to use ollama for creating embeddings, and saw https://huggingface.co/intfloat/e5-mistral-7b-instruct performing very well on the [embeddings benchmark](https://huggingface.co/spaces/mteb/leaderboard). The official ollama model library doesn't contain it yet, so I wanted to create and upload it myself.

But during the quantization step (`docker run --rm -v .:/model:Z ollama/quantize -q q4_0 /model`) I get this error:

> unknown architecture MistralModel

As Mistral is supported by ollama, I'm wondering about this error. The E5 model is based on Mistral instruct v0.1, so I assume it's the same architecture. Right? Or is the `ollama/quantize` image perhaps just not updated with support for it yet?
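For context beyond the original report: the name in the error most likely comes from the `architectures` field of the checkpoint's `config.json`. e5-mistral-7b-instruct appears to be exported as a bare `MistralModel` (no language-model head) rather than the `MistralForCausalLM` architecture the converter recognizes. A minimal sketch to check this, assuming the model has already been downloaded into a local directory:

```python
import json

# Hypothetical path to a local checkout of the Hugging Face repo.
model_dir = "intfloat/e5-mistral-7b-instruct"

with open(f"{model_dir}/config.json") as f:
    config = json.load(f)

# Expected to print ["MistralModel"] for this export, whereas the base
# Mistral instruct checkpoints declare ["MistralForCausalLM"].
print(config.get("architectures"))
```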


@mxyng commented on GitHub (Jan 16, 2024):

This is expected as the quantize docker image primarily targets inference models. It's untested for non-inference models like embedding models.

Updating the container to support MistralModel doesn't seem to work; I get this error:

```
$ docker run --rm -it -v $PWD:/mnt ollama/quantize -q q4_0 /mnt/intfloat/e5-mistral-7b-instruct
/workdir/llama.cpp/gguf-py
Loading model file /mnt/intfloat/e5-mistral-7b-instruct/model-00001-of-00002.safetensors
Loading model file /mnt/intfloat/e5-mistral-7b-instruct/model-00001-of-00002.safetensors
Loading model file /mnt/intfloat/e5-mistral-7b-instruct/model-00002-of-00002.safetensors
Traceback (most recent call last):
  File "/workdir/llama.cpp/convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/workdir/llama.cpp/convert.py", line 1577, in main
    model_plus = load_some_model(args.model)
  File "/workdir/llama.cpp/convert.py", line 1354, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/workdir/llama.cpp/convert.py", line 782, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/workdir/llama.cpp/convert.py", line 761, in merge_sharded
    return {name: convert(name) for name in names}
  File "/workdir/llama.cpp/convert.py", line 761, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/workdir/llama.cpp/convert.py", line 736, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/workdir/llama.cpp/convert.py", line 736, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'embed_tokens.weight'
```

Unfortunately, it looks like llama.cpp's conversion scripts need to be updated before this model can be converted.
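A possible root cause, hedged since it reconstructs `convert.py`'s behavior from the traceback: the script appears to detect Hugging Face multifile checkpoints by looking for a tensor named `model.embed_tokens.weight`, and a bare `MistralModel` export names its tensors without the `model.` prefix, so it falls back to `merge_sharded()`, which assumes every shard contains every tensor and raises the `KeyError`. A small sketch to inspect the tensor names per shard, assuming the `safetensors` package and the two local shard files:

```python
from safetensors import safe_open

# Hypothetical shard paths from a local e5-mistral-7b-instruct checkout.
shards = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

for path in shards:
    with safe_open(path, framework="pt") as f:
        # Print only the embedding-related names; a bare MistralModel export
        # is expected to use "embed_tokens.weight" with no "model." prefix.
        print(path, [n for n in f.keys() if "embed_tokens" in n])
```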


@jmorganca commented on GitHub (May 7, 2024):

This should be fixed now by following https://github.com/ollama/ollama/blob/main/docs/import.md

Also, excitingly, quantization and conversion are being added to Ollama directly:

```
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
```

Create a Modelfile:

```
FROM ./Mistral-7B-Instruct-v0.2

# basic prompt template
TEMPLATE """[INST] {{ .Prompt }} [/INST]"""

PARAMETER stop [INST]
PARAMETER stop [/INST]
```

Then run `ollama create`:

```
ollama create -q q4_0 -f Modelfile my-mistral
ollama run my-mistral
```
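Since the original goal was embeddings: once a model is created this way, it can be queried through ollama's embeddings endpoint. A minimal sketch, assuming a local ollama server on the default port and the `my-mistral` name from above (an embedding-tuned model such as e5-mistral would be the better fit in practice):

```python
import requests

# POST /api/embeddings is ollama's embeddings endpoint; "my-mistral" is the
# model created above and is only a placeholder for this example.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "my-mistral", "prompt": "Hello, world"},
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(len(embedding))  # dimensionality of the embedding vector
```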