[GH-ISSUE #13465] unknown tokenizer format: Can't Import Mistral-small-3.2 #8885

Open
opened 2026-04-12 21:41:19 -05:00 by GiteaMirror · 13 comments
Owner

Originally created by @chigkim on GitHub (Dec 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13465

What is the issue?

I tried to import a fine-tuned version of Mistral-Small-3.2 from Safetensors weights. I got the error below, so I thought something was wrong with my model. However, I got the same error when I tried with the official weights.

hf download mistralai/Mistral-Small-3.2-24B-Instruct-2506 --local-dir ./Mistral-Small-3.2
ollama show mistral-small-3.2 --modelfile > mistral-small-3.2.modelfile
Edit mistral-small-3.2.modelfile and specify FROM ./Mistral-Small-3.2
ollama create mistral-small-3.2:test-q8_0 -f mistral-small-3.2.modelfile -q q8_0

Relevant log output

gathering model components
copying file sha256:6e2501687ccd0e1f30f36319eaf2b46958b897811e246cd8eb5d385b9e3de7d1 100%
copying file sha256:d8da73eed27fb92cfb83927676e3d4e2530f1984f3c53c6ccfc04dab3aa3ab9c 100%
copying file sha256:39257505f84defb725b7b8c6693a9d33a8224074e656fc9588577ad90fae88a4 100%
copying file sha256:01ab910a5dda7995709cc355d094eabb8094b78d49240cd167188606c3ff5edb 100%
copying file sha256:2dd59fc07c02339d2a02831d480111570f863eabe62fe07aa737565fa1b4da7b 100%
copying file sha256:5521de5155d8a28c8cad5560c4da151cc81334b64d54af90af7e9935fce88edf 100%
copying file sha256:81fa1f714e09b07593b375d0509916e36b12ce8c4e389d2d8e99d2d478499e9e 100%
copying file sha256:ac52ac4e326beae33b9dd33f5fca0301fc563e34becfe98ee210ee2b853d93bb 100%
copying file sha256:8bab1a044e04c4f1f80dd7513133ac644f8557a0f1623608ea0c5fec7115401e 100%
copying file sha256:a61ba87beaae27a3ae6684860b49315ed4eeeee42d2ae1799c14421978e7cd17 100%
copying file sha256:660314862ee62dc4a8b63cc52f94a86c584198acbfde9c479b5141df81ef2bbe 100%
copying file sha256:2bf68df1bfbba7195b3e1c09ae62eb7e798bb487819d74e86256c9a50b41c7dc 100%
copying file sha256:925d9c90e80dea268272b3af30cf6127b6ced8e165f8ba20ba09a2009adcddfb 100%
copying file sha256:dbbe9946162f5d2b73e21cc85c6e986409248f8121e108e23e58ec1c4629fc7d 100%
copying file sha256:c350478edef7b63bdcb242fbaeb246679b37e7b0df415c240c0337470197f1ed 100%
converting model
Error: unknown tokenizer format

OS

macOS 26.2

GPU

Apple

CPU

Apple

Ollama version

0.13.3

GiteaMirror added the bug label 2026-04-12 21:41:19 -05:00

@maternion commented on GitHub (Dec 14, 2025):

@chigkim I don't think that's a GGUF. https://ollama.com/library/mistral-small3.2 exists, and if you want a fine-tuned version then use this: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF


@chigkim commented on GitHub (Dec 14, 2025):

I need to import a custom fine-tuned model from Safetensors, not GGUF.


@rick-github commented on GitHub (Dec 17, 2025):

Conversion from safetensors is currently limited to a subset of architectures (see https://github.com/ollama/ollama/blob/main/docs/import.mdx#importing-a-fine-tuned-adapter-from-safetensors-weights). Use llama.cpp to convert to GGUF.


@chigkim commented on GitHub (Dec 17, 2025):

If you convert a multimodal model with llama.cpp, doesn't it separate the vision layers and text layers into two GGUF files? How do you import both into Ollama then?


@rick-github commented on GitHub (Dec 17, 2025):

Two passes are required to create the GGUF files, one for the text weights and then one for the vision weights.

convert_hf_to_gguf.py --outfile text.gguf --outtype f16 --mistral-format /path/to/safetensors
convert_hf_to_gguf.py --outfile vision.gguf --outtype f16 --mistral-format /path/to/safetensors --mmproj

Then use them in a Modelfile:

echo FROM text.gguf > Modelfile
echo FROM vision.gguf >> Modelfile
ollama show --modelfile mistral-small3.2 | grep -v "^FROM " >> Modelfile
ollama create mistral-small3.2:mymodel
$ ollama run mistral-small3.2:mymodel describe this image ./image1.jpg
Added image './image1.jpg'
The image features a small white puppy sitting on what appears to be a marble or 
stone surface. The puppy has fluffy fur and is wearing a red collar with a bell 
attached to it. The background is blurred, drawing focus to the adorable puppy in 
the foreground. The overall setting seems to be outdoors, possibly near a building 
or structure given the visible architectural elements in the background.
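The Modelfile assembly in the echo/grep pipeline above can also be sketched in Python (a sketch only; `build_modelfile` is a hypothetical helper, not part of Ollama):

```python
def build_modelfile(base_modelfile: str, gguf_paths: list[str]) -> str:
    """Mimic the echo/grep pipeline: drop the existing FROM lines from the
    base Modelfile and prepend FROM lines for the newly converted GGUFs."""
    kept = [line for line in base_modelfile.splitlines()
            if not line.startswith("FROM ")]
    froms = [f"FROM {path}" for path in gguf_paths]
    return "\n".join(froms + kept) + "\n"

# Minimal stand-in for `ollama show --modelfile` output.
base = "FROM /old/blob\nPARAMETER temperature 0.15\n"
print(build_modelfile(base, ["text.gguf", "vision.gguf"]))
```

This reproduces what `grep -v "^FROM "` does in the shell version: everything from the library Modelfile except its FROM lines, with the two new GGUF paths in front.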

@chigkim commented on GitHub (Dec 18, 2025):

Thanks so much for the info!
convert_hf_to_gguf.py reports a successful export for both the vision and text passes.
However, text.gguf is only 8 MB, and vision.gguf is only 2 KB for some reason.
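A quick way to catch a broken conversion like this before importing: valid GGUF files start with the ASCII magic bytes `GGUF`, and a converted 24B text model should be measured in gigabytes, not megabytes. A minimal check (a sketch; `looks_like_valid_gguf` and the size threshold are assumptions, not part of any tool here):

```python
from pathlib import Path

def looks_like_valid_gguf(path: str, min_bytes: int = 1_000_000) -> tuple[bool, str]:
    """Cheap sanity check on a converted file: GGUF magic plus a plausible
    size. An 8 MB "text model" for a 24B-parameter network almost certainly
    means the weights were never written."""
    p = Path(path)
    if not p.exists():
        return False, "file missing"
    with p.open("rb") as f:
        magic = f.read(4)
    if magic != b"GGUF":
        return False, f"bad magic {magic!r}"
    if p.stat().st_size < min_bytes:
        return False, f"suspiciously small ({p.stat().st_size} bytes)"
    return True, "ok"
```

Running this on the 2 KB vision.gguf would flag it immediately instead of failing later inside `ollama create` or at inference time.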


@rick-github commented on GitHub (Dec 18, 2025):

When you exported the safetensors, was the adapter fused to the base model? If you exported the adapter only, see https://github.com/ollama/ollama/blob/main/docs/import.mdx#importing-a-fine-tuned-adapter-from-safetensors-weights. I've never tried this with a multimodal model, so I don't know how well it will work.


@chigkim commented on GitHub (Dec 19, 2025):

I believe the adapter was fused to the base model. There's no separate adapter.


@rick-github commented on GitHub (Jan 2, 2026):

What framework was used for the fine tuning?


@chigkim commented on GitHub (Jan 3, 2026):

Not sure; it's one of the abliterated models:
https://huggingface.co/huihui-ai/Huihui-Mistral-Small-3.2-24B-Instruct-2506-abliterated-v2


@rick-github commented on GitHub (Jan 3, 2026):

What did you use to finetune the model?


@chigkim commented on GitHub (Jan 7, 2026):

No idea. I did not fine-tune it.


@rick-github commented on GitHub (Mar 4, 2026):

Finally had a look at the model. The import fails for two reasons: tokenizer.json is missing, and generation_config.json is malformed JSON. Copying tekken.json to tokenizer.json and fixing the JSON allows the model to be imported by Ollama, but the model then crashes with a panic: failed to sample token error. A comment in the community area says the model is "dumb", and the model card says "This is a crude, proof-of-concept implementation to remove refusals", so it's probably not worth pursuing this further.
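The two failure causes described above can be checked up front, before running `ollama create` (a sketch; `check_importability` is a hypothetical helper, not part of Ollama):

```python
import json
from pathlib import Path

def check_importability(model_dir: str) -> list[str]:
    """Flag the two problems found with this model: a missing
    tokenizer.json and a generation_config.json that doesn't parse."""
    d = Path(model_dir)
    problems = []
    if not (d / "tokenizer.json").exists():
        # Mistral releases ship the tekken tokenizer as tekken.json;
        # copying it to tokenizer.json was the workaround used here.
        hint = " (tekken.json present; try copying it)" if (d / "tekken.json").exists() else ""
        problems.append("tokenizer.json missing" + hint)
    gen = d / "generation_config.json"
    if gen.exists():
        try:
            json.loads(gen.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"generation_config.json is malformed JSON: {exc}")
    return problems
```

An empty result doesn't guarantee a clean import (the runtime panic here proves that), but it catches exactly the two errors that made `ollama create` fail with "unknown tokenizer format".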

Reference: github-starred/ollama#8885