[GH-ISSUE #1682] Importing (PyTorch & Safetensors) #26707

Closed
opened 2026-04-22 03:09:31 -05:00 by GiteaMirror · 6 comments

Originally created by @ForkedInTime on GitHub (Dec 23, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1682

Step 1 from the "Importing (PyTorch & Safetensors)" section works fine.
Step 2 fails with the docker command:

"yetipaw@dolphin  ~  cd Apps/dolphin-2.5-mixtral-8x7b
yetipaw@dolphin  ~/Apps/dolphin-2.5-mixtral-8x7b   main  docker run --rm -v .:/model ollama/quantize -q q4_0 /model
unknown architecture MixtralForCausalLM
yetipaw@dolphin  ~/Apps/dolphin-2.5-mixtral-8x7b   main 
"
The architecture is defined correctly in config.json:

"yetipaw@dolphin  ~/Apps/dolphin-2.5-mixtral-8x7b   main  ls
added_tokens.json pytorch_model-00004-of-00019.bin pytorch_model-00012-of-00019.bin pytorch_model.bin.index.json
config.json pytorch_model-00005-of-00019.bin pytorch_model-00013-of-00019.bin README.md
configs pytorch_model-00006-of-00019.bin pytorch_model-00014-of-00019.bin special_tokens_map.json
generation_config.json pytorch_model-00007-of-00019.bin pytorch_model-00015-of-00019.bin tokenizer_config.json
Modelfile pytorch_model-00008-of-00019.bin pytorch_model-00016-of-00019.bin tokenizer.model
pytorch_model-00001-of-00019.bin pytorch_model-00009-of-00019.bin pytorch_model-00017-of-00019.bin
pytorch_model-00002-of-00019.bin pytorch_model-00010-of-00019.bin pytorch_model-00018-of-00019.bin
pytorch_model-00003-of-00019.bin pytorch_model-00011-of-00019.bin pytorch_model-00019-of-00019.bin
"

yetipaw@dolphin  ~/Apps/dolphin-2.5-mixtral-8x7b   main  cat config.json
{
  "_name_or_path": "/workspace/models/Mixtral-8x7B-v0.1",
  "architectures": [
    "MixtralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mixtral",
  "num_attention_heads": 32,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "num_local_experts": 8,
  "output_router_logits": false,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "router_aux_loss_coef": 0.02,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.0.dev0",
  "use_cache": false,
  "vocab_size": 32002
}

Please advise.

GiteaMirror added the bug label 2026-04-22 03:09:31 -05:00

@easp commented on GitHub (Dec 31, 2023):

Same problem when trying to quantize this mixtral derivative: https://huggingface.co/mlabonne/Beyonder-4x7b


@Jas0nxlee commented on GitHub (Jan 7, 2024):

Same problem when trying https://huggingface.co/BAAI/bge-large-zh


@Settordici commented on GitHub (Jan 7, 2024):

The architecture "MixtralForCausalLM" is not supported yet.
You can see the supported architectures here: https://hub.docker.com/r/ollama/quantize
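
A quick way to check what a checkpoint declares before running the quantize image is to read the architectures field out of its config.json and compare it against that list. A minimal sketch, assuming jq is installed (Python's json module works just as well):

```
# print the architectures array from the model's config.json
jq '.architectures' config.json
# -> ["MixtralForCausalLM"]  (must appear in the quantize image's supported list)
```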


@easp commented on GitHub (Jan 8, 2024):

@Settordici which then raises the question of how the mixtral, dolphin-mixtral and notux models in ollama.ai/library were converted and quantized. The original models are all MixtralForCausalLM.

@Jas0nxlee bge-large-zh's architecture, which you can see in config.json in its repo, is "BertModel." That's not supported (https://github.com/jmorganca/ollama/issues/327).


@Settordici commented on GitHub (Jan 8, 2024):

> @Settordici which then raises the question of how the mixtral, dolphin-mixtral and notux models in ollama.ai/library were converted and quantized. The original models are all MixtralForCausalLM.
>
> @Jas0nxlee bge-large-zh's architecture, which you can see in config.json in its repo, is "BertModel." That's not supported (https://github.com/jmorganca/ollama/issues/327).

Maybe they used this script from the llama.cpp repository (https://github.com/ggerganov/llama.cpp/discussions/2948)?
Or maybe they already exported the model into a supported format like GGUF, so they didn't need to convert it?
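
For context, the llama.cpp route mentioned above would look roughly like the following. This is a minimal sketch assuming the convert.py script and quantize binary as they shipped in llama.cpp in early 2024; the output file names are just examples, and newer llama.cpp releases have since renamed these tools:

```
# build llama.cpp and install the Python dependencies needed for conversion
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
make quantize

# convert the PyTorch/Safetensors checkpoint directory to an f16 GGUF
python convert.py ~/Apps/dolphin-2.5-mixtral-8x7b \
  --outtype f16 --outfile dolphin-2.5-mixtral-8x7b.f16.gguf

# quantize the f16 GGUF down to q4_0
./quantize dolphin-2.5-mixtral-8x7b.f16.gguf dolphin-2.5-mixtral-8x7b.q4_0.gguf q4_0
```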


@pdevine commented on GitHub (Mar 12, 2024):

We actually changed the docs on this a while back to not use the docker image for quantizing. You can see it here: https://github.com/ollama/ollama/blob/main/docs/import.md#quantize-the-model

I have been working on a new way to convert directly from Safetensors into Ollama from a Modelfile (Mistral is already working), but there's still a long way to go to support more models. That method doesn't yet add the quantization step, but the hope is to get that in relatively soon.

For now though, you should be able to use the updated doc. I'll go ahead and close the issue.
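
Following that updated doc, importing a quantized GGUF into Ollama comes down to a Modelfile whose FROM line points at the file, plus an ollama create. A minimal sketch with example file and model names:

```
# Modelfile importing a local GGUF (file name is an example)
cat > Modelfile <<'EOF'
FROM ./dolphin-2.5-mixtral-8x7b.q4_0.gguf
EOF

# register the model with Ollama and run it
ollama create dolphin-mixtral-local -f Modelfile
ollama run dolphin-mixtral-local
```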

Reference: github-starred/ollama#26707