[GH-ISSUE #6231] Why is Qwen2ForCausalLM still not supported? #50407

Closed
opened 2026-04-28 15:39:09 -05:00 by GiteaMirror · 19 comments
Owner

Originally created by @wisamidris7 on GitHub (Aug 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6231

What is the issue?

I ran `ollama create` and it says

```
> ollama create myai -f ./Modelfile
transferring model data
converting model
Error: unsupported architecture
```

and this

```
> docker run --rm -v .:/model ollama/quantize -q q4_0 /model
unknown architecture Qwen2ForCausalLM
```

OS

Windows

GPU

No response

CPU

No response

Ollama version

ollama version is 0.3.3
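
For context, the Modelfile consumed by `ollama create` above is typically minimal; a sketch with a hypothetical path (the actual Modelfile was not posted in the issue):

```
# Hypothetical minimal Modelfile: FROM points at the converted weights
FROM ./model.gguf
```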

GiteaMirror added the bug label 2026-04-28 15:39:09 -05:00

@wisamidris7 commented on GitHub (Aug 7, 2024):

Is the llama3 architecture supported?

The problem is I need a model that supports multiple languages, like Meta llama3 or qwen2.
The required language is Arabic.

@rick-github commented on GitHub (Aug 7, 2024):

ollama/quantize uses an older version of llama.cpp that doesn't support qwen2; see the supported architectures at https://registry.hub.docker.com/r/ollama/quantize. However, both [qwen2](https://ollama.com/search?q=qwen2) and [llama3](https://ollama.com/search?q=llama3) models are already in the ollama library, so you can just pull and use them: `ollama pull qwen2`, `ollama pull llama3`. Note that llama3 is old as models go, and llama3.1 might work better, depending on its support for Arabic.
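
For a quick check, pulling and running the library build looks like this (assuming the `qwen2` tag mentioned above; the prompt is an Arabic smoke test, since that is the language needed here):

```
ollama pull qwen2
ollama run qwen2 "مرحبا"
```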

@wisamidris7 commented on GitHub (Aug 7, 2024):

No, the problem is that I fine-tuned qwen2 myself; I need to fine-tune it for personal purposes.
When I fine-tuned it with LoRA it was very, very stupid and the final loss was 3.7.
When I did a full fine-tune the final loss was 0.2, but I haven't tested it in ollama yet.

@wisamidris7 commented on GitHub (Aug 7, 2024):

Also, vllm has already supported it:

![image](https://github.com/user-attachments/assets/6b838adf-32fb-43df-bc7c-01090cb4af7e)

@rick-github commented on GitHub (Aug 7, 2024):

As I said, ollama/quantize uses an old version of llama.cpp. If you want qwen2 support, use a newer version from https://ghcr.io/ggerganov/llama.cpp.

@wisamidris7 commented on GitHub (Aug 7, 2024):

Thanks, but how do I use it to convert the model? I'm new at this.

What command or exe should I use?

@wisamidris7 commented on GitHub (Aug 7, 2024):

I mean, do I need to use the Windows version, not Docker?

@rick-github commented on GitHub (Aug 7, 2024):

This is what I use:

```
docker run -it --gpus all -v .:/app/models ghcr.io/ggerganov/llama.cpp:full-cuda -q /app/models/model.gguf Q4_K_M
```
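
Here `-v .:/app/models` mounts the current directory into the container and `-q` selects the quantization type. A CPU-only variant is presumably the same invocation against the non-CUDA image (an assumption; the `full` tag should accept the same tool flags):

```
# Assumption: the plain `full` image accepts the same tool flags as `full-cuda`
docker run -it -v .:/app/models ghcr.io/ggerganov/llama.cpp:full -q /app/models/model.gguf Q4_K_M
```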

@wisamidris7 commented on GitHub (Aug 7, 2024):

Thanks very much!
👍

@rick-github commented on GitHub (Aug 7, 2024):

`docker pull ghcr.io/ggerganov/llama.cpp:full-cuda`

@wisamidris7 commented on GitHub (Aug 7, 2024):

No, it seems I need to log in with my GitHub account.

@wisamidris7 commented on GitHub (Aug 7, 2024):

Sorry, but I mean converting it from the full model.safetensors to gguf or ggml using ghcr.io/ggerganov/llama.cpp:full-cuda.
It says this:

```
main: build = 0 (unknown)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/app/models/' to '/app/models/ggml-model-Q4_K_M.gguf' as Q4_K_M
gguf_init_from_file: invalid magic characters 'D'
llama_model_quantize: failed to quantize: llama_model_loader: failed to load model from /app/models/

main: failed to quantize model from '/app/models/'
```
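
The `invalid magic characters` error indicates the quantizer was pointed at something that is not a GGUF file (GGUF files begin with the magic bytes `GGUF`); quantization expects an already-converted GGUF, not a safetensors directory. A sketch of the two-step workflow with hypothetical paths, using the convert script that comes up later in this thread:

```
# Step 1 (assumes a local llama.cpp checkout): convert safetensors -> GGUF
python convert-hf-to-gguf.py --outfile model.gguf /path/to/safetensors_dir
# Step 2: quantize the resulting file (not the directory) with the image above
docker run -it --gpus all -v .:/app/models ghcr.io/ggerganov/llama.cpp:full-cuda -q /app/models/model.gguf Q4_K_M
```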

@wisamidris7 commented on GitHub (Aug 7, 2024):

All I care about is running it on ollama

@wisamidris7 commented on GitHub (Aug 7, 2024):

I used this to convert it

https://github.com/kevkid/gguf_gui

@henryxiao1997 commented on GitHub (Oct 25, 2024):

I'm not familiar with Docker commands. Is it possible to use the new llama.cpp from the command line, i.e. with `ollama create mymodel`? Thanks!

@rick-github commented on GitHub (Oct 25, 2024):

Try it. `ollama create --quantize` will handle some models. If it doesn't work for the model you try it on, post details.
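
A sketch of that invocation with hypothetical model and file names (quantization types follow the usual llama.cpp naming, e.g. q4_K_M):

```
# Assumes a Modelfile whose FROM line points at the fine-tuned weights
ollama create mymodel -f ./Modelfile --quantize q4_K_M
```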

@sunday-hao commented on GitHub (Dec 17, 2024):

I am having the same trouble as @wisamidris7. OS: Windows 10, newest version of ollama.
I first fine-tuned the Qwen2.5-Instruct base model on my private dataset. Then I converted the fine-tuned model into .gguf format with llama.cpp. When I tried `ollama create` and `ollama create --quantize` on my fine-tuned model, both resulted in 'Error: unsupported architecture'. Nevertheless, I see the qwen2.5 series is supported in the ollama library (https://ollama.com/library), so I can't run my fine-tuned model on ollama.
Is this a bug? Or is there any solution?

@rick-github commented on GitHub (Dec 17, 2024):

If you add some details it will make it easier to debug: a pointer to the base model, the llama.cpp command, the contents of the Modelfile, and the error message.

@sunday-hao commented on GitHub (Dec 17, 2024):

I opened a new issue with more details.
My base model is Qwen2.5-Instruct.
The llama.cpp command used to convert my fine-tuned model:
`python convert-hf-to-gguf.py --outfile path/to/outfile path/to/safetensors`
Contents of the Modelfile:
`FROM /path/to/gguf_file`
Error message:

```
> ollama create my_model -f ./Modelfile
transferring model data 100%
converting model
Error: unsupported architecture
```
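
One way to narrow this down is to inspect the architecture string recorded in the GGUF header, since that is what `ollama create` validates when it reports "unsupported architecture". A sketch using llama.cpp's gguf-py tooling (an assumption: `pip install gguf` provides the `gguf-dump` helper and its `--no-tensors` flag):

```
pip install gguf
# Dump header metadata only and look for general.architecture (expect "qwen2")
gguf-dump --no-tensors model.gguf | grep general.architecture
```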
