[GH-ISSUE #13423] issue: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen3' #16905

Closed
opened 2026-04-19 22:43:38 -05:00 by GiteaMirror · 0 comments

Originally created by @xinmans on GitHub (May 2, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13423

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.5

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

Selecting qwen3:1.7b should load the model and return a response to the chat prompt.

Actual Behavior

500: Ollama: 500, message='Internal Server Error', url='http://localhost:11434/api/chat'

Steps to Reproduce

Download and select qwen3:1.7b, then run:

“Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option.”

The same request can also be reproduced directly against the Ollama API, as in the sketch below.
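For reference, a minimal sketch that triggers the same failure without the browser by posting to Ollama's /api/chat endpoint. The host, port, and model tag come from this report; the prompt text and error handling are illustrative:

```python
# Minimal reproduction against the Ollama API, bypassing Open WebUI.
# Assumes Ollama listens on localhost:11434 and qwen3:1.7b has been pulled.
import json
import urllib.error
import urllib.request

payload = json.dumps({
    "model": "qwen3:1.7b",
    "messages": [{"role": "user", "content": "Help me study vocabulary."}],
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, json.loads(resp.read()))
except urllib.error.HTTPError as e:
    # On the affected setup this prints 500: the bundled llama.cpp
    # runner cannot load the 'qwen3' architecture (see logs below).
    print(e.code, e.read().decode("utf-8", errors="replace"))
```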

Logs & Screenshots

llama_model_loader: - kv  13:                 qwen3.attention.key_length u32              = 128
llama_model_loader: - kv  14:               qwen3.attention.value_length u32              = 128
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  113 tensors
llama_model_loader: - type  f16:   28 tensors
llama_model_loader: - type q4_K:  155 tensors
llama_model_loader: - type q6_K:   15 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 1.26 GiB (5.33 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen3'
llama_model_load_from_file_impl: failed to load model
time=2025-05-02T07:21:06.646Z level=INFO source=sched.go:430 msg="NewLlamaServer failed" model=/root/.ollama/models/blobs/sha256-3d0b790534fe4b79525fc3692950408dca41171676ed7e21db57af5c65ef6ab6 error="unable to load model: /root/.ollama/models/blobs/sha256-3d0b790534fe4b79525fc3692950408dca41171676ed7e21db57af5c65ef6ab6"
[GIN] 2025/05/02 - 07:21:06 | 500 |  6.326236851s |             ::1 | POST     "/api/chat"
2025-05-02 07:21:06.648 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9573 - "POST /api/chat/completions HTTP/1.1" 400 - {}
2025-05-02 07:21:06.679 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9573 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200 - {}
2025-05-02 07:21:58.616 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9610 - "GET /api/v1/chats/43989c91-840a-4f14-b1c6-ffa86a0e1649 HTTP/1.1" 200 - {}
2025-05-02 07:21:58.646 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9611 - "GET /api/v1/chats/325bb1d4-7f8d-48fb-b486-072b1aa24429 HTTP/1.1" 200 - {}
2025-05-02 07:21:58.652 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9610 - "GET /api/v1/chats/43989c91-840a-4f14-b1c6-ffa86a0e1649 HTTP/1.1" 200 - {}
[GIN] 2025/05/02 - 07:22:04 | 200 |      61.366µs |             ::1 | GET      "/api/version"
2025-05-02 07:22:04.932 | INFO     | uvicorn.protocols.http.httptools_impl:send:476 - 192.168.31.178:9612 - "GET /ollama/api/version HTTP/1.1" 200 - {}
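Triage note (not part of the original report): the failing component is the llama.cpp runner bundled with Ollama, which does not recognize the 'qwen3' architecture, so Open WebUI is only relaying the upstream 500. A quick sketch that queries the same /api/version endpoint the log shows responding with 200; the 0.6.6 threshold is an assumption about the first Ollama release with qwen3 support and should be verified against the upstream release notes:

```python
# Triage helper: report the local Ollama version so it can be compared
# against the first release that added qwen3 support.
import json
import urllib.request

# Hypothetical threshold; confirm against Ollama's release notes.
ASSUMED_MIN_QWEN3_VERSION = (0, 6, 6)

with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
    version = json.loads(resp.read())["version"]  # e.g. "0.6.5"

parsed = tuple(int(p) for p in version.split(".")[:3])
if parsed < ASSUMED_MIN_QWEN3_VERSION:
    print(f"Ollama {version} likely predates qwen3 support; upgrade Ollama.")
else:
    print(f"Ollama {version} should recognize the qwen3 architecture.")
```

If the running Ollama predates qwen3 support, upgrading the Ollama container (not Open WebUI) should resolve the load error.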

Additional Information

No response

GiteaMirror added the bug label 2026-04-19 22:43:38 -05:00
Reference: github-starred/open-webui#16905