[GH-ISSUE #11663] Error: 500 Internal Server Error: unable to load model with "alibayram/smollm3:latest" and "alibayram/hunyuan:7b" #69770

Closed
opened 2026-05-04 19:08:31 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @SilentWhiteRabbit on GitHub (Aug 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11663

What is the issue?

Hello, when I use Ollama to run `alibayram/smollm3:latest` and `alibayram/hunyuan:7b`, this error occurs. I'll use `alibayram/hunyuan:7b` as an example:

C:\Users\xxx>ollama run alibayram/hunyuan:7b
Error: 500 Internal Server Error: unable to load model: C:\Users\xxx\.ollama\models\blobs\sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d

I also tested deepseek-r1:8b, and it works fine:

C:\Users\xxx>ollama run alibayram/hunyuan:7b
Error: 500 Internal Server Error: unable to load model: C:\Users\xxx\.ollama\models\blobs\sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d
C:\Users\xxx>ollama run deepseek-r1:8b
>>>

So there may be something wrong with `alibayram/smollm3:latest` and `alibayram/hunyuan:7b`.
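Since the failure (visible in the log below) is an unrecognized `general.architecture` value in the GGUF metadata, the declared architecture can be checked without involving Ollama by reading the first metadata key of the blob directly. A minimal sketch, assuming a GGUF v3 file whose first KV pair is `general.architecture` (which is how llama.cpp writes files, matching `kv 0` in the log); the `fake_header` helper is only a stand-in for reading real bytes from a blob on disk:

```python
import struct

def gguf_architecture(data: bytes) -> str:
    """Read general.architecture from the head of a GGUF v3 file.

    Assumes general.architecture is the first metadata key, as in the
    log output (kv 0).
    """
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # magic(4) + version(4) + tensor_count(u64) + metadata_kv_count(u64)
    offset = 4 + 4 + 8 + 8
    key_len = struct.unpack_from("<Q", data, offset)[0]
    offset += 8
    key = data[offset:offset + key_len].decode()
    offset += key_len
    value_type = struct.unpack_from("<I", data, offset)[0]
    offset += 4
    if key != "general.architecture" or value_type != 8:  # 8 = string
        raise ValueError(f"unexpected first key: {key!r}")
    val_len = struct.unpack_from("<Q", data, offset)[0]
    offset += 8
    return data[offset:offset + val_len].decode()

def fake_header(arch: str) -> bytes:
    """Synthetic GGUF header for demonstration (a real check would read
    the first few hundred bytes of the blob file instead)."""
    key = b"general.architecture"
    val = arch.encode()
    return (b"GGUF" + struct.pack("<IQQ", 3, 0, 1)
            + struct.pack("<Q", len(key)) + key
            + struct.pack("<I", 8)
            + struct.pack("<Q", len(val)) + val)

print(gguf_architecture(fake_header("hunyuan-dense")))  # hunyuan-dense
```

If the printed architecture is one the installed Ollama build does not recognize, the model will fail to load exactly as reported here.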

Relevant log output

These are the logs, using `alibayram/hunyuan:7b` as an example. app.log is unchanged.

server.log shows the following:

[GIN] 2025/08/05 - 17:14:40 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2025/08/05 - 17:14:40 | 200 |    225.4768ms |       127.0.0.1 | POST     "/api/show"
time=2025-08-05T17:14:41.107+08:00 level=INFO source=server.go:135 msg="system memory" total="31.7 GiB" free="9.8 GiB" free_swap="7.4 GiB"
time=2025-08-05T17:14:41.108+08:00 level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=33 layers.offload=10 layers.split="" memory.available="[6.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.7 GiB" memory.required.partial="5.9 GiB" memory.required.kv="4.0 GiB" memory.required.allocations="[5.9 GiB]" memory.weights.total="4.3 GiB" memory.weights.repeating="3.9 GiB" memory.weights.nonrepeating="410.7 MiB" memory.graph.full="2.7 GiB" memory.graph.partial="2.7 GiB"
llama_model_loader: loaded meta data with 39 key-value pairs and 354 tensors from C:\Users\xxx\.ollama\models\blobs\sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = hunyuan-dense
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Hunyuan 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Hunyuan
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                   general.base_model.count u32              = 1
llama_model_loader: - kv   7:                  general.base_model.0.name str              = Hunyuan 7B Pretrain
llama_model_loader: - kv   8:          general.base_model.0.organization str              = Tencent
llama_model_loader: - kv   9:              general.base_model.0.repo_url str              = https://huggingface.co/tencent/Hunyua...
llama_model_loader: - kv  10:                  hunyuan-dense.block_count u32              = 32
llama_model_loader: - kv  11:               hunyuan-dense.context_length u32              = 262144
llama_model_loader: - kv  12:             hunyuan-dense.embedding_length u32              = 4096
llama_model_loader: - kv  13:          hunyuan-dense.feed_forward_length u32              = 14336
llama_model_loader: - kv  14:         hunyuan-dense.attention.head_count u32              = 32
llama_model_loader: - kv  15:      hunyuan-dense.attention.head_count_kv u32              = 8
llama_model_loader: - kv  16:               hunyuan-dense.rope.freq_base f32              = 11158840.000000
llama_model_loader: - kv  17: hunyuan-dense.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  18:         hunyuan-dense.attention.key_length u32              = 128
llama_model_loader: - kv  19:       hunyuan-dense.attention.value_length u32              = 128
llama_model_loader: - kv  20:            hunyuan-dense.rope.scaling.type str              = none
llama_model_loader: - kv  21:          hunyuan-dense.rope.scaling.factor f32              = 1.000000
llama_model_loader: - kv  22: hunyuan-dense.rope.scaling.original_context_length u32              = 262144
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = hunyuan
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,128167]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,128167]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,127698]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 127958
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 127960
llama_model_loader: - kv  30:          tokenizer.ggml.seperator_token_id u32              = 127962
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 127961
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if not add_generation_prompt is d...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - kv  34:                          general.file_type u32              = 15
llama_model_loader: - kv  35:                      quantize.imatrix.file str              = Hunyuan-7B-Instruct/Hunyuan-7B-Instru...
llama_model_loader: - kv  36:                   quantize.imatrix.dataset str              = calibration_datav3.txt
llama_model_loader: - kv  37:             quantize.imatrix.entries_count u32              = 224
llama_model_loader: - kv  38:              quantize.imatrix.chunks_count u32              = 130
llama_model_loader: - type  f32:  129 tensors
llama_model_loader: - type q4_K:  192 tensors
llama_model_loader: - type q6_K:   33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.30 GiB (4.92 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'hunyuan-dense'
llama_model_load_from_file_impl: failed to load model
time=2025-08-05T17:14:41.554+08:00 level=INFO source=sched.go:453 msg="NewLlamaServer failed" model=C:\Users\xxx\.ollama\models\blobs\sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d error="unable to load model: C:\\Users\\xxx\\.ollama\\models\\blobs\\sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d"
[GIN] 2025/08/05 - 17:14:41 | 500 |    805.9905ms |       127.0.0.1 | POST     "/api/generate"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.10.1

GiteaMirror added the bug label 2026-05-04 19:08:31 -05:00

@SilentWhiteRabbit commented on GitHub (Aug 5, 2025):

`alibayram/smollm3:latest` produces a similar log:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'smollm3'
llama_model_load_from_file_impl: failed to load model


@rick-github commented on GitHub (Aug 5, 2025):

These models are currently unsupported on ollama.


@printlndarling commented on GitHub (Aug 5, 2025):

> These models are currently unsupported on ollama.

Hi, I'm running into the same issue where hunyuan-dense isn't supported by Ollama. It seems like an architecture compatibility problem. Are there plans to support it in the future? Thanks!


@jmcastagnetto commented on GitHub (Aug 10, 2025):

I think it is related to what is discussed in this issue: https://github.com/ollama/ollama/issues/2232#issuecomment-3165046147


@rick-github commented on GitHub (Aug 10, 2025):

These models are currently unsupported on ollama.

#11823 should add support.


@fabiomatricardi commented on GitHub (Aug 13, 2025):

So why are they present in the official Ollama model hub?


@rick-github commented on GitHub (Aug 13, 2025):

They are not in the official ollama model hub; they are in the user-uploaded model hub.
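The distinction is visible in the model name itself: official library models like `deepseek-r1:8b` have no namespace, while community uploads are prefixed with a username, as in `alibayram/hunyuan:7b`. A tiny sketch of that heuristic (my own helper, not part of Ollama):

```python
def is_community_model(name: str) -> bool:
    """True if the model name carries a user namespace.

    Official library models ("deepseek-r1:8b") have no "/" in the
    repository part; community uploads ("alibayram/hunyuan:7b") do.
    """
    repo = name.split(":", 1)[0]  # drop the version tag, if any
    return "/" in repo

print(is_community_model("alibayram/hunyuan:7b"))  # True
print(is_community_model("deepseek-r1:8b"))        # False
```

Community-uploaded GGUFs can declare any architecture, so a successful `ollama pull` does not guarantee the installed Ollama build can load the model.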

Reference: github-starred/ollama#69770