[GH-ISSUE #11661] ollama run alibayram/hunyuan:0.5b error #7709

Closed
opened 2026-04-12 19:48:52 -05:00 by GiteaMirror · 8 comments

Originally created by @workmengxue on GitHub (Aug 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11661

What is the issue?

C:\Users\mx>ollama run alibayram/hunyuan:0.5b
pulling manifest
pulling 65fe0e5b8f67: 100% ▕██████████████████████████████████████████████████████████▏ 354 MB
pulling eb9280c126b3: 100% ▕██████████████████████████████████████████████████████████▏ 407 B
pulling d63f3f323c98: 100% ▕██████████████████████████████████████████████████████████▏ 132 B
pulling 7caa25a98b52: 100% ▕██████████████████████████████████████████████████████████▏ 432 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model: C:\Users\mx\.ollama\models\blobs\sha256-65fe0e5b8f677752bf7176ea42c01b466d7ff654d8577d5f62d3028c75ba4add

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 19:48:52 -05:00

@rick-github commented on GitHub (Aug 5, 2025):

Server logs (see https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging. The most likely problem is an unsupported architecture.

https://github.com/ollama/ollama/issues/11361
https://github.com/ollama/ollama/issues/11239
https://github.com/ollama/ollama/issues/7503


@svengong commented on GitHub (Aug 7, 2025):

time=2025-08-07T13:32:00.959+08:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/sven/.ollama/models/blobs/sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d gpu=0 parallel=2 available=22906503168 required="18.5 GiB"
time=2025-08-07T13:32:00.960+08:00 level=INFO source=server.go:135 msg="system memory" total="32.0 GiB" free="9.1 GiB" free_swap="0 B"
time=2025-08-07T13:32:00.960+08:00 level=INFO source=server.go:175 msg=offload library=metal layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[21.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.5 GiB" memory.required.partial="18.5 GiB" memory.required.kv="8.0 GiB" memory.required.allocations="[18.5 GiB]" memory.weights.total="4.3 GiB" memory.weights.repeating="3.9 GiB" memory.weights.nonrepeating="410.7 MiB" memory.graph.full="5.3 GiB" memory.graph.partial="5.3 GiB"
llama_model_load_from_file_impl: using device Metal (Apple M1 Pro) - 21845 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 354 tensors from /Users/sven/.ollama/models/blobs/sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = hunyuan-dense
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Hunyuan 7B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Hunyuan
llama_model_loader: - kv 5: general.size_label str = 7B
llama_model_loader: - kv 6: general.base_model.count u32 = 1
llama_model_loader: - kv 7: general.base_model.0.name str = Hunyuan 7B Pretrain
llama_model_loader: - kv 8: general.base_model.0.organization str = Tencent
llama_model_loader: - kv 9: general.base_model.0.repo_url str = https://huggingface.co/tencent/Hunyua...
llama_model_loader: - kv 10: hunyuan-dense.block_count u32 = 32
llama_model_loader: - kv 11: hunyuan-dense.context_length u32 = 262144
llama_model_loader: - kv 12: hunyuan-dense.embedding_length u32 = 4096
llama_model_loader: - kv 13: hunyuan-dense.feed_forward_length u32 = 14336
llama_model_loader: - kv 14: hunyuan-dense.attention.head_count u32 = 32
llama_model_loader: - kv 15: hunyuan-dense.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: hunyuan-dense.rope.freq_base f32 = 11158840.000000
llama_model_loader: - kv 17: hunyuan-dense.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: hunyuan-dense.attention.key_length u32 = 128
llama_model_loader: - kv 19: hunyuan-dense.attention.value_length u32 = 128
llama_model_loader: - kv 20: hunyuan-dense.rope.scaling.type str = none
llama_model_loader: - kv 21: hunyuan-dense.rope.scaling.factor f32 = 1.000000
llama_model_loader: - kv 22: hunyuan-dense.rope.scaling.original_context_length u32 = 262144
llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 24: tokenizer.ggml.pre str = hunyuan
llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,128167] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,128167] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,127698] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 127958
llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 127960
llama_model_loader: - kv 30: tokenizer.ggml.seperator_token_id u32 = 127962
llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 127961
llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if not add_generation_prompt is d...
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: general.file_type u32 = 15
llama_model_loader: - kv 35: quantize.imatrix.file str = Hunyuan-7B-Instruct/Hunyuan-7B-Instru...
llama_model_loader: - kv 36: quantize.imatrix.dataset str = calibration_datav3.txt
llama_model_loader: - kv 37: quantize.imatrix.entries_count u32 = 224
llama_model_loader: - kv 38: quantize.imatrix.chunks_count u32 = 130
llama_model_loader: - type f32: 129 tensors
llama_model_loader: - type q4_K: 192 tensors
llama_model_loader: - type q6_K: 33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.30 GiB (4.92 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'hunyuan-dense'
llama_model_load_from_file_impl: failed to load model
time=2025-08-07T13:32:01.020+08:00 level=INFO source=sched.go:455 msg="NewLlamaServer failed" model=/Users/sven/.ollama/models/blobs/sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d error="unable to load model: /Users/sven/.ollama/models/blobs/sha256-e59a4dd7d94142c65e0aef5e20b0a44637467b472628dac00dfe9be23dc4d18d"


@rick-github commented on GitHub (Aug 7, 2025):

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'hunyuan-dense'

The model is currently unsupported.


@frenchmustard commented on GitHub (Aug 15, 2025):

> llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'hunyuan-dense'
>
> The model is currently unsupported.

Doesn’t llama.cpp support it? I thought the necessary commits were merged by them last month?


@rick-github commented on GitHub (Aug 15, 2025):

llama.cpp supports it. ollama merged #11823 four hours ago, which enables ollama support.


@frenchmustard commented on GitHub (Aug 15, 2025):

> llama.cpp supports it. ollama merged #11823 four hours ago, which enables ollama support.

Any idea when this update will be pushed? If it will take a while, I'll try and see if I can get the main branch running locally right now.


@rick-github commented on GitHub (Aug 15, 2025):

https://github.com/ollama/ollama/releases/tag/v0.11.5-rc2

However, the model doesn't have the correct template or stop parameters so while it now loads, it generates gibberish.


@rick-github commented on GitHub (Aug 15, 2025):

FROM alibayram/hunyuan:0.5b
TEMPLATE """
{{- $lastUserIdx := -1 }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}{{- $lastUserIdx = $i }}{{ end }}
{{- end -}}
<|hy_begin▁of▁sentence|>
{{- if .System }}
{{- .System }}<|hy_place▁holder▁no▁3|>
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|hy_User|>
{{- .Content }}
{{- else if eq .Role "assistant" }}<|hy_Assistant|>
{{- if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{- end }}
{{ .Content }}<|hy_place▁holder▁no▁2|>
{{- end }}
{{- if and (ne .Role "assistant") $last }}<|hy_Assistant|>
{{- if and $.IsThinkSet (not $.Think) -}}
<think>

</think>
{{ end -}}
{{- end }}
{{- end }}"""
PARAMETER stop <|hy_begin▁of▁sentence|>
PARAMETER stop <|hy_place▁holder▁no▁2|>
PARAMETER stop <|hy_place▁holder▁no▁3|>
PARAMETER stop <|hy_User|>
PARAMETER stop <|hy_Assistant|>
$ ollama run hunyuan:0.5b hello 
Thinking...
Okay, the user said "hello". I need to respond in a friendly and welcoming way. Let me start with a greeting. 
Maybe something like "Hello there!" That's standard for any chat. Then, maybe offer further assistance. I should 
keep it simple and open-ended so they can ask more questions or have anything else. Let me check if there's 
anything else needed. Oh, right, sometimes people get off to sleep, so maybe a light touch about that? Or 
perhaps a reminder of how much time is left. But the main thing is to be welcoming. So putting it together: 
"Hello! How can I help you today?" That should work.
...done thinking.

<answer>
Hello there! How can I help you today?
</answer>
$ ollama run hunyuan:0.5b hello --think=false
<answer>
Hello! How can I assist you today?
</answer>

There doesn't seem to be a simple way to strip out the <answer> tags.
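As a client-side workaround (not an ollama feature), the tags can be stripped from responses in a thin wrapper; a minimal sketch:

```python
# Strip the <answer>...</answer> wrapper the model emits around its
# final reply; leaves text without the wrapper untouched.
import re

ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)

def strip_answer_tags(text: str) -> str:
    """Return the content inside <answer>...</answer> if present,
    otherwise the input with surrounding whitespace trimmed."""
    m = ANSWER_RE.search(text)
    return m.group(1) if m else text.strip()
```

For example, `strip_answer_tags("<answer>\nHello there!\n</answer>")` yields `"Hello there!"`, which a wrapper around the API response could apply before display.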

Reference: github-starred/ollama#7709