[GH-ISSUE #8147] mixtral:8x7b fails to run with a "missing tensor" error. #5200

Closed
opened 2026-04-12 16:19:53 -05:00 by GiteaMirror · 14 comments

Originally created by @vnicolici on GitHub (Dec 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8147

What is the issue?

When attempting to run mixtral:8x7b, I got a missing tensor 'blk.0.ffn_down_exps.weight' error right after the download finished:

C:\Users\vladn>ollama run mixtral:8x7b
pulling manifest
pulling e9e56e8bb5f0... 100% ▕█████████████████████████▏  26 GB
pulling 43070e2d4e53... 100% ▕█████████████████████████▏  11 KB
pulling c43332387573... 100% ▕█████████████████████████▏   67 B
pulling ed11eda7790d... 100% ▕█████████████████████████▏   30 B
pulling 9dec05e9b2db... 100% ▕█████████████████████████▏  484 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'

C:\Users\vladn>ollama -v
ollama version is 0.5.3

C:\Users\vladn>ollama run mixtral:8x7b
Error: llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'

And in the log:

...
llm_load_print_meta: EOG token        = 2 '</s>'
llm_load_print_meta: max token length = 48
llama_model_load: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
llama_load_model_from_file: failed to load model
panic: unable to load model: C:\Users\vladn\.ollama\models\blobs\sha256-e9e56e8bb5f0fcd4860675e6837a8f6a94e659f5fa7dce6a1076279336320f2b

goroutine 7 [running]:
github.com/ollama/ollama/llama/runner.(*Server).loadModel(0xc0000d61b0, {0xe, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc0000241d0, 0x0}, ...)
	github.com/ollama/ollama/llama/runner/runner.go:861 +0x3ad
created by github.com/ollama/ollama/llama/runner.Execute in goroutine 1
	github.com/ollama/ollama/llama/runner/runner.go:979 +0xd0d
time=2024-12-18T01:08:09.599+02:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'"
[GIN] 2024/12/18 - 01:08:09 | 500 |    577.0646ms |       127.0.0.1 | POST     "/api/generate"

I found a similar issue in another project: https://github.com/ggerganov/llama.cpp/issues/10244

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.3

GiteaMirror added the bug label 2026-04-12 16:19:53 -05:00

@pdevine commented on GitHub (Dec 17, 2024):

@vnicolici I believe the MoE architecture ended up changing, and I think we need to reconvert the weights to get this working again.


@vnicolici commented on GitHub (Dec 17, 2024):

@pdevine OK, I'm not entirely sure that model is the one I should use anyway. I'm trying dolphin-mixtral:8x7b now instead of mixtral:8x7b; I'll let you know if it behaves differently.


@vnicolici commented on GitHub (Dec 17, 2024):

Unfortunately dolphin-mixtral:8x7b behaves exactly the same, same error.


@pdevine commented on GitHub (Dec 18, 2024):

@vnicolici it's the same problem. Really sorry about this. We're re-pushing the weights for mixtral and I'll see if we can get dolphin-mixtral repushed as well.

This happened because the llama.cpp engine changed how the MoE tensors are handled. With the new engine (i.e. w/o llama.cpp) hopefully we won't have a repeat of this in the future.
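
For anyone who wants to check which layout a downloaded blob actually contains, here is a minimal sketch, assuming the gguf Python package from the llama.cpp project is installed (pip install gguf) and reusing the blob path from the panic message above:

from gguf import GGUFReader

# Blob path from the panic message above; adjust for your own installation.
blob = r"C:\Users\vladn\.ollama\models\blobs\sha256-e9e56e8bb5f0fcd4860675e6837a8f6a94e659f5fa7dce6a1076279336320f2b"

reader = GGUFReader(blob)
names = [t.name for t in reader.tensors]

# The load fails because the runner expects fused per-layer expert tensors such as
# blk.0.ffn_down_exps.weight; conversions made before the llama.cpp change don't have them.
print("has fused expert tensors:", any(n.endswith("ffn_down_exps.weight") for n in names))
print("layer 0 ffn_down tensors:", sorted(n for n in names if n.startswith("blk.0.ffn_down")))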


@vnicolici commented on GitHub (Dec 18, 2024):

OK, no problem, I didn't have any urgent need, I was just playing around.


@pdevine commented on GitHub (Dec 20, 2024):

Both mixtral and dolphin-mixtral have been updated. Some quantizations are still pushing, but latest should be updated on both. I'll go ahead and close out the issue. Thank you for reporting it!
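
If a copy pulled before the re-push is still on disk, pulling again should fetch the re-converted weights. A hedged example using standard ollama CLI commands (removing the model first is optional, but it also frees the old 26 GB blob):

ollama rm mixtral:8x7b
ollama pull mixtral:8x7b
ollama run mixtral:8x7b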


@banalg commented on GitHub (Dec 20, 2024):

Hello,

Thanks @pdevine for fixing mixtral:latest.

But I still have the issue with mixtral:instruct.

Ollama version: 0.5.4

user@fesfes:/# ollama -v
ollama version is 0.5.4-0-g2ddc32d-dirty

I downloaded the latest model a few minutes ago (20/12/2024 07:10 AM UTC)

user@fesfes:/# ollama pull mixtral:instruct
pulling manifest
pulling e9e56e8bb5f0... 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏  26 GB
pulling 43070e2d4e53... 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏  11 KB
pulling c43332387573... 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏   67 B
pulling ed11eda7790d... 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏   30 B
pulling 9dec05e9b2db... 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏  484 B
verifying sha256 digest
writing manifest
success

The model has just been updated

user@fesfes:/# ollama ls
NAME                          ID              SIZE      MODIFIED
mixtral:instruct              d39eb76ed9c5    26 GB     12 seconds ago
user@fesfes:/# ollama run mixtral:instruct
Error: llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
llama_load_model_from_file: failed to load model

Other information:

  • OK with Ollama v0.5.1
  • KO with Ollama v0.5.2 (same issue: missing tensor 'blk.0.ffn_down_exps.weight')
  • KO with Ollama v0.5.3 (same issue: missing tensor 'blk.0.ffn_down_exps.weight')

@pdevine commented on GitHub (Dec 20, 2024):

@banalg hey, thanks for catching that. The instruct tag will just point to the same weights as latest, so you can use those weights instead. We'll get the instruct tag pushed as well.
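
In the meantime, a hedged example of the suggested workaround is to run the latest tag directly:

ollama run mixtral:latest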


@banalg commented on GitHub (Dec 20, 2024):

I pulled the latest mixtral:instruct and it's OK now. Thanks a lot.


@rosingrind commented on GitHub (Jan 29, 2025):

Same issue with notux:8x7b-v1-q6_K


@pdevine commented on GitHub (Jan 29, 2025):

@rosingrind hey good catch. I'm tempted to deprecate notux since it hasn't been updated in such a long time.


@Harry-Ja commented on GitHub (Apr 17, 2025):

@pdevine I am encountering the same issue when running chinese-mixtral-gguf with Ollama. Is this the right place to ask? Thanks.


@emptystack1024 commented on GitHub (Aug 20, 2025):

PS C:\Users\33398> ollama run wangrongsheng/aurora
pulling manifest
pulling 93810b60e4e2: 100% ▕██████████████████████████████████████████████████████████▏ 26 GB
pulling a47b02e00552: 100% ▕██████████████████████████████████████████████████████████▏ 106 B
pulling 6bd182a7132e: 100% ▕██████████████████████████████████████████████████████████▏ 40 B
pulling f02dd72bb242: 100% ▕██████████████████████████████████████████████████████████▏ 59 B
pulling 4ed038971897: 100% ▕██████████████████████████████████████████████████████████▏ 484 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'

I have the same issue.


@pdevine commented on GitHub (Aug 20, 2025):

Unfortunately wangrongsheng/aurora isn't an official model, so I don't know exactly how it works. Looking through the tensors, it seems like it has split the experts into different layers, which is not what the old (llama.cpp based) engine is expecting.
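
For third-party uploads like this one, the GGUFReader sketch earlier in this thread can be pointed at the model's blob to list its tensor names and confirm whether it contains the fused blk.*.ffn_down_exps.weight tensors the engine expects.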
