[GH-ISSUE #3997] ollama can't support run wizardlm2:8x22b #2475

Closed
opened 2026-04-12 12:48:25 -05:00 by GiteaMirror · 7 comments

Originally created by @lizhanyang505 on GitHub (Apr 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3997

What is the issue?

```
ollama run wizardlm2:8x22b
Error: llama runner process no longer running: 1 error: failed to create context with model '/mnt/data1/ollama/models/blobs/sha256-cfcf93119280c4a10c1df57335bad341e000cabbc4faff125531d941a5b0befa'
```

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:18:00.0 Off |                  Off |
| 30%   30C    P8             20W /  450W |      47MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:51:00.0 Off |                  Off |
| 30%   27C    P8             20W /  450W |      17MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 12:48:25 -05:00

@helium729 commented on GitHub (Apr 28, 2024):

This happened to me when my VRAM was not sufficient. Can you run mixtral 8x22b? Or maybe you can monitor the VRAM usage while loading this model.

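For reference, a rough back-of-envelope fit check (a sketch only: the ~141B parameter count for the 8x22b MoE and ~4.5 bits per weight for Q4_0 quantization are assumptions, not figures from this thread):

```shell
# Rough VRAM estimate for wizardlm2:8x22b, assuming ~141B parameters
# and ~4.5 bits/weight under Q4_0 (about 9/16 of a byte per parameter).
params_billion=141
weights_gb=$(( params_billion * 9 / 16 ))   # weights alone, before KV cache
available_gb=$(( 24 * 2 ))                  # two RTX 4090s per the nvidia-smi output

echo "need roughly ${weights_gb} GB, have ${available_gb} GB of VRAM"
if [ "$weights_gb" -gt "$available_gb" ]; then
    echo "model cannot fit fully in VRAM; expect partial CPU offload or a failure"
fi

# To watch actual usage while the model loads (requires an NVIDIA GPU):
#   watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```

Under these assumptions the weights alone exceed the 48 GB of combined VRAM, consistent with the out-of-memory hypothesis above.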

@itsXactlY commented on GitHub (Apr 28, 2024):

Did run here on my good old 1070; make sure everything is up to date on your side.


@BruceMacD commented on GitHub (May 1, 2024):

Hi @lizhanyang505, this could be related to a bug in our calculation of the memory needed to run mixtral models (#3836). If that is the case, the next release may fix your issue.

Since the context is mentioned in the error message here, it might also be worth checking the context size you're loading the model with, if you have changed it previously.

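One way to change the context size is through a Modelfile (a sketch, assuming the standard `ollama create` workflow; the `2048` value and the `wizardlm2-small-ctx` name are illustrative, not from this thread):

```shell
# Create a model variant with a smaller context window, which reduces
# the VRAM needed for the KV cache (values here are illustrative).
cat > Modelfile <<'EOF'
FROM wizardlm2:8x22b
PARAMETER num_ctx 2048
EOF

grep -q 'num_ctx 2048' Modelfile && echo "Modelfile written"

# Then build and run the variant (requires ollama to be installed):
#   ollama create wizardlm2-small-ctx -f Modelfile
#   ollama run wizardlm2-small-ctx
```

A smaller `num_ctx` shrinks the KV cache, but it will not help if the model weights themselves do not fit in VRAM.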

@lizhanyang505 commented on GitHub (May 6, 2024):

> Did run here on my good old 1070, make sure all is up2date on your side.

The ollama version is the latest 1.3.2. It doesn’t work.


@lizhanyang505 commented on GitHub (May 6, 2024):

> Hi @lizhanyang505, this could have been related to a bug in our calculation of the size needed to run mixtral models (#3836). If that is the case the next release may fix your issue.
>
> Where the context is mentioned in the error log here it might also be worth configuring the context size you're loading the model, if you have changed that previously.

![image](https://github.com/ollama/ollama/assets/49928490/fe123617-2874-470d-a82b-2ef0f2764c92)

Is this supported in Ollama v0.1.33?


@BruceMacD commented on GitHub (May 6, 2024):

@lizhanyang505 it should work in v0.1.33 if you have enough VRAM:

```
❯ ollama run wizardlm2:8x22b
>>> hi
 Hello! How can I assist you today? If you have any questions or need information on a particular topic, feel free to ask. I'm here to help!

❯ ollama --version
ollama version is 0.1.33
```

Please let me know if the issue has been resolved for you.


@jmorganca commented on GitHub (May 9, 2024):

This should be solved in 0.1.33. Let me know if that's not the case!


Reference: github-starred/ollama#2475