[GH-ISSUE #3691] why wizard 8*22b run on CPU? #48785

Closed
opened 2026-04-28 09:16:02 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @taozhiyuai on GitHub (Apr 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3691

What is the issue?

Screenshot 2024-04-17 12 23 02

As shown in the picture, it runs on the CPU.

My laptop is a Mac M3 Max with 128GB of memory. LM Studio cannot run wizardlm-2-8*22b-Q8; it hits an out-of-memory (OOM) error.

Ollama can run it, but apparently on the CPU. Is it possible to run it on the GPU?

What did you expect to see?

No response

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

macOS

Architecture

No response

Platform

No response

Ollama version

up to date

GPU

Apple

GPU info

No response

CPU

Apple

Other software

No response

GiteaMirror added the bug label 2026-04-28 09:16:02 -05:00
Author
Owner

@taozhiyuai commented on GitHub (Apr 17, 2024):

It seems the CPU is using the SSD as VRAM (swap).

Author
Owner

@zhaopengme commented on GitHub (Apr 17, 2024):

Is it an M-series chip? Install asitop and you'll be able to see for yourself.
image

Author
Owner

@mchiang0610 commented on GitHub (Apr 17, 2024):

@taozhiyuai 128GB is not enough memory to run the 8-bit quantization (q8) of the wizardlm2 model.

In this case, Ollama falls back to the CPU and uses the SSD as the swap disk. We tried partially offloading to the GPU (Metal), but it caused the computer to freeze.

We'll have to improve the scheduling for this in the future.

Hope this helps! I'm closing this issue since the question is answered. If not, please feel free to re-open!

Thank you
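As a rough illustration of the maintainer's point, here is a back-of-envelope sketch of why the q8 weights alone don't fit in 128GB of unified memory. The parameter count (~141B) is an approximation based on the Mixtral 8x22B architecture that WizardLM-2 8x22B derives from, and `estimate_weights_gib` is a hypothetical helper, not part of Ollama:

```python
# Back-of-envelope memory estimate for an 8x22B MoE model at Q8.
# ~141B total parameters is an approximation for the Mixtral 8x22B
# base; Q8_0 stores roughly 8 bits (1 byte) per weight plus small
# per-block scale overhead, which we ignore here.
TOTAL_PARAMS = 141e9
BYTES_PER_PARAM_Q8 = 1.0

def estimate_weights_gib(params: float, bytes_per_param: float) -> float:
    """Return the approximate size of the model weights in GiB."""
    return params * bytes_per_param / 1024**3

weights = estimate_weights_gib(TOTAL_PARAMS, BYTES_PER_PARAM_Q8)
print(f"~{weights:.0f} GiB for weights alone")  # ~131 GiB
```

Even before counting the KV cache and activations, ~131 GiB exceeds the 128GB of unified memory, and macOS further caps how much of that memory Metal may allocate to the GPU, so the model spills to CPU and SSD swap.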

Author
Owner

@mxyng commented on GitHub (May 2, 2024):

Related #4028

Reference: github-starred/ollama#48785