[GH-ISSUE #1304] deepseek-coder:6.7b-base-q5_K_M not working on a Mac #78351

Closed
opened 2026-05-08 22:39:53 -05:00 by GiteaMirror · 3 comments

Originally created by @olafgeibig on GitHub (Nov 28, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1304

Originally assigned to: @dhiltgen on GitHub.

I tried it twice. I deleted the model after the first try and then ran `ollama pull` again.

ollama run deepseek-coder:6.7b-base-q5_K_M
Error: llama runner process has terminated

It does work if I run `ollama create` with a manually downloaded model.

Ollama 0.1.12
MacOS 13.6 (22G120)
Apple M1 Pro, 16GB

GiteaMirror added the macos and memory labels 2026-05-08 22:39:54 -05:00

@igorschlum commented on GitHub (Nov 29, 2023):

Hi @olafgeibig

I'm using Ollama 0.1.12
MacOS 13.5.2
Apple M1 Pro, 32GB

It worked for me, so it could be a memory issue. Reboot your Mac and try again to see whether you still get the error message.

(base) igor@macIgor ~ % ollama run deepseek-coder:6.7b-base-q5_K_M
pulling manifest
pulling 6e92e8607680... 100% ▕████▏(4.8 GB/4.8 GB)
pulling ccfee4895df0... 100% ▕████▏(13.8 KB/13.8 KB)
pulling 58e1b82a691f... 100% ▕████▏(18 B/18 B)
pulling c77ca3ce73a4... 100% ▕████▏(383 B/383 B)
verifying sha256 digest
writing manifest
removing any unused layers
success


@easp commented on GitHub (Nov 30, 2023):

@olafgeibig, @igorschlum is right that it is a memory issue.

That model requires 13.8GB. A 16GB Apple Silicon Mac only makes 0.66*16GB = 10.56GB available to the GPU.
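
As a rough check of the arithmetic above, here is a minimal sketch that computes the default GPU budget in megabytes and compares it with the model's requirement. The 0.66 ratio and the 13.8GB figure are taken from this comment; the function names `default_gpu_mb` and `fits_in_gpu` are made up for illustration and are not part of Ollama or macOS.

```shell
#!/bin/sh
# Estimate the default GPU memory budget in MB, assuming the 0.66 ratio
# quoted in this thread (an approximation, not an official figure).
default_gpu_mb() {
    total_mb=$1
    echo $(( total_mb * 66 / 100 ))
}

# Succeed (exit status 0) if a model of model_mb fits in that budget.
fits_in_gpu() {
    model_mb=$1
    total_mb=$2
    [ "$model_mb" -le "$(default_gpu_mb "$total_mb")" ]
}

default_gpu_mb 16384   # 16GB machine, prints 10813
default_gpu_mb 32768   # 32GB machine, prints 21626

# 13.8GB is roughly 14131 MB.
if fits_in_gpu 14131 16384; then
    echo "fits on 16GB"
else
    echo "does not fit on 16GB"
fi
```

Under these assumptions the model does not fit on the 16GB machine but does fit on a 32GB one, which matches the two reports in this thread.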

You can change the RAM available to the GPU via an OS tunable:

On Ventura: sudo sysctl debug.iogpu.wired_limit=<mb>
On Sonoma: sudo sysctl iogpu.wired_limit_mb=<mb>
where <mb> is the size of the memory in megabytes you want to make available to the GPU.

Be careful where you set the limit, though. I would check how much wired memory your computer is using in Activity Monitor without an LLM loaded, add at least a GB or two of padding, and subtract that total from your memory size (16384) to get a number to try. Also note that this setting isn't preserved across reboots.
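
The subtraction described above can be sketched as a small helper. The function name `suggest_wired_limit_mb` and the example inputs (3000 MB of idle wired memory, 2048 MB of padding) are hypothetical; read the real wired-memory figure from Activity Monitor on your own machine.

```shell
#!/bin/sh
# Suggest a GPU wired limit in MB: total RAM minus idle wired memory
# minus a safety margin. All three inputs are values you supply.
suggest_wired_limit_mb() {
    total_mb=$1      # physical RAM, e.g. 16384 for 16GB
    baseline_mb=$2   # wired memory with no LLM loaded (Activity Monitor)
    padding_mb=$3    # safety margin, "at least a GB or two"
    echo $(( total_mb - baseline_mb - padding_mb ))
}

# Hypothetical example values, not measurements from this issue.
suggest_wired_limit_mb 16384 3000 2048   # prints 11336
```

The printed number is only a starting point to try as the limit. Because the setting is not preserved across reboots, a value that turns out to be too aggressive can be undone by restarting.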

That's probably not going to be enough, though. There are two other things you can try, alone or in combination: 1) create a custom model with a smaller context size; 2) use a q4 quantization.
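
For option 1, one possible approach is a Modelfile that derives a variant of the same model with a smaller `num_ctx`. This is a sketch under assumptions: the value 2048, the file path, the helper name `write_small_ctx_modelfile`, and the model name `deepseek-small-ctx` are all illustrative choices, not values recommended in this thread.

```shell
#!/bin/sh
# Write a Modelfile that reuses the pulled model but requests a smaller
# context window. The num_ctx value of 2048 is an illustrative assumption.
write_small_ctx_modelfile() {
    out=$1
    cat > "$out" <<'EOF'
FROM deepseek-coder:6.7b-base-q5_K_M
PARAMETER num_ctx 2048
EOF
}

modelfile="${TMPDIR:-/tmp}/Modelfile.small-ctx"
write_small_ctx_modelfile "$modelfile"

# Then build and run the variant (requires a local Ollama install):
#   ollama create deepseek-small-ctx -f "$modelfile"
#   ollama run deepseek-small-ctx
```

For option 2, the same steps apply with a q4 tag of the model in the `FROM` line, or with a plain `ollama run` on that tag; check the model's tag list for the exact name.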


@dhiltgen commented on GitHub (Jul 24, 2024):

We've made numerous improvements to memory prediction and to understanding how available memory is split between the GPU and the system on macOS. I don't have a 16G macOS system to verify this on, but I believe this should be fixed. If you're still having problems running models on your 16G system, please share your server log and I'll reopen the issue.

Reference: github-starred/ollama#78351