[GH-ISSUE #1231] `ollama run llama2` on M1 MacBook fails after fresh install #624

Closed
opened 2026-04-12 10:19:39 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @johnlarkin1 on GitHub (Nov 21, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1231

Hello! I am getting the following issue after I've downloaded the desktop application and tried to do the following:

```
╰─➤  ollama run llama2
Error: llama runner process has terminated
```

It also seemingly borks my computer for a second, and I'm not even able to use my trackpad (probably due to my machine's memory constraints).

I can upload portions of my `server.log` upon request. Would love any help / workaround.

```
╰─➤  tail -n 25 ~/.ollama/logs/server.log
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32                0x1206d5bd0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_rope_f32                       0x1206d6370 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rope_f16                       0x1206d6b60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                      0x1206d73d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x1206d7f50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x1206d8ad0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x1206d9650 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_concat                         0x1206d9d30 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_sqr                            0x1206da610 | th_max = 1024 | th_width =   32
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  5461.34 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 294.13 MB
llama_new_context_with_model: max tensor size =   102.54 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3648.58 MB, ( 3649.08 /  5461.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  2048.02 MB, ( 5697.09 /  5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   288.02 MB, ( 5985.11 /  5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: /Users/jmorgan/workspace/ollama/llm/llama.cpp/gguf/ggml-metal.m:1508: false
2023/11/21 18:14:57 llama.go:435: signal: abort trap
2023/11/21 18:14:57 llama.go:443: error starting llama runner: llama runner process has terminated
2023/11/21 18:14:57 llama.go:509: llama runner stopped successfully
[GIN] 2023/11/21 - 18:14:57 | 500 |  6.678189916s |       127.0.0.1 | POST     "/api/generate"
```

Other version details:

```
╰─➤  ollama -v
ollama version 0.1.11
```

@BruceMacD commented on GitHub (Nov 22, 2023):

Hi @johnlarkin1, thanks for sharing the logs. You can see the issue here: `5985.11 / 5461.34), warning: current allocated size is greater than the recommended max working set size`. The model you are trying to run is too large for the memory currently available on your system.
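(Editor's note: the over-allocation can be read directly off the log arithmetic: the three Metal buffers, 3648.58 + 2048.02 + 288.02 ≈ 5984.62 MB, exceed the 5461.34 MB `recommendedMaxWorkingSetSize`. A minimal illustrative script, not part of Ollama, that extracts these numbers from a log excerpt:)

```python
# Illustrative only: sum the Metal buffer allocations reported in
# server.log and compare them to the GPU's recommendedMaxWorkingSetSize.
import re

log = """\
ggml_metal_init: recommendedMaxWorkingSetSize  =  5461.34 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3648.58 MB, ( 3649.08 /  5461.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  2048.02 MB, ( 5697.09 /  5461.34)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   288.02 MB, ( 5985.11 /  5461.34)
"""

# Working-set limit reported by Metal at init time.
limit = float(re.search(r"recommendedMaxWorkingSetSize\s*=\s*([\d.]+) MB", log).group(1))
# Total of all buffers ggml allocated on the GPU.
allocated = sum(float(mb) for mb in re.findall(r"buffer, size =\s*([\d.]+) MB", log))

print(f"allocated {allocated:.2f} MB vs limit {limit:.2f} MB")
print("model fits" if allocated <= limit else "model exceeds recommended working set")
```

Running this against the excerpt above prints roughly `allocated 5984.62 MB vs limit 5461.34 MB`, matching the warning in the logs.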


@johnlarkin1 commented on GitHub (Nov 22, 2023):

Totally tracks 😅 thanks so much for the info (sorry I missed that one).


@BruceMacD commented on GitHub (Nov 22, 2023):

No worries, I'm going to think of a way to communicate these errors better. It's not obvious right now.


@MaxKlyukin commented on GitHub (Nov 22, 2023):

I've also encountered a similar problem: after the last update, newly downloaded models (orca2 and deepseek-coder) failed to launch, while previously downloaded models of the same or even bigger size work fine. The problem does not seem to be (at least fully) connected with the warning, since some models run fine while logging this warning, but others do not.


@FrankSandqvist commented on GitHub (Dec 14, 2023):

Running into this as well, but in a way that seems logically inconsistent: I'm able to run the Q4 quantization of 70B llama but not the smaller Q2.

https://github.com/facebookresearch/llama/issues/964


@pdevine commented on GitHub (Jan 25, 2024):

@johnlarkin1 have you tried this lately? There have been a lot of updates since your original post.


@austinnguyen89 commented on GitHub (Feb 17, 2024):

Still not working.


@jmorganca commented on GitHub (Feb 20, 2024):

This should be fixed as of a release in late December. @austinnguyen89, are you hitting a similar error in the logs? Let me know if so and we can re-open this issue.


Reference: github-starred/ollama#624