[GH-ISSUE #1576] 70b model not working on apple silicon #26626

Closed
opened 2026-04-22 03:00:34 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @leejw51 on GitHub (Dec 18, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1576

Memory is 48 GB; pulling the model works fine.

```
ollama run llama2:70b
```

The error is:

```
Error: llama runner process has terminated
```

@jmorganca commented on GitHub (Dec 18, 2023):

Hi @leejw51, sorry to hear it didn't work. Are there any logs in `~/.ollama/logs/server.log` that could help debug? Thanks so much!


@3zero2 commented on GitHub (Dec 19, 2023):

Not OP, but I'm getting the same error when running dolphin-mixtral on a 32 GB RAM M1. I assume it's due to lack of RAM in my case.

Here is the log.

```
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/var/folders/mc/nvm2tk512k733fplrp16wzbr0000gn/T/ollama546028441/llama.cpp/gguf/build/metal/bin/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 319.35 MiB
llama_new_context_with_model: max tensor size = 102.55 MiB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 16384.00 MiB, offs = 0
ggml_metal_add_buffer: allocated 'data ' buffer, size = 8935.19 MiB, offs = 17072324608, (25320.81 / 21845.34)
ggml_metal_add_buffer: warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 512.03 MiB, (25832.84 / 21845.34)
ggml_metal_add_buffer: warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 316.05 MiB, (26148.89 / 21845.34)
ggml_metal_add_buffer: warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 4 failed with status 5
GGML_ASSERT: /Users/jmorgan/workspace/ollama/llm/llama.cpp/gguf/ggml-metal.m:2353: false
2023/12/19 12:47:59 llama.go:451: signal: abort trap
2023/12/19 12:47:59 llama.go:459: error starting llama runner: llama runner process has terminated
2023/12/19 12:47:59 llama.go:525: llama runner stopped successfully
[GIN] 2023/12/19 - 12:47:59 | 500 | 32.871049s | 127.0.0.1 | POST "/api/generate"
```
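The failure mode shows up in the running totals the log prints: a rough tally of the reported Metal buffer sizes (a sketch, with the MiB values copied from the log above) exceeds the `recommendedMaxWorkingSetSize` of 21845.34 MiB:

```shell
# Sum the Metal buffer sizes from the log (MiB) and compare against
# recommendedMaxWorkingSetSize on this 32 GB M1 Pro (21845.34 MiB).
awk 'BEGIN {
  total = 16384.00 + 8935.19 + 512.03 + 316.05   # data + data + kv + alloc
  limit = 21845.34
  printf "total = %.2f MiB, limit = %.2f MiB, over by %.2f MiB\n",
         total, limit, total - limit
}'
```

This sums to about 26147 MiB, within a couple of MiB of the 26148.89 running total the log itself reports, so the GPU allocation is over the limit by roughly 4.3 GiB, which matches the out-of-memory theory.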


@easp commented on GitHub (Dec 19, 2023):

@leejw51 If your Mac has 48 GB of RAM, then I think the OS is making about 36 GB available to the GPU. The model weights for that tag take up 39 GB on their own, and there is some additional overhead.

You can [change the amount of RAM the OS makes available to the GPU](https://techobsessed.net/2023/12/increasing-ram-available-to-gpu-on-apple-silicon-macs-for-running-large-language-models/) or use a smaller quantization, like `llama2:70b-chat-q3_K_S` or `llama2:70b-chat-q3_K_M`. Or you can try a combination of those approaches.
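For the first approach, a minimal sketch: on macOS 14 (Sonoma) and later the GPU wired-memory limit is exposed as the `iogpu.wired_limit_mb` sysctl (earlier releases used a differently named debug tunable), and 44 GB here is just an assumed ceiling that leaves some headroom for the OS on a 48 GB machine:

```shell
# Assumed sysctl name and values; based on macOS 14 (Sonoma) on Apple Silicon.
# Raise the GPU wired-memory limit to 44 GiB (the value is in MiB):
sudo sysctl iogpu.wired_limit_mb=$((44 * 1024))   # 45056 MiB

# Setting it to 0 restores the OS default; the setting also resets on reboot:
sudo sysctl iogpu.wired_limit_mb=0
```

The other approach needs no system changes at all, e.g. `ollama run llama2:70b-chat-q3_K_S`.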


@igorschlum commented on GitHub (Dec 20, 2023):

I'm on Apple Silicon with enough memory, and the llama2:70b model loads and works well.

```
(base) igor@MacStudiodeIgor ~ % ollama run llama2:70b
pulling manifest
pulling 153664158022... 100% ▕████████████████▏ 38 GB
pulling 8c17c2ebb0ea... 100% ▕████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕████████████████▏ 59 B
pulling 9fa96ed79547... 100% ▕████████████████▏ 117 B
pulling 3c71be8cebca... 100% ▕████████████████▏ 530 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> why the sky is blue?
```

The sky appears blue because of a phenomenon called Rayleigh scattering,
which is the scattering of light or other electromagnetic radiation by
small particles in the atmosphere. The blue color we see in the sky is a
result of this scattering process.

When sunlight enters Earth's atmosphere, it encounters tiny molecules of
gases such as nitrogen and oxygen. These molecules scatter the light in
all directions, but they scatter shorter (blue) wavelengths more than
longer (red) wavelengths. This is known as Rayleigh scattering.

As a result of this scattering, the blue light is distributed throughout
the atmosphere, making the sky appear blue from our perspective. The color
we see in the sky can also be affected by other factors such as pollution,
dust, and water vapor, but the primary reason for the blue color of the
sky is Rayleigh scattering.

It's worth noting that the color of the sky can vary depending on the time
of day and atmospheric conditions. For example, during sunrise and sunset,
the sky can take on hues of red, orange, and pink due to the scattering of
light by atmospheric particles at these times. However, under normal
conditions, the blue color of the sky is a result of Rayleigh scattering.


@igorschlum commented on GitHub (Dec 20, 2023):

@easp I think you can close this issue, since the need to clarify the "Out of Memory" error is already reported in https://github.com/jmorganca/ollama/issues/1516


@easp commented on GitHub (Dec 21, 2023):

@igorschlum I'm just an ordinary user; I can't close other people's issues.

@leejw51 could close their issue, though.


@igorschlum commented on GitHub (Dec 21, 2023):

@easp Sorry, the message was meant for @leejw51.


Reference: github-starred/ollama#26626