[PR #11278] mimic logs for layers on new engine #13496

Closed
opened 2026-04-13 00:28:54 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/11278

State: closed
Merged: Yes


This adds some extra logs to make the new engine a bit more consistent with the llama engine.

I opted not to change the two existing lines showing buffer sizes, since they're already close enough.

Example from llama engine:

```
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   308.23 MiB
load_tensors: Metal_Mapped model buffer size =  1918.36 MiB
```

With this change, the same model on the new engine:

```
time=2025-07-02T16:16:54.727-07:00 level=INFO source=ggml.go:362 msg="offloading 28 repeating layers to GPU"
time=2025-07-02T16:16:54.727-07:00 level=INFO source=ggml.go:368 msg="offloading output layer to GPU"
time=2025-07-02T16:16:54.727-07:00 level=INFO source=ggml.go:378 msg="offloaded 29/29 layers to GPU"
time=2025-07-02T16:16:54.727-07:00 level=INFO source=ggml.go:380 msg="model weights" buffer=Metal size="1.9 GiB"
time=2025-07-02T16:16:54.727-07:00 level=INFO source=ggml.go:380 msg="model weights" buffer=CPU size="308.2 MiB"
```
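The size strings in the new engine's output ("1.9 GiB", "308.2 MiB") are byte counts formatted into human-readable binary units. A minimal Go sketch of how such structured lines could be produced with the standard `log/slog` package; `humanBytes` is a hypothetical helper for illustration, not the formatting code the project actually uses:

```go
package main

import (
	"fmt"
	"log/slog"
)

// humanBytes formats a byte count in binary units with one decimal place,
// matching the "1.9 GiB" / "308.2 MiB" style shown in the logs above.
// (Hypothetical helper; the real project has its own formatting code.)
func humanBytes(n uint64) string {
	const (
		KiB = 1 << 10
		MiB = 1 << 20
		GiB = 1 << 30
	)
	switch {
	case n >= GiB:
		return fmt.Sprintf("%.1f GiB", float64(n)/float64(GiB))
	case n >= MiB:
		return fmt.Sprintf("%.1f MiB", float64(n)/float64(MiB))
	case n >= KiB:
		return fmt.Sprintf("%.1f KiB", float64(n)/float64(KiB))
	default:
		return fmt.Sprintf("%d B", n)
	}
}

func main() {
	// Illustrative values chosen to match the example output above.
	repeating, total := 28, 29
	slog.Info(fmt.Sprintf("offloading %d repeating layers to GPU", repeating))
	slog.Info("offloading output layer to GPU")
	slog.Info(fmt.Sprintf("offloaded %d/%d layers to GPU", total, total))
	slog.Info("model weights", "buffer", "Metal", "size", humanBytes(2_011_554_775))
	slog.Info("model weights", "buffer", "CPU", "size", humanBytes(323_200_000))
}
```

Using `slog`'s key/value attributes (rather than baking the size into the message string) is what gives the `buffer=Metal size="1.9 GiB"` form in the new engine's output.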

Reference: github-starred/ollama#13496