[PR #10770] llm: exclude input layers from VRAM layer size estimation #13356

Closed
opened 2026-04-13 00:24:45 -05:00 by GiteaMirror (Owner) · 0 comments

Original Pull Request: https://github.com/ollama/ollama/pull/10770

State: closed
Merged: No


Input layers only run on the [CPU](https://github.com/ollama/ollama/blob/7edfdd2f5f48a7be035cec23b4acd12f7c112e1c/ml/backend/ggml/ggml.go#L248), so exclude them when finding the maximum size of a layer in VRAM.

This doesn't completely eliminate the increased memory usage reported in the issues below, but it reduces it somewhat:

Fixes: #10765, #10756, #10752, #10726
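
For illustration only, here is a minimal Go sketch of the idea, not ollama's actual estimator: the `Tensor` type, the layer names such as `"token_embd"`, and the `maxLayerSize` helper are all hypothetical stand-ins for the real GGML metadata handling.

```go
package main

import "fmt"

// Tensor is a hypothetical, simplified stand-in for a model tensor.
type Tensor struct {
	Name string
	Size uint64 // bytes
}

// maxLayerSize returns the largest per-layer size in bytes, skipping input
// layers (e.g. token embeddings). Those only run on the CPU, so they should
// not inflate the estimate of the largest layer that must fit in VRAM.
// The layer names used here are illustrative, not ollama's actual keys.
func maxLayerSize(layers map[string][]Tensor) uint64 {
	var largest uint64
	for name, tensors := range layers {
		// Input layers stay on the CPU; exclude them from the VRAM estimate.
		if name == "token_embd" || name == "inp" {
			continue
		}
		var size uint64
		for _, t := range tensors {
			size += t.Size
		}
		if size > largest {
			largest = size
		}
	}
	return largest
}

func main() {
	layers := map[string][]Tensor{
		"token_embd": {{Name: "token_embd.weight", Size: 512 << 20}},
		"blk.0": {
			{Name: "blk.0.attn_q.weight", Size: 64 << 20},
			{Name: "blk.0.ffn_up.weight", Size: 128 << 20},
		},
	}
	fmt.Printf("max VRAM layer size: %d MiB\n", maxLayerSize(layers)/(1<<20))
}
```

In this sketch, skipping the embedding layer drops the reported maximum from 512 MiB to 192 MiB, which mirrors why the change reduces, but does not fully remove, the over-estimation reported in the linked issues.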

GiteaMirror added the pull-request label 2026-04-13 00:24:45 -05:00

Reference: github-starred/ollama#13356