[GH-ISSUE #13152] GPT-OSS: 120B doesn't share between CPU/GPU @ CTX over 8192 #8698

Open
opened 2026-04-12 21:28:20 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Stef1519 on GitHub (Nov 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13152

What is the issue?

OS: Ubuntu
Ollama Version: 0.12.11
Hardware: Xeon E5 2697 V2, 2 CPUs, 128 GB RAM, 3xP106-090, 1xP106-100, Tesla M10 32 GB, Total 56 GB GPU Memory

CLI & OpenWebUI

Issue: #1

GPT-OSS:120B happily shares between CPU/GPU when using a CTX size of 8192 (although a nearly 50:50 split doesn't seem right either, considering the 56 GB of GPU memory):

num_ctx = 8192:
Gpt-oss:120b a951a23b46a1 67 GB 48%/52% CPU/GPU 8192 4 minutes from now
Works somehow; responses are not very fast, but arrive within one minute.

num_ctx=32768:
gpt-oss:120b a951a23b46a1 66 GB 100% CPU 32768 4 minutes from now

num_ctx=32768 and num_gpu=16:
gpt-oss:120b a951a23b46a1 68 GB 56%/44% CPU/GPU 32768 4 minutes from now
No response for ages (probably none at all; I gave up after ten minutes)
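For reference, per-request context size and GPU layer count can be set through the `options` field of the Ollama HTTP API, which is one way to reproduce the runs above (a sketch, assuming a default server on `localhost:11434`; it requires a running Ollama daemon):

```shell
# Load gpt-oss:120b with a 32k context and 16 layers offloaded to GPU
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:120b",
  "prompt": "Hello",
  "options": {
    "num_ctx": 32768,
    "num_gpu": 16
  }
}'

# Then inspect how the loaded model was split between CPU and GPU
ollama ps
```

The same parameters can also be set interactively in `ollama run` with `/set parameter num_ctx 32768` and `/set parameter num_gpu 16`.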

Issue #2:

GPT-OSS:20B: takes ages to load initially, easily five minutes or more (occupying 14 GB of GPU memory)

Any clues?
Thank you very much!

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the needs more info and bug labels 2026-04-12 21:28:20 -05:00
Author
Owner

@rick-github commented on GitHub (Nov 19, 2025):

[Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) will show details of layer allocation.
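On Linux installs managed by systemd, the server log referred to here can be viewed with journalctl, per the Ollama troubleshooting docs (a sketch; the unit name `ollama` assumes the standard Linux install script was used):

```shell
# Jump to the end of the Ollama server log; the load-time lines report
# how many layers were assigned to GPU vs CPU for the model
journalctl -e -u ollama
```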


Reference: github-starred/ollama#8698