[GH-ISSUE #15415] Total bug #35615

Open
opened 2026-04-22 20:15:28 -05:00 by GiteaMirror · 9 comments

Originally created by @DjceUo on GitHub (Apr 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15415

What is the issue?

Previous versions of Ollama ran models like GPT-OSS-20B and similar ones such as QWEN3-30B-3A smoothly, with practically no CPU load. Now the same models drive CPU utilization to nearly 100% when used through Open WebUI or ChatBox with 32k or 64k token context windows. Please restore proper Ollama functionality.
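
A quick way to confirm whether the model is spilling into system RAM, which would explain the CPU load (`ollama ps` is a standard Ollama CLI command; the PROCESSOR split it reports shows any CPU offload):

```shell
# Show loaded models and how each is split between GPU and CPU.
# A PROCESSOR value like "45%/55% CPU/GPU" means part of the model
# or its KV cache was offloaded to system RAM and runs on the CPU.
ollama ps
```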

Relevant log output

(no log output was provided)
OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.20.3

GiteaMirror added the needs more info and bug labels 2026-04-22 20:15:28 -05:00

@DjceUo commented on GitHub (Apr 8, 2026):

An earnest request to the developers: please test thoroughly before releasing Ollama. I know several sizable companies considering Ollama for local AI, but they're held back by the instability and unpredictability of releases. Models that worked on a previous release can easily stop working on the next.


@grepin commented on GitHub (Apr 8, 2026):

@DjceUo you are tremendous! "Just forget it, write more tests yourself, or buy a commercial product." It's open source, provided "as is". Try to understand and "feel" the ideology, and keep your emotions in check.


@DjceUo commented on GitHub (Apr 8, 2026):

The flagship open-source Linux distro runs stably and predictably from release to release, excluding the crap builds.


@rick-github commented on GitHub (Apr 8, 2026):

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.
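
For reference, a sketch of where to find those logs on Windows (locations per the troubleshooting page above; adjust if your install differs):

```shell
:: Windows: Ollama logs live under %LOCALAPPDATA%\Ollama.
:: server.log holds the most recent server output; app.log covers the GUI app.
type %LOCALAPPDATA%\Ollama\server.log

:: For more detail, enable debug logging and restart the Ollama app:
set OLLAMA_DEBUG=1
```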


@DjceUo commented on GitHub (Apr 9, 2026):

I can just picture Linus Torvalds saying, 'If you don't provide memory dumps, I won't fix any bugs...' Are you seriously suggesting I spend a couple of days gathering server logs for you?


@rick-github commented on GitHub (Apr 9, 2026):

It will make it easier to resolve the issue. For example, the logs you provided in #14636 showed why the model load failed, and in #14849 we found that you had insufficient free VRAM. Since the issue you are having seems to be with external clients, logs will provide the basis for determining where the cause of the problem is. In the process of debugging the issue, logs from those clients may also be required.


@DjceUo commented on GitHub (Apr 10, 2026):

I did a quick investigation and found the following:
When using Ollama via the command line or its native GUI, everything works almost flawlessly, even when VRAM is insufficient for the context window; in that case Ollama seamlessly offloads to system RAM.
However, when running models that need more context than fits in VRAM (and thus require RAM offloading) through ChatBox or OpenWebUI, Ollama essentially fails. This issue wasn't present in earlier versions. I haven't dug into the code yet.
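
To make the difference concrete, a minimal reproduction sketch of what those clients do: they request a large context window through the REST API rather than using the CLI defaults (the model name and `num_ctx` value below are illustrative, not from the original report):

```shell
# Ask the server for a 64k-token context, as Open WebUI or ChatBox would.
# This forces a large KV cache allocation that may not fit in VRAM.
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Hello",
  "options": { "num_ctx": 65536 }
}'

# Afterwards, "ollama ps" shows whether the load spilled onto the CPU.
```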


@DjceUo commented on GitHub (Apr 10, 2026):

I can't say for sure why the new models aren't working or are only partially functional. If I feel like it, I'll dig into the code over the weekend.


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15415
Analyzed: 2026-04-18T18:21:03.027411

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


Reference: github-starred/ollama#35615