[GH-ISSUE #15797] macOS App: web process using over 100% CPU during thinking #72124

Open
opened 2026-05-05 03:30:53 -05:00 by GiteaMirror · 0 comments

Originally created by @xmddmx on GitHub (Apr 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15797

What is the issue?

When running a model that produces extensive "thinking" output, the Ollama web process uses 100% CPU or more.
This did not happen in older versions.

A sample of the process (attached) suggests to me that the UI is thrashing. Could it be refreshing the entire view after every single character of output?
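
To illustrate the suspected pattern, here is a minimal sketch of how streamed chunks could be coalesced so the view is updated once per frame rather than once per character. It is not taken from the Ollama app's source: `ChunkBatcher`, `render`, and `schedule` are hypothetical names, and the assumption that the UI re-renders per chunk is only the guess stated above. In a browser, `schedule` could be something like `(cb) => requestAnimationFrame(cb)`; here a plain queue stands in for the frame so the sketch runs anywhere.

```typescript
// Hypothetical sketch: NOT the Ollama app's code. It only illustrates
// coalescing many small stream chunks into one view update per "frame".
type Schedule = (flush: () => void) => void;

class ChunkBatcher {
  private pending: string[] = [];
  private scheduled = false;
  private render: (text: string) => void;
  private schedule: Schedule;

  constructor(render: (text: string) => void, schedule: Schedule) {
    this.render = render;     // assumed to be the expensive view update
    this.schedule = schedule; // e.g. (cb) => requestAnimationFrame(cb)
  }

  // Called once per streamed chunk; cheap, does not touch the view.
  push(chunk: string): void {
    this.pending.push(chunk);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  // Runs at most once per scheduled frame and renders everything buffered.
  private flush(): void {
    this.scheduled = false;
    const text = this.pending.join("");
    this.pending = [];
    if (text.length > 0) this.render(text);
  }
}

// Demo with a fake frame queue standing in for requestAnimationFrame.
const frameQueue: Array<() => void> = [];
let renders = 0;
let output = "";

const batcher = new ChunkBatcher(
  (text) => { renders += 1; output += text; },
  (flush) => { frameQueue.push(flush); },
);

// Simulate a stream that delivers one character at a time.
for (const ch of "thinking...") batcher.push(ch);

// Simulate the next frame firing.
while (frameQueue.length > 0) frameQueue.shift()!();

console.log(`renders=${renders} output=${output}`);
```

With this shape, the cost of a view update is tied to the frame rate rather than to the token rate, which is the behaviour the expected result below asks for.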

To reproduce:

  • on macOS 15.7.5
  • Ollama latest (0.21.2)
  • run a model and prompt which generates a ton of thinking. I'm using
    qwen3.6:35b-a3b-nvfp4 with this prompt: "Puzzle game: there are 16 words, grouped into 4 categories, which can include semantic, phonetic, or more tricky (such as adding or removing a letter or word). The 4 categories vary in difficulty, from obvious to obscure. Please tell me the 4 words in each category, and define the category, and list the categories in order from easiest to hardest. Here are the 16 words: calliope superiority ringmaster atlas oedipus buzzard echo electra trace dialect inferiority dictionary thesaurus reminder encyclopedia vestige"
  • as the model thinks, watch the CPU usage of the web process (which will look like this: http://127.0.0.1:xxxxx) in Activity Monitor

Expected result: there should be some CPU usage to refresh the window, but it should not be 100%.
Actual result: 100% or more. You will also notice other performance issues, such as the Ollama window being slow and choppy when dragged around.

Sample of Ollama Web Content.txt (https://github.com/user-attachments/files/27055604/Sample.of.Ollama.Web.Content.txt)

Regression testing: I feel like this was not as bad a few versions ago, but I'm not sure exactly which version introduced the issue.

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.21.2

GiteaMirror added the bug label 2026-05-05 03:30:53 -05:00
Reference: github-starred/ollama#72124