[GH-ISSUE #12782] Add a GPU temperature check during generation or streaming. #8479

Open
opened 2026-04-12 21:10:31 -05:00 by GiteaMirror · 9 comments
Originally created by @SingularityMan on GitHub (Oct 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12782

As better models roll out, agentic use cases are on the rise. This means you need to extend the length of an agent's autonomous tasks for completion, sometimes indefinitely. This could increase temps even for high-end GPUs with a strong cooling system, like my RTX Pro 6000 Blackwell MaxQ with a built-in blower fan and 4 case fans supporting it with vents on top and the sides and lots of airflow.

Sure, you could include something like `pynvml` in your script to monitor GPU usage, but this doesn't directly have an impact on stopping Ollama's generation unless it's done externally. It could be as simple as setting an environment var:

`OLLAMA_GPU_TEMPERATURE_THRESHOLD = 87 # <--------- In Celsius`

`OLLAMA_GPU_TEMPERATURE_COOLDOWN = 30 # <--------- In Seconds`

The idea is that once Ollama reaches this temperature limit, it either pauses generation or pre-filling and waits x amount of seconds or stops it altogether, depending on how Ollama handles intermittent generation.
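Until something like this exists inside Ollama, the threshold-and-cooldown behaviour described above can be approximated on the client side. The following is a minimal sketch, not anything Ollama provides: `ThermalGate`, `from_env` and `read_gpu_temp_nvml` are hypothetical names, the two environment variables are the ones proposed in this issue (Ollama does not read them), and the NVML reader assumes an NVIDIA GPU with the `pynvml` package installed.

```python
import os
import time
from typing import Callable


def read_gpu_temp_nvml(index: int = 0) -> int:
    """Read the GPU core temperature in Celsius through NVML."""
    import pynvml  # imported lazily so the gate itself works without pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        return pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    finally:
        pynvml.nvmlShutdown()


class ThermalGate:
    """Blocks the caller while the GPU is at or above a temperature threshold."""

    def __init__(
        self,
        read_temp: Callable[[], int],
        threshold_c: int = 87,
        cooldown_s: float = 30.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.read_temp = read_temp
        self.threshold_c = threshold_c
        self.cooldown_s = cooldown_s
        self.sleep = sleep

    @classmethod
    def from_env(cls, read_temp: Callable[[], int],
                 sleep: Callable[[float], None] = time.sleep) -> "ThermalGate":
        # Variable names are the ones proposed in this issue (hypothetical).
        threshold = int(os.environ.get("OLLAMA_GPU_TEMPERATURE_THRESHOLD", "87"))
        cooldown = float(os.environ.get("OLLAMA_GPU_TEMPERATURE_COOLDOWN", "30"))
        return cls(read_temp, threshold, cooldown, sleep)

    def wait_until_cool(self) -> int:
        """Sleep in cooldown_s steps until the reading is below the threshold.

        Returns the number of cooldown periods that were waited.
        """
        waits = 0
        while self.read_temp() >= self.threshold_c:
            self.sleep(self.cooldown_s)
            waits += 1
        return waits
```

An agent loop could call `ThermalGate.from_env(read_gpu_temp_nvml).wait_until_cool()` before each request. This only delays the next request; it cannot pause a generation that is already in flight, which is the part that would need support inside Ollama.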

Having run some pretty high-end GPUs for consumers myself, I'd say 87 degrees Celsius is a safe threshold to set because GPUs like mine tend to run at a steady 85 degrees Celsius under normal working loads under LLM generation while 87 indicates elevated temps and 90 is pushing it. 95 is a warning sign to slow down and a typical NVIDIA GPU would shut down around 100C or higher.

I'm a little surprised more local power users haven't pointed this out, but it's a real problem for me and others because I'm wary of overheating under pressure as I start using LLMs for more advanced use cases and automation projects. I've already done all I can to mitigate heating problems with my GPU, including proper airflow, cooling and the fact that this GPU only uses up `300W` compared to its Axial brother's `600W`.

GiteaMirror added the feature request label 2026-04-12 21:10:31 -05:00

@HumbleDeer commented on GitHub (Oct 26, 2025):

I like the idea for its practical use, but I'm afraid this would become a clusterfuck of device-specific implementations rather fast. That's because there's no perfect, universal, convenient way to monitor temperatures with current architectures. There are variations in access methods, variations in specifics as to method-orders, and a large variety of different naming schemes for different sensors with no real consensus on what sits in what order or what represents what. Some values are even just, for lack of a better word, made up. Usually that means inference from another measurement though.

If Ollama has/had an external API trigger to pause/slow down/throttle down, you could definitely use that with your own script that individually implements whatever you require.
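One client-side approximation of such a trigger is to stop consuming a streamed response as soon as an external check fails. The sketch below is illustrative only: `guarded_stream` and `too_hot` are hypothetical names, the chunks stand in for whatever a streaming client yields, and it assumes that closing the stream makes the server stop generating, which should be verified against the Ollama version in use.

```python
from typing import Callable, Iterable, Iterator


def guarded_stream(chunks: Iterable[str], too_hot: Callable[[], bool]) -> Iterator[str]:
    """Yield chunks until too_hot() returns True, then close the source and stop."""
    source = iter(chunks)
    try:
        for chunk in source:
            yield chunk
            # Check after every chunk; the chunk already received is kept.
            if too_hot():
                break
    finally:
        # Generators and many HTTP stream objects expose close(); plain
        # list iterators do not, so look the method up defensively.
        close = getattr(source, "close", None)
        if callable(close):
            close()
```

`too_hot` can wrap any temperature source the script already trusts on its platform, which keeps the device-specific part outside of Ollama.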


@rick-github commented on GitHub (Oct 26, 2025):

Does `nvidia-smi -gtt` not work?


@HumbleDeer commented on GitHub (Oct 26, 2025):

> Does `nvidia-smi -gtt` not work?

`nvidia-smi` is platform specific (Linux), given that it hasn't been practically useful on Windows or Mac for a while and arguably was not even working properly when it was supported on Windows or Mac. Even on Linux, there's no universal guarantee that you'll have consistent readout ability unless you are prepared to handle edge cases. 🙈


@rick-github commented on GitHub (Oct 26, 2025):

The point is that Nvidia already have a mechanism for controlling performance based on power and temperature envelopes, whether it's via a command line utility or an app. I agree that there's a lot of device specific information that needs to be handled, and putting it in ollama when the device manufacturer already makes tools for managing it seems like unnecessary work.


@HumbleDeer commented on GitHub (Oct 26, 2025):

> The point is that Nvidia already have a mechanism for controlling performance based on power and temperature envelopes, whether it's via a command line utility or an app.

Oh right, I see! I had misinterpreted your reply as a reply to me.

And yeah I agree with that. You can throttle your GPU's performance (the speed) rather than throttling the load that's placed on it (Ollama &)


@SingularityMan commented on GitHub (Oct 26, 2025):

> Does `nvidia-smi -gtt` not work?

That's only for Linux. That command doesn't have low-level access to Windows like in Linux.


@HumbleDeer commented on GitHub (Oct 26, 2025):

> That command doesn't have low-level access to Windows like in Linux.

Use the manufacturer provided tools then, as later suggested. They nearly all have their own tools that are guaranteed to have basic performance downgrading options.


@SingularityMan commented on GitHub (Oct 26, 2025):

> > That command doesn't have low-level access to Windows like in Linux.
>
> Use the manufacturer provided tools then, as later suggested. They nearly all have their own tools that are guaranteed to have basic performance downgrading options.

They do, but not in a direct temp control way. I tried `nvidia-smi -pl 250` and it did lower the power limit to that, but the power demands still exceeded it lmao (which I expected to happen).
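The power limit can be turned into a rough temperature control by adjusting it in a feedback loop. The following is a minimal sketch under stated assumptions: it uses the `nvidia-smi` query and `-pl` options, which normally need administrator rights to change the limit; the helper names and the wattage bounds, step and margin are illustrative placeholders, not values recommended by NVIDIA for any particular card; and, as noted above, actual draw may briefly exceed whatever limit is set.

```python
import subprocess


def parse_temps(output: str) -> list[int]:
    """Parse one temperature per line, as printed by the query in query_temps()."""
    return [int(line) for line in output.splitlines() if line.strip()]


def query_temps() -> list[int]:
    """Return the core temperature in Celsius of every GPU nvidia-smi can see."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def next_power_limit(current_w: int, temp_c: int, threshold_c: int = 87,
                     margin_c: int = 5, step_w: int = 25,
                     min_w: int = 150, max_w: int = 300) -> int:
    """Step the limit down when hot, back up only once clearly cooler."""
    if temp_c >= threshold_c:
        return max(min_w, current_w - step_w)
    if temp_c <= threshold_c - margin_c:
        return min(max_w, current_w + step_w)
    return current_w  # inside the margin: hold steady to avoid oscillating


def set_power_limit(watts: int, gpu_index: int = 0) -> None:
    """Apply a new power limit to one GPU (requires elevated privileges)."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)
```

A small loop that calls `query_temps()`, feeds the first reading into `next_power_limit()` and applies the result with `set_power_limit()` every few seconds would complete the controller; the margin exists so the limit is not raised and lowered on every poll.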

I then tried MSI Afterburner, but apparently it doesn't allow me to set a temperature limit for my GPU. That part is grayed out and won't let me toggle it.

I understand there's other third party tools out there but given that my GPU is super new they either haven't caught up or require a combination of strategies that need to be carefully balanced and I'd rather not mess with that.


@HumbleDeer commented on GitHub (Nov 20, 2025):

> They do but not in a direct temp control way.

> I then tried MSI afterburner, but it doesn't allow me to set a temperature limit for my GPU apparently.

> I understand there's other third party tools out there but they either haven't caught up or require a combination of strategies

All I can conclude from this is that asking "why not add it to Ollama" is a misguided direction to go in if "Why would we add it to Ollama?" is a more representative concern.

That's to say: These features are part of external tools and have historically most-always been for very specific reasons. The added complexity and problems it brings along with it to add a feature that's at best a temporary patch for a specific user or very small subset of users is hard to justify especially given the comparatively enormous (potential) developer time investment.

At its core it sounds like an easy thing to add that can just be ignored by everyone else, but it's easier said than done. There's a good number of possible ways it can balloon in scope creep of fixing bugs with said small added feature where those bug fixes then accidentally break other features but stay unfound for a while or cause unnoticed but unwanted behaviour, and so on and so forth.

I'm sure there are at the very least a few people who are very eager to share with you their brief input as to what tools to look at so you can write your own external tool that works exactly how you want it to, by having control over the internal working mechanics (e.g. control commands to bus chips, registers it addresses, ...). That's how tools like the MorePowerTool came to exist. :)

All that said and done and spoken, do you somewhat understand what I mean or am trying to frame?
I admit my point of view is specifically biased towards acknowledging the potential pitfalls, and I hope there's someone else's opinion to balance that out if need be.


Reference: github-starred/ollama#8479