[GH-ISSUE #10795] GPU is not fully utilized; can the remaining GPU capacity be used to speed up model inference? #53600

Closed
opened 2026-04-29 04:07:56 -05:00 by GiteaMirror · 3 comments

Originally created by @valueLzy on GitHub (May 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10795

![Image](https://github.com/user-attachments/assets/94d88f00-48b2-4a01-8c37-e18e64c13fed)
![Image](https://github.com/user-attachments/assets/d2d95834-1030-4427-bf7b-a26beddbfc3f)
GiteaMirror added the question label 2026-04-29 04:07:56 -05:00

@rick-github commented on GitHub (May 21, 2025):

You can increase the memory usage by increasing the size of the [context buffer](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size), but it won't speed up inference.

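For reference, the FAQ linked above describes two ways to raise the context-window size. A minimal sketch, assuming a locally running Ollama server on the default port and a model named `llama3` (the model name here is only an example):

```shell
# Per-request: pass num_ctx in the "options" object of an API call.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "options": {"num_ctx": 8192}
}'

# Server-wide: set the default context length before starting the server.
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
```

As noted above, a larger context buffer consumes more VRAM but does not increase tokens per second.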

@valueLzy commented on GitHub (May 21, 2025):

> You can increase the memory usage by increasing the size of the context buffer, but it won't speed up inference.

Thanks for the guidance. Are there plans to add features that speed up inference in the future?


@rick-github commented on GitHub (May 21, 2025):

There are always plans to improve performance, but your model is already 100% loaded in GPU, so there's not much room for improvement.

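Whether a model is fully resident on the GPU, as described in the comment above, can be checked from the command line; `ollama ps` lists loaded models along with their CPU/GPU placement (output format may vary by Ollama version):

```shell
# List currently loaded models; the PROCESSOR column shows the CPU/GPU
# split, e.g. "100% GPU" when the model is fully offloaded to the GPU.
ollama ps
```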

Reference: github-starred/ollama#53600