[GH-ISSUE #7557] My model file is only 12GB, so why does loading it report 22GB of VRAM needed? #4811

Closed
opened 2026-04-12 15:47:20 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @fg2501 on GitHub (Nov 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7557

What is the issue?

![3f5ba904-d558-4770-b3e3-ee7783694b2f](https://github.com/user-attachments/assets/1a2ef30a-766c-4f00-873c-0b3468238754)
![1111](https://github.com/user-attachments/assets/03dba386-b3cb-4267-9d02-07b048c3a503)

I'm using the qwen2.5:14b model. Why does it report that 22GB of VRAM is needed when loading?

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.3.13

GiteaMirror added the question label 2026-04-12 15:47:20 -05:00
Author
Owner

@rick-github commented on GitHub (Nov 7, 2024):

[Context window](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size).

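Following the linked FAQ entry, one way to cap the context-window contribution to VRAM is a Modelfile with a smaller `num_ctx`. A minimal sketch (the value 4096 is purely illustrative; pick what your workload needs):

```
FROM qwen2.5:14b
PARAMETER num_ctx 4096
```

Built with something like `ollama create qwen2.5-small-ctx -f Modelfile` (the model name here is hypothetical), the resulting model allocates a smaller KV cache when loaded.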
Author
Owner

@dhiltgen commented on GitHub (Nov 7, 2024):

`ollama list` shows the size of the model on disk. Once loaded into the GPU, more VRAM is required, which varies from model to model based on their architecture.

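The gap between on-disk size and VRAM can be made concrete with a back-of-the-envelope sketch. The architecture numbers below (48 layers, 8 KV heads, head dimension 128, fp16 cache) are assumptions roughly in the shape of a Qwen2.5-14B-class model, not exact figures, and the estimate ignores compute buffers and other overhead:

```python
# Rough VRAM estimate: model weights + KV cache.
# The KV cache grows with context length AND with the number of
# parallel request slots, since each slot gets its own cache.
def kv_cache_bytes(num_ctx, num_parallel, n_layers=48, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    # K and V tensors, per layer, per token
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * num_ctx * num_parallel

gib = 1024 ** 3
weights_gib = 12  # size reported by `ollama list` (on-disk)
cache_gib = kv_cache_bytes(2048, 16) / gib
print(f"weights ~{weights_gib} GiB + KV cache ~{cache_gib:.1f} GiB")
# With 16 parallel slots the cache alone adds ~6 GiB here.
```

With these assumed numbers, 12 GiB of weights plus roughly 6 GiB of KV cache (before other buffers) lands in the vicinity of the 22GB the reporter saw.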
Author
Owner

@fg2501 commented on GitHub (Nov 8, 2024):

> `ollama list` shows the size of the model on disk. Once loaded into the GPU, more VRAM is required, which varies from model to model based on their architecture.

OK, thank you very much!

Author
Owner

@fg2501 commented on GitHub (Nov 8, 2024):

The problem is solved. It wasn't just the context length; it was also an environment variable. I had OLLAMA_NUM_PARALLEL=16, and after changing it to 1 everything worked. Handling that much in parallel is what drove up the VRAM requirement.

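For anyone hitting the same thing, the fix above can be sketched as follows (assuming a POSIX-style shell; on Windows, set the variable via System Properties or `setx` and then restart Ollama):

```shell
# Each parallel slot allocates its own KV cache, so 16 slots
# multiplied the context memory by 16. One slot restores the
# expected footprint for single-user use.
export OLLAMA_NUM_PARALLEL=1
ollama serve
```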
Reference: github-starred/ollama#4811