[GH-ISSUE #4223] qwen:72b-chat-q4_K_S does not load #49144

Closed
opened 2026-04-28 10:49:36 -05:00 by GiteaMirror · 2 comments

Originally created by @saddy001 on GitHub (May 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4223

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

While the model "qwen:72b" loads successfully, the model "qwen:72b-chat-q4_K_S" does not. The loading spinner just never stops, even after waiting a long time. Since the models occupy the same amount of memory (41 GB), I assume the RAM usage is roughly the same. Can somebody reproduce this?
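
For anyone trying to reproduce, a minimal sketch assuming a default local Ollama install (the model tags are the ones from this report):

```sh
# Both tags are ~41 GB; only the q4_K_S variant hangs at load.
ollama run qwen:72b "hello"              # loads and answers
ollama run qwen:72b-chat-q4_K_S "hello"  # spinner never stops
```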

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.1.33

GiteaMirror added the memory, bug labels 2026-04-28 10:49:36 -05:00

@dhiltgen commented on GitHub (Jul 25, 2024):

It looks like our memory prediction is incorrect on this model. On a 48G GPU, we think we can load all 81 layers but it crashes with OOM. In reality, only ~60 layers will fit. Until we can get this fixed, you can specify an explicit `num_gpu` setting.
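
As a sketch of that workaround (the value 60 follows the ~60-layer estimate above; tune it for your GPU):

```sh
# Interactively, from the ollama run REPL:
ollama run qwen:72b-chat-q4_K_S
>>> /set parameter num_gpu 60

# Or per request through the HTTP API:
curl http://localhost:11434/api/generate -d '{
  "model": "qwen:72b-chat-q4_K_S",
  "prompt": "hello",
  "options": {"num_gpu": 60}
}'
```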


@jessegross commented on GitHub (Sep 24, 2025):

I'm going to go ahead and close this now that the new memory management logic is on by default. If you continue to see problems, please file a new issue.
