[GH-ISSUE #5589] OOM crash loading codegeex4 on 4G GPU #65528

Closed
opened 2026-05-03 21:35:35 -05:00 by GiteaMirror · 5 comments

Originally created by @chandan0000 on GitHub (Jul 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5589

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

![image](https://github.com/ollama/ollama/assets/60910265/dc193bf7-9418-469c-ab16-d342cf792c67)
Other models (ollama3, gemma2) work fine, but running the codegeex4 model throws an error.
How can I resolve this issue?
My system config:
![image](https://github.com/ollama/ollama/assets/60910265/edaf8a76-0274-4d0d-a7dc-7f5e59519ed7)

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

ollama version is 0.2.1

GiteaMirror added the memory and bug labels 2026-05-03 21:35:36 -05:00

@darwinvelez58 commented on GitHub (Jul 10, 2024):

I have this issue too!

![image](https://github.com/ollama/ollama/assets/118543481/81932c5d-2316-4943-a755-d26a77e8341d)

Running with 3x 7900 XTX; this doesn't make any sense!

![image](https://github.com/ollama/ollama/assets/118543481/7d83d54c-cc4a-4cb2-95b9-52beae56309e)


@darwinvelez58 commented on GitHub (Jul 10, 2024):

This started happening after the 1.44 ROCm release!


@dhiltgen commented on GitHub (Jul 23, 2024):

@chandan0000 I believe your GPU is a 4G card. I've been able to reproduce the crash on another 4G card, and it looks like we're allocating one layer too many. In my setup we tried to load 28/41 layers and that crashed; if I specify 27 layers, the model loads successfully.

```
% curl http://localhost:11434/api/generate -d '{
  "model": "codegeex4",
  "prompt": "hello?",
  "stream": false, "options": {"num_gpu": 27 }
}'
{"model":"codegeex4","created_at":"2024-07-23T23:33:52.341542262Z","response":"Hello! How can I assist you today?","done":true,"done_reason":"stop","context":[151331,151333,151336,198,14978,30,151337,198,9703,0,2585,646,358,7789,498,3351,30],"total_duration":13563765176,"load_duration":7871846153,"prompt_eval_count":8,"prompt_eval_duration":2509817000,"eval_count":10,"eval_duration":3180908000}%
```
```
% nvidia-smi
Tue Jul 23 23:36:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GT 1030         Off |   00000000:01:00.0 Off |                  N/A |
| 35%   34C    P8             N/A /   19W |    3948MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    202781      C   ...unners/cuda_v11/ollama_llama_server       3944MiB |
+-----------------------------------------------------------------------------------------+
```
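
Until you're on a build with the fix, one way to make the 27-layer cap stick across requests is to bake `num_gpu` into a Modelfile. This is a minimal sketch, not something from the original thread; the derived model name `codegeex4-27` is just an illustrative choice:

```
# Workaround sketch: derive a model that always offloads 27 layers to the GPU.
# The model name codegeex4-27 is arbitrary and purely illustrative.
cat > Modelfile <<'EOF'
FROM codegeex4
PARAMETER num_gpu 27
EOF
ollama create codegeex4-27 -f Modelfile
ollama run codegeex4-27 "hello?"
```

`num_gpu` is the same option the curl example above passes per-request; setting it in the Modelfile simply makes it the derived model's default.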

@dhiltgen commented on GitHub (Jul 23, 2024):

@darwinvelez58 your issue is unrelated to the OOM crash of codegeex4 - we'll track the 3x ROCm bug via #5629


@dhiltgen commented on GitHub (Jul 23, 2024):

This was fixed in v0.2.2. Confirmed on 0.2.8 it loads 23 layers and doesn't crash on the 4G GPU.

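For anyone still on 0.2.1, upgrading to v0.2.2 or later should pick up the fix. On Linux, a typical route (assuming the standard install script, which also upgrades an existing install when re-run) looks like this:

```
# Re-running the official install script upgrades an existing Linux install
curl -fsSL https://ollama.com/install.sh | sh
# Verify the running version afterwards
ollama -v
```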
Reference: github-starred/ollama#65528