[GH-ISSUE #3982] CUDA error while trying to run llama3-8B: out of memory #48976

Closed
opened 2026-04-28 10:23:40 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @piotrfila on GitHub (Apr 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3982

What is the issue?

Hello,

I am trying to run llama3-8B:instruct on 2 × GTX 970 (4 GB each, compute capability 5.2), no SLI. #1288 led me to believe it should be possible in terms of VRAM requirements (8 GB total), and I also have enough RAM (16 GB). However, each time I try to run the model, the ollama service crashes with an out-of-memory error and no response is returned (tried with oterm, open-webui, and ollama run).

This seems similar to #3765. I tried editing the modelfile as mentioned there without success.

The model runs successfully without GPU acceleration. Other applications can use CUDA just fine (checked with these examples: https://github.com/grahamc/nixos-cuda-example).

I installed ollama through the NixOS-unstable option.

Modelfile I tried:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
FROM llama3:instruct

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER num_ctx 4196
PARAMETER num_gpu 42
PARAMETER num_keep 24
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

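As an aside, `num_gpu 42` asks for more layers than llama3-8B has (32 transformer layers plus the output layer, 33 in total), so it amounts to full offload. A rough way to pick a `num_gpu` value that fits a single 4 GB card is shown below; the file size and headroom figures are assumptions for illustration, not measured values:

```python
# Hypothetical sizing: how many q4_0 layers fit on one 4 GB GTX 970.
model_gb = 4.7          # approx. q4_0 file size of llama3-8B (assumed)
n_layers = 33           # 32 transformer layers + output layer
per_layer_gb = model_gb / n_layers

free_gb = 4.0 - 0.8     # 4 GB card minus assumed driver/compute-buffer headroom
num_gpu = int(free_gb / per_layer_gb)
print(num_gpu)          # candidate value for "PARAMETER num_gpu"
```

With these assumptions the estimate lands in the low twenties per card; the remaining layers would run on the CPU.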
Tail of journalctl log: https://pastebin.com/yVH5Sgai

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.31

GiteaMirror added the bug label 2026-04-28 10:23:40 -05:00
Author
Owner

@helium729 commented on GitHub (Apr 28, 2024):

Usually an 8B q4_0 model requires at least 5.4 GB of VRAM; I'd guess yours is not sufficient. You can try models <= 4B.
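The 5.4 GB figure is roughly consistent with back-of-envelope arithmetic; the bits-per-weight and KV-cache parameters below are approximations, not ollama's exact accounting:

```python
# Rough VRAM estimate for llama3-8B at q4_0 (all figures approximate).
params = 8.0e9            # parameter count
bits_per_weight = 4.5     # q4_0 averages ~4.5 bits/weight incl. scales (assumed)
weights_gb = params * bits_per_weight / 8 / 1e9

# KV cache: 2 (K and V) * layers * ctx * kv_heads * head_dim * 2 bytes (f16)
layers, ctx, kv_heads, head_dim = 32, 4096, 8, 128
kv_gb = 2 * layers * ctx * kv_heads * head_dim * 2 / 1e9

total_gb = weights_gb + kv_gb  # compute buffers come on top of this
print(f"weights ~ {weights_gb:.1f} GB, KV ~ {kv_gb:.1f} GB, total ~ {total_gb:.1f} GB")
```

That gives roughly 4.5 GB of weights plus ~0.5 GB of KV cache before compute buffers, so each 4 GB card falls short of holding the full model.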

Author
Owner

@piotrfila commented on GitHub (Apr 28, 2024):

Yeah, phi3 works fine. I didn't know more total vram was needed when using multiple GPUs but it makes sense now that I think about it.


Reference: github-starred/ollama#48976