[GH-ISSUE #3460] dual GPU 8G/16G - CUDA error: out of memory with dolphin-mixtral #48646

Closed
opened 2026-04-28 08:58:50 -05:00 by GiteaMirror · 6 comments

Originally created by @sebastianlau on GitHub (Apr 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3460

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Ollama crashes out entirely with the following error (it throws the error, then the process terminates):

```
CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:445
  cudaMalloc((void **) &ptr, look_ahead_size)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"
```

What did you expect to see?

Output (any)

Steps to reproduce

  1. Start Ollama / Navigate to Open WebUI
  2. Enter any text

Notes:

  • used dolphin-mixtral as the model
  • CUDA_VISIBLE_DEVICES was used to set the GPU order (16GB first, then 8GB); see the launch sketch below
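
For anyone reproducing this ordering, here is a minimal launch sketch. The device indices `1,0` are an assumption (enumeration depends on the driver; check `nvidia-smi -L`), not something confirmed in the report:

```python
import os
import subprocess

# Assumption: index 1 is the Tesla P100 (16GB) and index 0 is the
# GTX 1080 (8GB); verify with `nvidia-smi -L` before relying on this.
env = dict(os.environ)
env["CUDA_VISIBLE_DEVICES"] = "1,0"  # expose the 16GB card as device 0

# Start the Ollama server with the reordered GPU list.
subprocess.run(["ollama", "serve"], env=env)
```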

Are there any recent changes that introduced the issue?

Updating from 0.1.29 to 0.1.30 (reverting to 0.1.29 fixed it)

OS

Windows

Architecture

amd64

Platform

No response

Ollama version

0.1.30

GPU

Nvidia

GPU info

GPU 0: NVIDIA GeForce GTX 1080 (8GB)
GPU 1: Tesla P100-PCIE-16GB

CPU

AMD

Other software

Windows Server 2022 Standard x64 21H2

GiteaMirror added the gpu, bug, nvidia, windows labels 2026-04-28 08:58:52 -05:00

@Zig1375 commented on GitHub (Apr 3, 2024):

I encounter the same issue from time to time when **num_ctx** is set to **2048**.
If **num_ctx** is set to **4096** or higher, the error occurs consistently (using an Nvidia 4070 with 12GB of VRAM and 64GB of system RAM).
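
For context on where num_ctx enters, a request with an explicit context size against the standard Ollama HTTP API looks roughly like the sketch below; the model name and values are the ones mentioned in this thread, and the prompt is a placeholder:

```python
import json
import urllib.request

# Request a completion with an explicit context window. Raising
# num_ctx grows the KV cache, which is what exhausts VRAM here.
payload = {
    "model": "dolphin-mixtral",
    "prompt": "Hello",             # placeholder prompt
    "stream": False,
    "options": {"num_ctx": 2048},  # 4096+ reproduced the OOM consistently
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```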

Author
Owner

@FonzieBonzo commented on GitHub (Apr 4, 2024):

Same as [this](https://github.com/ollama/ollama/issues/3431#issue-2217360558); someone is working on it.


@ghost commented on GitHub (Apr 10, 2024):

Previously it was running well, but after some time it started to show the same error:

`requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))`

Then I checked the "C:\Users\<username>\AppData\Local\Ollama\server.log" file and found the following error at the end of the file:

```
CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:532
  cuMemSetAccess(pool_addr + pool_size, reserve_size, &access, 1)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"
```

I then tried the following solution: modifying the values of num_ctx & num_gpu resolved it (a sketch follows at the end of this comment):

![image](https://github.com/ollama/ollama/assets/166391298/f4ac9247-4f5c-4b1a-a1e7-d2f46cdd0610)

But after this it is consuming too much RAM, about 90% of my RAM! But yes, it's running 👍

![image](https://github.com/ollama/ollama/assets/166391298/92b24df2-dfee-4ae4-82ce-73aa700b07d3)
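
The screenshots above show the commenter's settings, but the workaround boils down to lowering num_ctx and capping num_gpu (the number of layers offloaded to the GPU). A hedged sketch, with illustrative values rather than the commenter's exact numbers:

```python
import json
import urllib.request

# Workaround sketch: shrink the context window and cap the number of
# layers offloaded to the GPU so the rest spill to system RAM.
# The values below are illustrative, not the commenter's settings.
payload = {
    "model": "dolphin-mixtral",
    "prompt": "Hello",      # placeholder prompt
    "stream": False,
    "options": {
        "num_ctx": 1024,    # smaller context -> smaller KV cache in VRAM
        "num_gpu": 20,      # offload only some layers; the rest stay on CPU
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Capping num_gpu keeps the remaining layers in system memory, which is consistent with the ~90% RAM usage reported above.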


@dhiltgen commented on GitHub (Jun 1, 2024):

I don't have a test environment to verify this asymmetry, but PR #4517 may fix this.


@Zig1375 commented on GitHub (Jun 2, 2024):

On my side it works fine now; I haven't seen this error for a few weeks, maybe a month.


@sebastianlau commented on GitHub (Jun 3, 2024):

I just checked, and it "seems" to work with WebUI 0.2.2 and Ollama 0.1.41.
I say "seems" because (a) it was incredibly slow (at least 2x slower than with 0.1.29), and (b) the UI had issues (not sure whether this is due to the UI or the API): the title did not update, and the response was only visible after navigating away and back (or refreshing).


Reference: github-starred/ollama#48646