[GH-ISSUE #3237] Out of memory - GTX 1650 4G #64033

Closed
opened 2026-05-03 15:55:14 -05:00 by GiteaMirror · 3 comments

Originally created by @yxl23 on GitHub (Mar 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3237

Originally assigned to: @mxyng on GitHub.

What is the issue?

CUDA error: out of memory
current device: 0, in function ggml_cuda_pool_malloc_vmm at C:\Users\jmorg\git\ollama\llm\llama.cpp\ggml-cuda.cu:8583
cuMemCreate(&handle, reserve_size, &prop, 0)
GGML_ASSERT: C:\Users\jmorg\git\ollama\llm\llama.cpp\ggml-cuda.cu:256: !"CUDA error"
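For anyone tracing this: the failing call is the physical-allocation step of llama.cpp's virtual-memory (VMM) pool. Below is a minimal sketch of the driver-API sequence the trace names (cuMemAddressReserve, then cuMemCreate, cuMemMap, cuMemSetAccess); it is not ollama's actual code, and `reserve_size` here is a made-up growth step. On a 4 GB card that is already full, step 2 fails exactly like the log above.

```c
/*
 * Minimal sketch (NOT ollama's code) of the CUDA VMM sequence from the
 * trace: reserve address space, create physical memory, map it, grant
 * access. cuMemCreate is the call that returns "out of memory" once the
 * card has no free VRAM left for the pool to grow into.
 * Build: gcc vmm_sketch.c -lcuda
 */
#include <cuda.h>
#include <stdio.h>

#define CHECK(call) do { \
    CUresult err = (call); \
    if (err != CUDA_SUCCESS) { \
        const char *msg = "unknown"; \
        cuGetErrorString(err, &msg); \
        fprintf(stderr, "%s failed: %s\n", #call, msg); \
        return 1; \
    } } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));

    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t reserve_size = 64 * gran;  /* hypothetical growth step */

    /* 1. Reserve virtual address space: consumes no VRAM. */
    CUdeviceptr base;
    CHECK(cuMemAddressReserve(&base, reserve_size, 0, 0, 0));

    /* 2. Allocate physical VRAM: the step that OOMs in the report. */
    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, reserve_size, &prop, 0));

    /* 3. Map the physical pages and grant read/write access. */
    CHECK(cuMemMap(base, reserve_size, 0, handle, 0));
    CUmemAccessDesc access = {0};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(base, reserve_size, &access, 1));

    printf("mapped %zu bytes\n", reserve_size);

    CHECK(cuMemUnmap(base, reserve_size));
    CHECK(cuMemRelease(handle));
    CHECK(cuMemAddressFree(base, reserve_size));
    return 0;
}
```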

What did you expect to see?

![image](https://github.com/ollama/ollama/assets/115678682/7a84531b-3a76-42b6-9ec2-ff95d60038fc)

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Windows

Architecture

x86

Platform

No response

Ollama version

0.1.28

GPU

Nvidia

GPU info

![image](https://github.com/ollama/ollama/assets/115678682/993c8af4-05b6-4175-abcc-fc96e86c951c)

CPU

Intel

Other software

No response

GiteaMirror added the bug and nvidia labels 2026-05-03 15:55:14 -05:00

@IsaiahParsons commented on GitHub (Mar 21, 2024):

I'm encountering the same issue when loading images into llava:latest on a GTX 1650 (4 GB).
I'm getting server errors that I suspect stem from the CUDA error.

PROMPT & RESPONSE:

>>> What's in this image? \img.png
Added image '\img.png'
Error: Post "http://127.0.0.1:11434/api/chat": read tcp 127.0.0.1:52531->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.

SERVER LOG:
...
encode_image_with_clip: image encoded in 1229.16 ms by CLIP ( 2.13 ms per image patch)
CUDA error: out of memory
current device: 0, in function ggml_cuda_pool_malloc_vmm at C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:8658
cuMemSetAccess(g_cuda_pool_addr[device] + g_cuda_pool_size[device], reserve_size, &access, 1)
GGML_ASSERT: C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:256: !"CUDA error"
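The image encode succeeding and the pool allocation failing right afterwards fits a card that has almost no headroom once the llava weights are resident. A quick way to watch that headroom is the hedged sketch below using the CUDA driver API; it is not anything ollama ships, and `nvidia-smi` reports the same numbers.

```c
/* Hedged sketch: print free vs. total VRAM via the driver API. Run it
 * while the ollama server holds the model to see how little room is
 * left for the pool to grow. Build: gcc vram_info.c -lcuda */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    if (cuInit(0) != CUDA_SUCCESS) return 1;
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);  /* device-wide, all processes */
    printf("free: %.0f MiB / total: %.0f MiB\n",
           free_b / 1048576.0, total_b / 1048576.0);
    return 0;
}
```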


@samyIO commented on GitHub (Mar 22, 2024):

I have a similar issue on an RTX 3060 and Windows 11:

CUDA error: out of memory
current device: 0, in function ggml_cuda_pool_malloc_vmm at C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:8644
cuMemAddressReserve(&g_cuda_pool_addr[device], CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0)
GGML_ASSERT: C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:256: !"CUDA error"

The error occurs after passing a few inputs to a running chat model. I populate embeddings into a vector DB before chatting with the model. Not sure if that is relevant, but just in case.
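Worth noting that this trace fails one step earlier than the ones above: cuMemAddressReserve reserves virtual address space, not VRAM, so a full card alone shouldn't break it. It can, however, return out of memory when the address space itself is exhausted by large reservations that are never freed. The hedged sketch below reproduces only that failure mode; the 32 GiB chunk size is an assumption, picked to be in the ballpark of a CUDA_POOL_VMM_MAX_SIZE-scale reservation, and I'm not claiming this is what ollama does.

```c
/* Hedged sketch: cuMemAddressReserve consumes virtual address space,
 * not physical VRAM, so repeated large reservations that are never
 * released eventually fail with CUDA_ERROR_OUT_OF_MEMORY even while
 * the GPU itself has memory free. Build: gcc va_exhaust.c -lcuda */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    const size_t chunk = 32ull << 30;  /* 32 GiB: assumed pool-size scale */
    int n = 0;
    for (;;) {
        CUdeviceptr p;
        if (cuMemAddressReserve(&p, chunk, 0, 0, 0) != CUDA_SUCCESS)
            break;  /* same failure mode as the trace above */
        n++;        /* intentionally never cuMemAddressFree'd */
    }
    printf("address space exhausted after %d reservations\n", n);
    return 0;
}
```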


@jmorganca commented on GitHub (Apr 17, 2024):

This should be improved as of 0.1.32 - please let me know if you're still seeing an error!
