[GH-ISSUE #2831] Windows: connection forcibly closes when adding image to llava prompt - CUDA out of memory #48233

Closed
opened 2026-04-28 07:17:08 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @jakobhoeg on GitHub (Feb 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2831

Originally assigned to: @dhiltgen on GitHub.

I'm trying to use llava to identify a photo and it gives this error:

```
>>> What is in this image? /users/jakob/desktop/jakob.jpg
Added image '/users/jakob/desktop/jakob.jpg'
Error: Post "http://127.0.0.1:11434/api/chat": read tcp 127.0.0.1:55783->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.
```
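For context, the interactive session above corresponds to a POST to Ollama's `/api/chat` endpoint with the image sent base64-encoded inside the message. A minimal sketch of that payload (the model name and prompt are from the report; the structure follows the documented API shape):

```python
import base64
import json

# Sketch of the /api/chat request body the CLI issues for an image prompt.
# The image bytes travel base64-encoded in the message's "images" list.
def build_chat_payload(model, prompt, image_bytes):
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }

payload = build_chat_payload("llava", "What is in this image?", b"\x89PNG...")
print(json.dumps(payload, indent=2))
```

The server decodes each image, runs it through the CLIP encoder, and splices the resulting embedding into the prompt, which is the step that fails in the log below.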

This is the server.log:

```
time=2024-02-29T13:19:37.052+01:00 level=INFO source=dyn_ext_server.go:171 msg="loaded 1 images"
CUDA error: out of memory
clip_model_load: model name:   openai/clip-vit-large-patch14-336
clip_model_load: description:  image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment:    32
clip_model_load: n_tensors:    377
clip_model_load: n_kv:         19
clip_model_load: ftype:        f16

clip_model_load: loaded meta data with 19 key-value pairs and 377 tensors from C:\Users\jakob\.ollama\models\blobs\sha256-72d6f08a42f656d36b356dbe0920675899a99ce21192fd66266fb7d82ed07539
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv   0:                       general.architecture str              = clip
clip_model_load: - kv   1:                      clip.has_text_encoder bool             = false
clip_model_load: - kv   2:                    clip.has_vision_encoder bool             = true
clip_model_load: - kv   3:                   clip.has_llava_projector bool             = true
clip_model_load: - kv   4:                          general.file_type u32              = 1
clip_model_load: - kv   5:                               general.name str              = openai/clip-vit-large-patch14-336
clip_model_load: - kv   6:                        general.description str              = image encoder for LLaVA
clip_model_load: - kv   7:                        clip.projector_type str              = mlp
clip_model_load: - kv   8:                     clip.vision.image_size u32              = 336
clip_model_load: - kv   9:                     clip.vision.patch_size u32              = 14
clip_model_load: - kv  10:               clip.vision.embedding_length u32              = 1024
clip_model_load: - kv  11:            clip.vision.feed_forward_length u32              = 4096
clip_model_load: - kv  12:                 clip.vision.projection_dim u32              = 768
clip_model_load: - kv  13:           clip.vision.attention.head_count u32              = 16
clip_model_load: - kv  14:   clip.vision.attention.layer_norm_epsilon f32              = 0.000010
clip_model_load: - kv  15:                    clip.vision.block_count u32              = 23
clip_model_load: - kv  16:                     clip.vision.image_mean arr[f32,3]       = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv  17:                      clip.vision.image_std arr[f32,3]       = [0.268630, 0.261303, 0.275777]
clip_model_load: - kv  18:                              clip.use_gelu bool             = false
clip_model_load: - type  f32:  235 tensors
clip_model_load: - type  f16:  142 tensors
clip_model_load: CLIP using CUDA backend
clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector:  1
clip_model_load: model size:     595.49 MB
clip_model_load: metadata size:  0.14 MB
clip_model_load: params backend buffer size =  595.49 MB (377 tensors)
clip_model_load: compute allocated memory: 32.89 MB
encode_image_with_clip: image embedding created: 576 tokens

encode_image_with_clip: image encoded in   239.12 ms by CLIP (    0.42 ms per image patch)
  current device: 0, in function ggml_cuda_pool_malloc_vmm at C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:7990
  cuMemSetAccess(g_cuda_pool_addr[device] + g_cuda_pool_size[device], reserve_size, &access, 1)
GGML_ASSERT: C:\Users\jeff\git\ollama\llm\llama.cpp\ggml-cuda.cu:243: !"CUDA error"
```
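While the fix in this thread was to upgrade, a common mitigation for this class of CUDA out-of-memory failure is to offload fewer layers to the GPU via Ollama's `num_gpu` parameter, e.g. in a Modelfile. This is a sketch only; the right value depends on available VRAM, and the `llava-lowvram` name is illustrative:

```
# Sketch: create a llava variant that offloads fewer layers to the GPU.
# Tune num_gpu to the available VRAM; 0 forces CPU-only inference.
FROM llava
PARAMETER num_gpu 16
```

It can then be built and run with `ollama create llava-lowvram -f Modelfile` followed by `ollama run llava-lowvram`.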
GiteaMirror added the bug and windows labels 2026-04-28 07:17:08 -05:00
Author
Owner

@dhiltgen commented on GitHub (Jun 1, 2024):

I would suggest giving the latest release a try to see if that improves the situation. That said, these may ultimately be due to https://github.com/ollama/ollama/issues/4599 which I'm still working on.

Author
Owner

@jakobhoeg commented on GitHub (Jun 4, 2024):

> I would suggest giving the latest release a try to see if that improves the situation. That said, these may ultimately be due to #4599 which I'm still working on.

That fixed it, thanks!


Reference: github-starred/ollama#48233