[GH-ISSUE #5648] image description model is too slow #3522

Closed
opened 2026-04-12 14:13:31 -05:00 by GiteaMirror · 2 comments

Originally created by @codeMonkey-shin on GitHub (Jul 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5648

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

After updating to the latest version, API calls to llava:13b on Ubuntu take about 1 minute.

Previously they took about 10 seconds, so it has become too slow.

The graphics card is an A30.
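To put numbers on a slowdown like this, one option is to time a single request and read the per-phase durations the server reports. The following is a minimal sketch, not the reporter's actual client: it assumes Ollama is listening on its default local address, uses the non-streaming `/api/generate` endpoint, and the function names and image path are hypothetical.

```python
import base64
import json
import time
import urllib.request

# Assumption: Ollama is reachable on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate request body with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def timed_generate(payload: dict, url: str = OLLAMA_URL) -> tuple[float, dict]:
    """Send the request; return (wall-clock seconds, decoded JSON response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return time.perf_counter() - start, body


def main(image_path: str) -> None:
    """Time one image description request and print the reported durations."""
    with open(image_path, "rb") as f:
        payload = build_payload("llava:13b", "Describe this image", f.read())
    seconds, body = timed_generate(payload)
    print(f"wall clock: {seconds:.1f}s")
    # Durations in the response are reported in nanoseconds.
    for key in ("total_duration", "load_duration",
                "prompt_eval_duration", "eval_duration"):
        if key in body:
            print(f"{key}: {body[key] / 1e9:.3f}s")
```

Calling `main("image.png")` with a real file before and after an upgrade would show whether the extra time is spent loading the model, evaluating the prompt, or generating tokens.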

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

I just used the install script: curl -fsSL https://ollama.com/install.sh | sh

GiteaMirror added the nvidia and bug labels 2026-04-12 14:13:32 -05:00

@rick-github commented on GitHub (Jul 12, 2024):

If you add logs it will be easier to diagnose the issue.


@dhiltgen commented on GitHub (Jul 23, 2024):

I believe that's a 24 GB card, so it should have plenty of VRAM for the llava:13b model. I don't have the exact same card, but on another CC 8.x CUDA card, the model loads and runs at ~37 tokens per second.

Please make sure you're running the latest version. If performance is still slow, please share your server log and ollama ps output and I'll reopen the issue.

% ollama run llava:13b --verbose Describe this image /Users/daniel/Downloads/image.png
Added image '/Users/daniel/Downloads/image.png'
 The image features a graphic design on a heather gray background. At the top, there is text that reads "The
Ollamas," presented in a bold and slightly italicized font with a slight slant to the right. Below this text,
there are five cartoon cats arranged in a single file on what appears to be a zebra crossing. Each cat has a
distinct facial expression; they range from looking excited or happy to appearing curious or contemplative. The
style of the image is playful and whimsical, reminiscent of comic strips or cartoons commonly found in digital
media.

total duration:       4.628870347s
load duration:        50.471069ms
prompt eval count:    1 token(s)
prompt eval duration: 879.256ms
prompt eval rate:     1.14 tokens/s
eval count:           133 token(s)
eval duration:        3.586105s
eval rate:            37.09 tokens/s
% ollama ps
NAME     	ID          	SIZE 	PROCESSOR	UNTIL
llava:13b	0d0eb4d7f485	10 GB	100% GPU 	3 minutes from now
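
The rates in the verbose output above follow directly from the counts and durations. A short sketch of that arithmetic, with the numbers copied from the output; the helper and variable names are illustrative only:

```python
def tokens_per_second(count: int, duration_s: float) -> float:
    """Rate as reported by --verbose: tokens divided by elapsed seconds."""
    return count / duration_s


# Values copied from the --verbose output above.
total_s = 4.628870347
load_s = 0.050471069
prompt_eval_count, prompt_eval_s = 1, 0.879256
eval_count, eval_s = 133, 3.586105

prompt_rate = tokens_per_second(prompt_eval_count, prompt_eval_s)
eval_rate = tokens_per_second(eval_count, eval_s)

# Time not covered by the load, prompt eval, and eval phases.
unaccounted_s = total_s - (load_s + prompt_eval_s + eval_s)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 1.14
print(f"eval rate:        {eval_rate:.2f} tokens/s")    # 37.09
print(f"outside the three phases: {unaccounted_s:.3f}s")  # 0.113
```

Comparing the same three phases from a slow run against these figures would indicate which phase accounts for the extra time.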

Reference: github-starred/ollama#3522