[GH-ISSUE #8929] Python Script Hangs on Sequential Image Processing with llama3.2-vision with Ollama API – No Error, No Finish #52304

Closed
opened 2026-04-28 22:57:24 -05:00 by GiteaMirror · 8 comments

Originally created by @Bardo-Konrad on GitHub (Feb 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8929

What is the issue?

I am using

import ollama

print("Start")
# ollama.chat API call
response = ollama.chat(
	model="llama3.2-vision",
	messages=[{
		'role': 'user',
		'content': 'Was siehst Du auf dem Bild?', # = "What do you see in the image?"
		'images': ['temp_resized_image.jpg']
	}]
)
print("Finish")

for a batch of images in a linear fashion, one after the other.

And after a while (or because of some specific image), "Finish" is never printed, and there is no error message either.
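A defensive sketch of such a sequential loop (hypothetical helper, not part of the original report): the chat call is injected as a function, so one stuck or failing request is recorded instead of silently blocking the whole batch, and the loop logic can be exercised without a running server.

```python
def describe_images(paths, chat_fn):
    """Run a vision chat for each image sequentially.

    chat_fn takes a messages list and returns an Ollama-style response
    dict; it is injected so the loop can be tested without a server.
    A failed or timed-out request is recorded instead of hanging the batch.
    """
    results = {}
    for path in paths:
        try:
            response = chat_fn([{
                'role': 'user',
                'content': 'What do you see in the image?',
                'images': [path],
            }])
            results[path] = response['message']['content']
        except Exception as exc:  # timeout / server error: record and move on
            results[path] = f'ERROR: {exc}'
    return results
```

Wired up against a real server this might look like `client = ollama.Client(timeout=120)` (extra keyword arguments to `Client` are assumed to be forwarded to the underlying HTTP client, so each request is time-bounded) together with `chat_fn = lambda msgs: client.chat(model='llama3.2-vision', messages=msgs)`.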

Relevant log output

[GIN] 2025/02/07 - 14:52:39 | 500 |     11h20m15s |       127.0.0.1 | POST     "/api/chat"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-28 22:57:24 -05:00

@rick-github commented on GitHub (Feb 7, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) with OLLAMA_DEBUG=1 may aid in debugging.


@Bardo-Konrad commented on GitHub (Feb 7, 2025):

I have already set OLLAMA_DEBUG=1 in the Windows environment variables, then restarted Ollama. If server.log is the only log, then that's all I have.


@rick-github commented on GitHub (Feb 7, 2025):

You posted one line of GIN output. That's not enough to debug your problem.


@Bardo-Konrad commented on GitHub (Feb 7, 2025):

It seems that setting the environment variable in the Windows settings is different from running

$env:OLLAMA_DEBUG="1"
& "ollama app.exe"

So I used that instead, and now I get debug messages and am waiting for another timeout.


@Bardo-Konrad commented on GitHub (Feb 7, 2025):

So this is the issue:

time=2025-02-07T17:58:01.016+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T17:59:14.628+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:00:27.771+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:01:39.762+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:02:51.065+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:04:02.909+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:05:14.910+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021
time=2025-02-07T18:06:27.130+01:00 level=DEBUG source=cache.go:231 msg="context limit hit - shifting" id=0 limit=2048 input=2048 keep=5 discard=1021

and on it goes.


@rick-github commented on GitHub (Feb 7, 2025):

The model has exceeded the context buffer and lost coherence, and is just rambling. You can either increase the context buffer (but the model may just increase output and hit the new limit) or set `num_predict` in the `chat` call to limit the number of tokens generated.
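The two suggestions can be sketched like this (values are illustrative; `num_predict` and `num_ctx` are standard Ollama model options passed via the `options` field of the chat call, and the helper function is hypothetical):

```python
def build_chat_kwargs(image_path, prompt, num_predict=250, num_ctx=8192):
    """Build keyword arguments for ollama.chat (pure, so easy to test).

    num_predict caps the number of generated tokens so a rambling model
    cannot keep shifting the context forever; num_ctx optionally enlarges
    the context window from the 2048-token limit seen in the log above.
    """
    return {
        'model': 'llama3.2-vision',
        'messages': [{'role': 'user', 'content': prompt,
                      'images': [image_path]}],
        'options': {'num_predict': num_predict, 'num_ctx': num_ctx},
    }
```

Against a running server, this would be used as `ollama.chat(**build_chat_kwargs('temp_resized_image.jpg', 'What do you see in the image?'))`.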


@Bardo-Konrad commented on GitHub (Feb 7, 2025):

Interesting. Using 250 for `num_predict`, does that mean up to 1000 characters of output?


@rick-github commented on GitHub (Feb 7, 2025):

Approximately; it varies depending on the embed encoding.


Reference: github-starred/ollama#52304