[GH-ISSUE #7879] goroutine 7 [running] #51550

Closed
opened 2026-04-28 20:34:04 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @donghyun-mf on GitHub (Nov 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7879

What is the issue?

Hello.

I am testing with Ollama. Both the embedding model and the LLM are loaded on the GPU, and as soon as a question is entered, the embedding model and then the LLM run in sequence. Because we built a web server with Streamlit, the server has to handle many question inputs, so we started the Ollama server with options to process a large number of requests quickly. The command used to run the server is as follows:

CUDA_VISIBLE_DEVICES=1 OLLAMA_GPU_OVERHEAD=500000000 OLLAMA_NUM_PARALLEL=11 OLLAMA_KEEP_ALIVE=-1 ollama serve

When we run the Streamlit app and access the web server, questions are entered and answered without problems. However, at some unpredictable point, Ollama prints the message shown below. The first screenshot is from ollama 0.4.2, and the second is from ollama 0.4.6.

On 0.4.2, only the request that hit the error goes unanswered, and the next question is generated normally. On 0.4.6, however, after the error message the model is deallocated from GPU memory and the Ollama server itself stops. For reference, 0.4.3 through 0.4.5 all show the same behavior as 0.4.6.

![image](https://github.com/user-attachments/assets/9a9c44bb-49fc-4273-9ea2-79ac30e84cff)

![image](https://github.com/user-attachments/assets/62c1c181-87c0-4302-ad3b-e3b0e79dd74b)
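
A minimal reproduction sketch, assuming the setup above: fire parallel embed-then-generate requests against the server, approximating the Streamlit traffic pattern. The model names below are placeholders, not taken from this report.

```sh
# Sketch: 11 concurrent clients (matching OLLAMA_NUM_PARALLEL=11), each
# sending an embedding request followed by a generate request.
# Model names are placeholders for whatever models were actually used.
for i in $(seq 1 11); do
  (
    curl -s http://localhost:11434/api/embed \
      -d "{\"model\": \"nomic-embed-text\", \"input\": \"question $i\"}" >/dev/null
    curl -s http://localhost:11434/api/generate \
      -d "{\"model\": \"llama3\", \"prompt\": \"question $i\", \"stream\": false}" >/dev/null
  ) &
done
wait
```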

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.4.6, 0.4.2

GiteaMirror added the needs more info and bug labels 2026-04-28 20:34:04 -05:00
Author
Owner

@jessegross commented on GitHub (Dec 3, 2024):

Can you please post the full [server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) from when you run into the issue on 0.4.6? There's not enough context in the screenshot.

How consistently can you reproduce it?
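
Since the OS is listed as Docker, the full log would normally come from the container; a sketch, assuming the container is named ollama:

```sh
# Capture the complete server log from the Docker container.
# The container name "ollama" is an assumption, not from the report.
docker logs ollama > ollama-server.log 2>&1
```

Restarting the container with OLLAMA_DEBUG=1 in its environment produces more detailed logs.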

Author
Owner

@pdevine commented on GitHub (Mar 21, 2025):

Going to go ahead and close this since there hasn't been an update in a bit.
