[GH-ISSUE #7061] Tesla GPU at 100% and Ollama doesn't work, is hung #66540

Closed
opened 2026-05-04 07:19:24 -05:00 by GiteaMirror · 5 comments

Originally created by @Domi31tls on GitHub (Oct 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7061

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

![image](https://github.com/user-attachments/assets/b187c01f-bb03-4116-b7f6-1d07553af8eb)

![image](https://github.com/user-attachments/assets/55b19214-76e7-4f14-9edd-e2b739ee965f)

![image](https://github.com/user-attachments/assets/02a31135-2884-4060-8e7a-6b271a91ec7e)

I have just set up my Debian 12 Linux server. I installed Ollama directly using the following command:

curl -fsSL https://ollama.com/install.sh | sh
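
A quick way to confirm the install is a sketch like the following, assuming the default systemd setup the install script creates:

```shell
# Report the installed version (0.3.12 in this case)
ollama -v

# The install script registers Ollama as a systemd service
systemctl status ollama
```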

I did not make any modifications to the service file. I ran the command:

ollama run phi3.5

I typed "hello". The response was good and correct. However, for my second question, it started writing the beginning of a sentence and then displayed "###" (see screenshot).

I ran the nvidia-smi command (see screenshot); the GPU stays at 100%, and Ollama stops responding.

I should mention that my server does not have a graphical interface, and the Nvidia and CUDA drivers are up to date (via apt update and apt upgrade). I am providing the top command output for CPU and RAM information.
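
A minimal way to watch utilization while reproducing the hang, assuming a standard `nvidia-smi` from the driver package:

```shell
# Refresh the full nvidia-smi view every second
watch -n 1 nvidia-smi

# Or log just GPU utilization and memory use over time
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 1
```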

Thank you for your feedback.

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.3.12

GiteaMirror added the bug, nvidia, needs more info labels 2026-05-04 07:19:25 -05:00

@rick-github commented on GitHub (Oct 1, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

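For an install done via the install script, the linked troubleshooting guide reads the logs from the systemd journal; a minimal sketch:

```shell
# Dump the full Ollama service log
journalctl -u ollama --no-pager

# Or follow it live while reproducing the hang
journalctl -u ollama -f
```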

@Domi31tls commented on GitHub (Oct 1, 2024):

Thanks for your help.
[debug.txt](https://github.com/user-attachments/files/17214266/debug.txt)


@rick-github commented on GitHub (Oct 2, 2024):

Unfortunately there's nothing untoward in the log. The model loads successfully, your `bonjour` is processed in 181ms and your question apparently in 129s. There are some other commands I take to be `ollama ps` and `ollama list`. The only curious entries are some `/api/generate` commands that are forced to complete when you took the service down. These might have been attempts to unload/reload the model, but it's not clear.

You can improve the logging by adding `OLLAMA_DEBUG=1` to the [server environment](https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux). And since the card is a few years old, you could try using an older version of the CUDA library by setting `OLLAMA_LLM_LIBRARY=cuda_v11`.

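Following the linked FAQ, both variables can be set through a systemd override; a sketch, assuming the `ollama.service` unit from the default install:

```shell
# Opens an editor on an override file for the service
sudo systemctl edit ollama.service

# In the editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
#   Environment="OLLAMA_LLM_LIBRARY=cuda_v11"

# Reload units and restart so the new environment takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```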

@dhiltgen commented on GitHub (Nov 5, 2024):

@Domi31tls can you try out 0.4.0 and see if that improves the situation? Do you see this failure with other models, or just phi3.5?

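On Linux, upgrading in place is typically just a matter of re-running the install script:

```shell
# Re-running the installer replaces the existing binary with the latest release
curl -fsSL https://ollama.com/install.sh | sh
ollama -v   # confirm the new version
```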

@jmorganca commented on GitHub (Jan 4, 2026):

Will close for now since we need some more info, but feel free to let us know if you're still hitting this.
