[GH-ISSUE #3969] cuda subprocess exits immediately with host cuda library in path #64492

Closed
opened 2026-05-03 17:51:33 -05:00 by GiteaMirror · 11 comments

Originally created by @makeryangcom on GitHub (Apr 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3969

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

Currently, I start the service via `ollama serve` and then use the ollama-js API to call models such as llama3:8b. During actual conversations, the GPU doesn't seem to be utilized; GPU usage sits at around 1%. This is very confusing to me. What do I need to do to ensure the GPU is used?

Even when I switch to llama3:70b, GPU usage doesn't change; it just fills up my memory.

```
const chat_stream = await data.value.ollama.chat({
    model: "llama3:8b",
    messages: message,
    stream: true,
});
for await (const part of chat_stream) {
    // ...
}
```
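(A rough sanity check, assuming your ollama-js version passes through the `eval_count` and `eval_duration` fields from the API's final stream chunk: CPU-only inference on a model this size is typically single-digit tokens/s, while GPU offload is much faster.)

```
for await (const part of chat_stream) {
    if (!part.done) {
        process.stdout.write(part.message.content);
    } else {
        // eval_duration is reported in nanoseconds in the final chunk
        const tokensPerSec = part.eval_count / (part.eval_duration / 1e9);
        console.log(`\neval speed: ${tokensPerSec.toFixed(1)} tokens/s`);
    }
}
```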

CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
GPU: NVIDIA GeForce GTX 1060 5G
Memory: 32GB

![222](https://github.com/ollama/ollama/assets/156150246/4a4d63a9-c883-4aae-af5f-f250797727b5)

### OS

Windows

### GPU

Nvidia

### CPU

Intel, AMD

### Ollama version

0.1.32

GiteaMirror added the `bug`, `nvidia`, `windows` labels 2026-05-03 17:51:39 -05:00

@makeryangcom commented on GitHub (Apr 27, 2024):

I observed a runtime log message.

{"function":"server_params_parse","level":"WARN","line":2378,"msg":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1,"tid":"23132","timestamp":1714181660}

@makeryangcom commented on GitHub (Apr 29, 2024):

Still waiting...


@dhiltgen commented on GitHub (May 1, 2024):

Can you share your server log?

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md


@makeryangcom commented on GitHub (May 4, 2024):

> Can you share your server log?
>
> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md

I can't find any log files at the paths listed in the troubleshooting guide.

```
$env:OLLAMA_DEBUG="1"
```

```
explorer %LOCALAPPDATA%\Ollama to view logs
explorer %LOCALAPPDATA%\Programs\Ollama to browse the binaries (The installer adds this to your user PATH)
explorer %HOMEPATH%\.ollama to browse where models and configuration is stored
explorer %TEMP% where temporary executable files are stored in one or more ollama* directories
```

My GPU information is as follows.

![20240504102312](https://github.com/ollama/ollama/assets/156150246/9dcf0f91-f269-4f4b-a7a3-f99530647752)

The GPU information after starting the Ollama service is as follows.

![20240504102752](https://github.com/ollama/ollama/assets/156150246/08314605-467d-4433-acdf-4c34f3134930)

I am running `ollama.exe serve` directly as a standalone binary, launched from Node.js.
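(For reference, a minimal sketch of launching the standalone binary from Node.js with debug logging enabled; the binary path is a placeholder, and `OLLAMA_DEBUG` is the variable from the troubleshooting doc quoted above.)

```
const { spawn } = require("node:child_process");

// Placeholder path to the standalone binary; adjust to where it actually lives.
const server = spawn("C:\\path\\to\\ollama.exe", ["serve"], {
    env: { ...process.env, OLLAMA_DEBUG: "1" },
    stdio: "inherit", // print the server log to this terminal
});
server.on("exit", (code) => console.log(`ollama exited with code ${code}`));
```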


@makeryangcom commented on GitHub (May 4, 2024):

> Can you share your server log?
>
> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md

The following log is the terminal output after running `ollama.exe serve` from Node.js; I'm not sure if it helps.

![20240504103105](https://github.com/ollama/ollama/assets/156150246/570b3a73-6a7e-4bee-9120-4f40b552f172)


@dhiltgen commented on GitHub (May 4, 2024):

Thanks for the server log. Based on that, you've hit the PhysX bug, making this a duplicate of #4008. The workaround is to remove the PhysX directory from your PATH, or to add another CUDA library location earlier in the PATH; then it will correctly discover the GPU.
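(Since the server is spawned from Node.js in this thread, the same workaround can also be applied to the child process only, rather than editing the system PATH. A hedged sketch; the PhysX path below is the usual default install location:)

```
const { spawn } = require("node:child_process");

const physx = "c:\\program files (x86)\\nvidia corporation\\physx\\common";
const cleanedPath = process.env.PATH.split(";")
    // drop the PhysX entry, tolerating case and trailing-backslash differences
    .filter((dir) => dir.trim().replace(/\\+$/, "").toLowerCase() !== physx)
    .join(";");

spawn("ollama.exe", ["serve"], {
    env: { ...process.env, PATH: cleanedPath },
    stdio: "inherit",
});
```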


@makeryangcom commented on GitHub (May 5, 2024):

> Thanks for the server log. Based on that, you've hit the PhysX bug, making this a duplicate of #4008. The workaround is to remove the PhysX directory from your PATH, or to add another CUDA library location earlier in the PATH; then it will correctly discover the GPU.

I removed `c:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\` from PATH, but it still didn't solve my problem.

However, I did make some good progress; it seems that the GPU has been recognized. Here is the complete run log:

![1](https://github.com/ollama/ollama/assets/156150246/084938ae-6983-47e4-8a77-ee2793a2cecc)

![2](https://github.com/ollama/ollama/assets/156150246/c1244b9f-853a-4d38-a5b7-3c87bbfd80c3)

![3](https://github.com/ollama/ollama/assets/156150246/d807247e-d400-436f-ad55-b18437b53379)

![4](https://github.com/ollama/ollama/assets/156150246/dbfd8f82-9342-4ef8-9531-a5b34b5dbac3)

![5](https://github.com/ollama/ollama/assets/156150246/f79cc142-0343-4b27-9593-fbfa61345307)

**I need to emphasize that I did not use `OllamaSetup.exe` to install the relevant programs; I only ran `ollama-windows-amd64.exe`.**


@dhiltgen commented on GitHub (May 6, 2024):

@makeryangcom it looks like we got past the PhysX problem and correctly identified the GPU; however, when we tried to start the subprocess to run the model, it immediately exited. I'm not sure, but I think this may be an incompatibility between the compiled binary and the CUDA library we found in the path. I can think of two things to try: add the directory where you extracted ollama-windows-amd64.zip to the front of the PATH, or remove all the CUDA directories from the path. Let's see if that combination yields a running GPU runner.
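(A sketch of the first suggestion as it would look from Node.js, assuming the zip was extracted to `C:\ollama`, a hypothetical location: prepending that directory means the DLLs bundled in the zip are resolved before any host CUDA copies.)

```
const { spawn } = require("node:child_process");

const ollamaDir = "C:\\ollama"; // hypothetical extraction directory
spawn(`${ollamaDir}\\ollama.exe`, ["serve"], {
    env: { ...process.env, PATH: `${ollamaDir};${process.env.PATH}` },
    stdio: "inherit",
});
```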


@makeryangcom commented on GitHub (May 6, 2024):

> @makeryangcom it looks like we got past the PhysX problem and correctly identified the GPU; however, when we tried to start the subprocess to run the model, it immediately exited. I'm not sure, but I think this may be an incompatibility between the compiled binary and the CUDA library we found in the path. I can think of two things to try: add the directory where you extracted ollama-windows-amd64.zip to the front of the PATH, or remove all the CUDA directories from the path. Let's see if that combination yields a running GPU runner.

When I use `ollama app.exe` from ollama-windows-amd64.zip, or the OllamaSetup.exe install, the PATH is not modified and GPU resources are used normally. My requirement is to bundle only a single `ollama.exe` into my project to launch the ollama service. If I follow your method with ollama-windows-amd64.zip, I must start `ollama app.exe`, which adds a new tray icon to the computer's taskbar.


@makeryangcom commented on GitHub (May 6, 2024):

> @makeryangcom it looks like we got past the PhysX problem and correctly identified the GPU; however, when we tried to start the subprocess to run the model, it immediately exited. I'm not sure, but I think this may be an incompatibility between the compiled binary and the CUDA library we found in the path. I can think of two things to try: add the directory where you extracted ollama-windows-amd64.zip to the front of the PATH, or remove all the CUDA directories from the path. Let's see if that combination yields a running GPU runner.

My issue has been resolved; I had mixed up the installation packages. Thank you for your help.


@makeryangcom commented on GitHub (May 6, 2024):

> @makeryangcom it looks like we got past the PhysX problem and correctly identified the GPU; however, when we tried to start the subprocess to run the model, it immediately exited. I'm not sure, but I think this may be an incompatibility between the compiled binary and the CUDA library we found in the path. I can think of two things to try: add the directory where you extracted ollama-windows-amd64.zip to the front of the PATH, or remove all the CUDA directories from the path. Let's see if that combination yields a running GPU runner.

On macOS, do I need to add the entire path of the extracted Ollama-darwin.zip to the PATH, and then run Ollama.app/Contents/Resources/ollama?

Reference: github-starred/ollama#64492