[GH-ISSUE #9496] CPU only via CUDA_VISIBLE_DEVICES -1 and mistral:latest produces Orphaned processes #68241

Open
opened 2026-05-04 12:58:35 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @YonTracks on GitHub (Mar 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9496

What is the issue?

If the system environment variable `CUDA_VISIBLE_DEVICES` is set to `-1`
with the 0.5.13 official OllamaSetup.exe:

Windows 11, 32 GB RAM | RTX 3060 12 GB
CUDA v12.8

This setup crashes and restarts Ollama, and the log shows orphaned processes. The CLI behavior remains fine until memory starts to fill because of the orphans; then the CLI starts to hang. Only a reboot of the PC, or ending the processes manually, will clear it; restarting Ollama doesn't fix it.

GiteaMirror added the bug label 2026-05-04 12:58:35 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 4, 2025):

> This setup will crash and restart ollama and the log producing orphaned processes

There are no crashes, server or runner, in this log.

> seems ollama is disabling the gpu as expected, but then still looking for gpu compatibility, and erroring

This is because you are setting `CUDA_VISIBLE_DEVICES` to an invalid value.

> If the system environment for CUDA_VISIBLE_DEVICES is set to -1 or "" and the following:

This is causing the `cuInit err: 100` errors, which can be ignored since you are disabling the GPU.

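The effect of the different values discussed above can be sketched as follows. This is a hedged summary: the valid-value semantics follow NVIDIA's documentation for `CUDA_VISIBLE_DEVICES`, and the error 100 behavior for `-1` is the one reported in this thread.

```shell
# Sketch of CUDA_VISIBLE_DEVICES values (semantics per NVIDIA's env-var docs):
export CUDA_VISIBLE_DEVICES=0    # valid: expose only GPU 0
export CUDA_VISIBLE_DEVICES=""   # no valid ids: CUDA sees no devices
export CUDA_VISIBLE_DEVICES=-1   # no device has id -1; cuInit then reports
                                 # error 100 (CUDA_ERROR_NO_DEVICE), as in this thread
```

Because any id that matches no device hides the GPUs, `""` and `-1` both disable GPU use, but neither is a value the driver treats as a clean "no GPU" request.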

@rick-github commented on GitHub (Mar 4, 2025):

> again. the logs reset to the above log. I will get more tomorrow no doubt. cheers.

The logs don't reset, they rotate when the ollama server is started. The previous log will be in `server-1.log`. If the server is crashing, the contents of `app.log` (or `app-1.log`) may also be helpful.


@rick-github commented on GitHub (Mar 4, 2025):

Exit code 3221225477 is 0xC0000005, Access Violation, like Linux SEGV:

```
An Access Violation is a type of Exception caused when an application Reads, Writes or Executes an invalid Memory Address.

The memory address may be invalid because of one of these common scenarios:

NULL Pointer - addresses between 0x0 and 0x10000 (64K) - e.g. a function that usually returns a pointer returned NULL (0x0), and the pointer was accessed without verification
Memory Corruption - the address was mistakenly or maliciously overwritten - commonly via a buffer overrun (or underrun)
Use-After-Free - the address was valid, but is now being accessed after it is freed (data) or unloaded (code)
Bit-Flip - RAM (hardware) issue where one or more bits have flipped (rare)
```

Hopefully the server log will provide some more context.

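The decimal-to-hex mapping can be checked in any shell, since Windows reports the NTSTATUS exit code in decimal:

```shell
# Convert the decimal exit code reported by Windows into hex to read the NTSTATUS.
printf '0x%X\n' 3221225477   # prints 0xC0000005 (STATUS_ACCESS_VIOLATION)
```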

@rick-github commented on GitHub (Mar 4, 2025):

```
Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
```

Two (or more) servers are trying to run.

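A quick way to see which process already holds the port is to query the listener on 11434. A hedged sketch (POSIX tools; on Windows the equivalent is `netstat -ano | findstr :11434` followed by `tasklist /FI "PID eq <pid>"`):

```shell
# Show any process listening on Ollama's default port 11434.
# Falls through quietly if the tools are unavailable or nothing is listening.
lsof -iTCP:11434 -sTCP:LISTEN 2>/dev/null \
  || netstat -tln 2>/dev/null | grep 11434 \
  || true
```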

@rick-github commented on GitHub (Mar 4, 2025):

Merging the logs:

```
time=2025-03-05T08:00:52.159+10:00 level=WARN source=server.go:163 msg="server crash 3 - exit code 3221225477 - respawning"
time=2025-03-05T08:00:53.659+10:00 level=INFO source=server.go:141 msg="starting server..."
time=2025-03-05T08:00:53.665+10:00 level=INFO source=server.go:127 msg="started ollama server with pid 20176"
time=2025-03-05T08:00:53.665+10:00 level=INFO source=server.go:129 msg="ollama server logs C:\\Users\\clint\\AppData\\Local\\Ollama\\server.log"
time=2025-03-05T08:00:53.741+10:00 level=INFO source=routes.go:1277 msg="Listening on 127.0.0.1:11434 (version 0.5.13)"
...
time=2025-03-05T08:01:37.154+10:00 level=DEBUG source=routes.go:1501 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nAlways include the language and file name in the info string when you write code blocks, for example '```python file.py'.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI want to make a 12v generator<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2025-03-05T08:01:37.156+10:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=49 used=0 remaining=49
time=2025-03-05T08:02:22.239+10:00 level=WARN source=server.go:163 msg="server crash 4 - exit code 3221225477 - respawning"
```

Previous server crashed at 08:00:52.159. New server running at 08:00:53.741. Last message from server at 08:01:37.156, but the server monitor doesn't notice until 08:02:22.239.

Nothing in the logs to indicate a problem.

Try this: stop the ollama server (right-click the ollama icon), open a terminal, and manually start a server: `ollama serve`. Then open another terminal and run `ollama run llama3.1 I want to make a 12v generator`. If the ollama server crashes in the first terminal, paste the output here.


@rick-github commented on GitHub (Mar 4, 2025):

It looks like no model was loaded - did it crash without you running `ollama run llama3.1`?


@rick-github commented on GitHub (Mar 4, 2025):

> starts the server no matter what happens.

Yes, Ollama on Windows will start a server if one isn't running.

So the server got a segmentation violation during startup. That's unusual. What hardware/software are you running?


@rick-github commented on GitHub (Mar 4, 2025):

So far you have logs for six CPU-only runs:

- Two failed just after startup.
- One started a runner, loaded a model and sent a prompt, but started to compete with another server for :11434.
- One started a runner but then crashed after sending the prompt.
- One started a runner, sent it a prompt, and successfully returned the response to the client. Maybe crashed.
- The last started a runner, sent it a prompt, returned the response to the client, and then crashed.

There doesn't seem to be a pattern here.


@rick-github commented on GitHub (Mar 5, 2025):

According to the logs, the crashes occur at different points. Just after startup, after sending a prompt, and after sending a prompt and getting a response. Have you considered running a memory tester?


@rick-github commented on GitHub (Mar 5, 2025):

```
PS C:\Windows\System32> ollama run llama3.2
pulling manifest
pulling dde5aa3fc5ff...  28% ▕███████████████                                         ▏ 559 MB/2.0 GB  6.7 MB/s   3m36s
Error: Post "http://127.0.0.1:11434/api/show": dial tcp 127.0.0.1:11434: connectex: No connection could be made because the target machine actively refused it.
```

And another different crash. There's no consistent pattern to the crashes.

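A `connectex: ... actively refused` error means nothing was listening on the port at that moment. A quick hedged check for whether the server is up (a running Ollama server answers `GET /` with "Ollama is running"):

```shell
# Probe the default Ollama endpoint; fall through to a message if unreachable.
curl -sf --max-time 2 http://127.0.0.1:11434/ 2>/dev/null \
  || echo "server not reachable on 127.0.0.1:11434"
```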

@rick-github commented on GitHub (Mar 5, 2025):

Let's try a different way of forcing CPU-only mode. Instead of setting `CUDA_VISIBLE_DEVICES=""`, try setting `num_gpu=0`.

```sh
$ ollama run llama3.1
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> I want to make a 12v generator
...
>>> /bye
```

If there are no crashes here, it indicates that it's not a CPU problem. Rather, it's setting `CUDA_VISIBLE_DEVICES` to an invalid value that's causing the issue. Perhaps some interaction of the Nvidia driver with its environment when it's not properly initialized.

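Assuming the standard Ollama HTTP API, the same per-request override can also be sent without the interactive REPL, via the `options` field of `/api/generate`. A sketch (the payload is only printed here, not sent):

```shell
# Sketch: request CPU-only inference via the options field of /api/generate.
# num_gpu: 0 asks the server to offload zero layers to the GPU.
payload='{"model":"llama3.1","prompt":"I want to make a 12v generator","options":{"num_gpu":0}}'
echo "$payload"
# Send it with: curl http://127.0.0.1:11434/api/generate -d "$payload"
```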

@rick-github commented on GitHub (Mar 5, 2025):

So it works fine, meaning that the problem is triggered by an invalid `CUDA_VISIBLE_DEVICES`. That's all Nvidia code; there's not a lot ollama can do about it.


@rick-github commented on GitHub (Mar 5, 2025):

Worth a try.


@rick-github commented on GitHub (Mar 5, 2025):

If you set `CUDA_VISIBLE_DEVICES=10`, does the problem occur? Or is it just for `-1` or `""`?


@rick-github commented on GitHub (Mar 5, 2025):

Did it crash for 10?


Reference: github-starred/ollama#68241