Problem with loading models in VRAM #8567

Closed
opened 2025-11-12 14:45:57 -06:00 by GiteaMirror · 23 comments

Originally created by @Savin1play140 on GitHub (Nov 2, 2025).

I have a GTX 750 with 2 GB of VRAM, but the LLM model can't load. I tried qwen3:0.6b. On Windows the GPU is discovered but the model doesn't load; on Linux the GPU isn't discovered because I can't install CUDA 12.9 on my system. What should I do?

OS

Windows and Linux

GPU

MSI GTX 750 2 GB

CPU

Intel Core i3-3210

Ollama version

v0.12.9

@rick-github commented on GitHub (Nov 2, 2025):

GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

@Savin1play140 commented on GitHub (Nov 2, 2025):

> GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

I know the GTX 750 is supported. How can I enable debug logging, preferably without rebuilding Ollama?

@rick-github commented on GitHub (Nov 2, 2025):

Set `OLLAMA_DEBUG=2` in the [server environment](https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-do-i-configure-ollama-server).

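For reference, a minimal way to set this on a systemd-based Linux install (assuming the default `ollama` service created by the install script) is to add the variable to the service environment and restart; on Windows, quit Ollama from the tray, set `OLLAMA_DEBUG=2` as a user environment variable, and start Ollama again.

```
# Open a systemd override for the Ollama service
sudo systemctl edit ollama.service

# In the editor that opens, add (then save and exit):
#   [Service]
#   Environment="OLLAMA_DEBUG=2"

# Reload unit files and restart the server so the variable takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```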

@Savin1play140 commented on GitHub (Nov 2, 2025):

> GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

One more question: if I install CUDA 12.9 on Linux (bypassing the default installation of CUDA 13), will this work?

@rick-github commented on GitHub (Nov 2, 2025):

Ollama should work with CUDA 13. Support was only added recently though, so if there are issues, the server log would be helpful.

@Savin1play140 commented on GitHub (Nov 2, 2025):

> Ollama should work with CUDA 13. Support was only added recently though, so if there are issues, the server log would be helpful.

That's not the problem. I think CUDA 13 won't work with my GTX 750.

@rick-github commented on GitHub (Nov 2, 2025):

You are correct: CUDA 13 supports CC 7.5 and above, while the GTX 750 is CC 5.0 and so requires CUDA 12.

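As a quick sanity check, recent NVIDIA drivers can report the compute capability directly; the `compute_cap` query field assumes the installed driver is new enough to support it.

```
# Print the GPU name and compute capability; a GTX 750 should report 5.0
nvidia-smi --query-gpu=name,compute_cap --format=csv
```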

@Savin1play140 commented on GitHub (Nov 2, 2025):

> You are correct: CUDA 13 supports CC 7.5 and above, while the GTX 750 is CC 5.0 and so requires CUDA 12.

Will Ollama work correctly if I replace CUDA 13, which is installed on Linux by default, with CUDA 12.9?

@rick-github commented on GitHub (Nov 2, 2025):

Yes.

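If it helps to double-check after switching, the startup log shows which bundled CUDA runtime Ollama selected for the GPU. A minimal check on a systemd install (assuming the default `ollama` service name) might be:

```
# Look for the "inference compute" line printed at startup;
# libdirs=ollama,cuda_v12 means the CUDA 12 runtime was chosen for this GPU
journalctl -u ollama --no-pager | grep -i "inference compute"
```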

@Savin1play140 commented on GitHub (Nov 2, 2025):

> Yes.

Okay, thank you.

@pdevine commented on GitHub (Nov 4, 2025):

I think this is answered? I'm going to go ahead and close the issue, but can reopen if it's still a problem.

@Savin1play140 commented on GitHub (Nov 4, 2025):

[server.log](https://github.com/user-attachments/files/23333737/server.log)
In addition, Ollama starts using ~20-30% of the CPU after a request to load a model.

@rick-github commented on GitHub (Nov 4, 2025):

The log doesn't show a model load. ollama does see the GPU:

```
time=2025-11-04T12:14:55.245+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-45737e26-eef3-de24-eb43-691dc0d0f26b filtered_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 750" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:01:00.0 type=discrete total="2.0 GiB" available="1.8 GiB"
```
@Savin1play140 commented on GitHub (Nov 5, 2025):

> The log doesn't show a model load. ollama does see the GPU:
> ```
> time=2025-11-04T12:14:55.245+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-45737e26-eef3-de24-eb43-691dc0d0f26b filtered_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 750" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:01:00.0 type=discrete total="2.0 GiB" available="1.8 GiB"
> ```

Yes, that's right, but the problem still exists.

@rick-github commented on GitHub (Nov 5, 2025):

And if you add a log that shows a model load, then the problem could be diagnosed.

@Savin1play140 commented on GitHub (Nov 5, 2025):

> And if you add a log that shows a model load, then the problem could be diagnosed.

How can I do it?

@rick-github commented on GitHub (Nov 5, 2025):

1. Load a model: `ollama run qwen3:0.6b`
2. Add the log to this issue (typical log locations are sketched below).
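
For reference, the usual log locations (per the troubleshooting guide linked above) are roughly as follows; treat the exact paths as assumptions for a default install.

```
# Linux (systemd install): dump the recent server log to a file you can attach
journalctl -u ollama --no-pager > server.log

# Windows: the server log is written under %LOCALAPPDATA%
explorer %LOCALAPPDATA%\Ollama
# attach the server.log file found there
```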

@Savin1play140 commented on GitHub (Nov 5, 2025):

> 1. Load a model: `ollama run qwen3:0.6b`
> 2. Add the log to this issue.

I already did this and, judging by the logs, the model doesn't even try to load.

@rick-github commented on GitHub (Nov 5, 2025):

What was the response from `ollama run qwen3:0.6b`? That is, what was displayed on the terminal after running the command?

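One quick way to see whether the model ever finishes loading, and whether it landed on the GPU or CPU, is to run `ollama ps` from a second terminal while the load appears to hang.

```
# Lists currently loaded models with their size and CPU/GPU split
ollama ps
```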

@Savin1play140 commented on GitHub (Nov 5, 2025):

> What was the response from `ollama run qwen3:0.6b`? That is, what was displayed on the terminal after running the command?

It just loads forever.

@rick-github commented on GitHub (Nov 5, 2025):

Could be #12699. Post the log.

@Savin1play140 commented on GitHub (Nov 7, 2025):

The problem was solved in 0.12.11, thanks.

Reference: github-starred/ollama-ollama#8567