Problem with loading models in VRAM #8567

Closed
opened 2025-11-12 14:45:57 -06:00 by GiteaMirror · 23 comments

Originally created by @Savin1play140 on GitHub (Nov 2, 2025).

I have a GTX 750 with 2 GB of VRAM, but the LLM model can't load. I tried qwen3:0.6b. On Windows the GPU is discovered but the model doesn't load; on Linux the GPU isn't discovered because I can't install CUDA 12.9 on my system. What should I do?

OS

Windows and Linux

GPU

MSI GTX 750 2 GB

CPU

Intel Core i3-3210

Ollama version

v0.12.9

@rick-github commented on GitHub (Nov 2, 2025):

GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

@Savin1play140 commented on GitHub (Nov 2, 2025):

> GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

I know the GTX 750 is supported. How can I enable debug logging, preferably without rebuilding Ollama?

@rick-github commented on GitHub (Nov 2, 2025):

Set `OLLAMA_DEBUG=2` in the [server environment](https://github.com/ollama/ollama/blob/main/docs/faq.mdx#how-do-i-configure-ollama-server).

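For reference, a minimal way to set this on a systemd-based Linux install (assuming the default `ollama` service created by the install script) is to add the variable to the service environment and restart; on Windows, quit Ollama from the tray, set `OLLAMA_DEBUG=2` as a user environment variable, and start Ollama again.

```
# Open a systemd override for the Ollama service
sudo systemctl edit ollama.service

# In the editor that opens, add (then save and exit):
#   [Service]
#   Environment="OLLAMA_DEBUG=2"

# Reload unit files and restart the server so the variable takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```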

@Savin1play140 commented on GitHub (Nov 2, 2025):

> GTX 750 is supported so should work in either OS. [Server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx) may help with debugging.

One more question: if I install CUDA 12.9 on Linux (bypassing the default installation of CUDA 13), will this work?

@rick-github commented on GitHub (Nov 2, 2025):

Ollama should work with CUDA 13. Support was only added recently though, so if there are issues, the server log would be helpful.

@Savin1play140 commented on GitHub (Nov 2, 2025):

> Ollama should work with CUDA 13. Support was only added recently though, so if there are issues, the server log would be helpful.

That's not the problem. I think CUDA 13 won't work with my GTX 750.

@rick-github commented on GitHub (Nov 2, 2025):

You are correct: CUDA 13 supports CC 7.5 and above, while the GTX 750 is CC 5.0 and so requires CUDA 12.

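As a quick sanity check, recent NVIDIA drivers can report the compute capability directly; the `compute_cap` query field assumes the installed driver is new enough to support it.

```
# Print the GPU name and compute capability; a GTX 750 should report 5.0
nvidia-smi --query-gpu=name,compute_cap --format=csv
```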

@Savin1play140 commented on GitHub (Nov 2, 2025):

> You are correct: CUDA 13 supports CC 7.5 and above, while the GTX 750 is CC 5.0 and so requires CUDA 12.

Will Ollama work correctly if I replace CUDA 13, which is installed on Linux by default, with CUDA 12.9?

@rick-github commented on GitHub (Nov 2, 2025):

Yes.

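If it helps to double-check after switching, the startup log shows which bundled CUDA runtime Ollama selected for the GPU. A minimal check on a systemd install (assuming the default `ollama` service name) might be:

```
# Look for the "inference compute" line printed at startup;
# libdirs=ollama,cuda_v12 means the CUDA 12 runtime was chosen for this GPU
journalctl -u ollama --no-pager | grep -i "inference compute"
```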

@Savin1play140 commented on GitHub (Nov 2, 2025):

> Yes.

Okay, thank you.

@pdevine commented on GitHub (Nov 4, 2025):

I think this is answered? I'm going to go ahead and close the issue, but can reopen if it's still a problem.

@Savin1play140 commented on GitHub (Nov 4, 2025):

[server.log](https://github.com/user-attachments/files/23333737/server.log)
In addition, Ollama starts using ~20-30% of the CPU after a request to load a model.

@rick-github commented on GitHub (Nov 4, 2025):

The log doesn't show a model load. ollama does see the GPU:

```
time=2025-11-04T12:14:55.245+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-45737e26-eef3-de24-eb43-691dc0d0f26b filtered_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 750" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:01:00.0 type=discrete total="2.0 GiB" available="1.8 GiB"
```
@Savin1play140 commented on GitHub (Nov 5, 2025):

> The log doesn't show a model load. ollama does see the GPU:
> ```
> time=2025-11-04T12:14:55.245+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-45737e26-eef3-de24-eb43-691dc0d0f26b filtered_id="" library=CUDA compute=5.0 name=CUDA0 description="NVIDIA GeForce GTX 750" libdirs=ollama,cuda_v12 driver=12.9 pci_id=0000:01:00.0 type=discrete total="2.0 GiB" available="1.8 GiB"
> ```

Yes, that's right, but the problem still exists.

@rick-github commented on GitHub (Nov 5, 2025):

And if you add a log that shows a model load, then the problem could be diagnosed.

@Savin1play140 commented on GitHub (Nov 5, 2025):

> And if you add a log that shows a model load, then the problem could be diagnosed.

How can I do it?

@rick-github commented on GitHub (Nov 5, 2025):

1. Load a model: `ollama run qwen3:0.6b`
2. Add the log to this issue (typical log locations are sketched below).
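
For reference, the usual log locations (per the troubleshooting guide linked above) are roughly as follows; treat the exact paths as assumptions for a default install.

```
# Linux (systemd install): dump the recent server log to a file you can attach
journalctl -u ollama --no-pager > server.log

# Windows: the server log is written under %LOCALAPPDATA%
explorer %LOCALAPPDATA%\Ollama
# attach the server.log file found there
```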

@Savin1play140 commented on GitHub (Nov 5, 2025):

> 1. Load a model: `ollama run qwen3:0.6b`
> 2. Add the log to this issue.

I already did this and, judging by the logs, the model doesn't even try to load.

@rick-github commented on GitHub (Nov 5, 2025):

What was the response from `ollama run qwen3:0.6b`? That is, what was displayed on the terminal after running the command?

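One quick way to see whether the model ever finishes loading, and whether it landed on the GPU or CPU, is to run `ollama ps` from a second terminal while the load appears to hang.

```
# Lists currently loaded models with their size and CPU/GPU split
ollama ps
```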

@Savin1play140 commented on GitHub (Nov 5, 2025):

> What was the response from `ollama run qwen3:0.6b`? That is, what was displayed on the terminal after running the command?

It just loads forever.

@rick-github commented on GitHub (Nov 5, 2025):

Could be #12699. Post the log.

@Savin1play140 commented on GitHub (Nov 7, 2025):

The problem was solved in 0.12.11, thanks.

Reference: github-starred/ollama-ollama#8567