[GH-ISSUE #1046] Add flag to force CPU only (instead of only autodetecting based on OS) #62547

Closed
opened 2026-05-03 09:32:22 -05:00 by GiteaMirror · 9 comments

Originally created by @joake on GitHub (Nov 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1046

Requesting a build flag to only use the CPU with ollama, not the GPU.

Users on macOS machines without Metal support can only run ollama on the CPU. Currently, in llama.go, the NumGPU function defaults to returning 1 (enabling Metal on all macOS systems), and the chooseRunners function adds Metal to the runners by default on all "darwin" systems.

This can lead to the error:

ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) UHD Graphics 630
ggml_metal_init: found device: AMD Radeon Pro 5500M
ggml_metal_init: picking default device: AMD Radeon Pro 5500M
ggml_metal_init: default.metallib not found, loading from source
2023/11/08 16:22:47 llama.go:399: signal: segmentation fault
2023/11/08 16:22:47 llama.go:407: error starting llama runner: llama runner process has terminated
2023/11/08 16:22:47 llama.go:473: llama runner stopped successfully

Disabling Metal by returning 0 from NumGPU and removing Metal from the chooseRunners function (by changing "darwin" to a non-matching string such as "narwid", for example) circumvents this issue and runs on the CPU only.
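
For anyone following along, a change like this has to be baked in at build time. A rough sketch of the rebuild, assuming a working Go toolchain and the repository's usual `go generate` / `go build` flow at the time (the exact steps may differ in current versions):

```sh
# Sketch only: after editing llama.go so that NumGPU returns 0 and
# chooseRunners no longer selects the Metal runner on darwin,
# rebuild ollama from source to get a CPU-only binary.
go generate ./...
go build .
```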


@DnzzL commented on GitHub (Nov 9, 2023):

I have had the same issue.
I think you can override the env variable `CUDA_VISIBLE_DEVICES`.

For example:
`CUDA_VISIBLE_DEVICES="" ollama create ...` is working for me
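
As a side note, CUDA_VISIBLE_DEVICES has to reach the process that actually loads the model, so if the server is started by hand (rather than as a system service or container) the same idea would look something like this:

```sh
# Hide all CUDA devices from the serving process so model loading falls back to the CPU
CUDA_VISIBLE_DEVICES="" ollama serve

# An invalid device ID has the same effect
CUDA_VISIBLE_DEVICES=-1 ollama serve
```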


@mongolu commented on GitHub (Dec 27, 2023):

Well, it doesn't quite work in Docker.
Since the Go build was already done, it uses the GPU even if you set CUDA_VISIBLE_DEVICES="" afterwards.
I also tried creating a model from a Modelfile and setting num_gpus=0, and it still uses the GPUs.
The only way I found is to recompile ollama, making sure it doesn't find the CUDA library at compile time.
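
For the Docker case specifically, one option that avoids recompiling is to start the container without GPU passthrough at all, so there is no CUDA device for the runner to find. This assumes the standard ollama/ollama image, where GPU access is normally granted via --gpus:

```sh
# CPU-only: no --gpus flag, so no NVIDIA devices are exposed inside the container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# For comparison, the GPU-enabled variant (requires the NVIDIA Container Toolkit)
# docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```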


@mongolu commented on GitHub (Jan 2, 2024):

Happy New Year!

Also, it would be a lot more useful as a runtime flag.
Please 🙂
I frequently get CUDA out-of-memory errors, especially with longer contexts.
In cases like this I want to start ollama on CPU only, even though it'll be a lot slower.


@jmorganca commented on GitHub (Jan 14, 2024):

Hi folks, thanks for the issue – this can be done today via the API with the `num_gpu` option, or via the CLI with `/set parameter num_gpu 0`.

As of version 0.1.20, this will no longer use the GPU at all. Let me know if that solves the issue, or if I should keep this open!
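
For the API route, num_gpu goes in the request's options object; roughly like this (the model name is just an example):

```sh
# Ask the server to load the model with zero layers offloaded to the GPU
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 0 }
}'
```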


@jmorganca commented on GitHub (Jan 14, 2024):

@mongolu while there are a few issues still open for "out of memory" with CUDA, many should be fixed. Let me know if you're still seeing those 😊


@benbot commented on GitHub (Jan 30, 2024):

> ... via the cli with `/set parameter num_gpu 0`.

Sorry, could you give a quick example of how to use that? I'm not seeing any cli command that takes any `set` flags or anything like that


@nonno-cicala commented on GitHub (Jan 30, 2024):

> > ... via the cli with `/set parameter num_gpu 0`.
>
> Sorry, could you give a quick example of how to use that? I'm not seeing any cli command that takes any `set` flags or anything like that

You need to start the CLI first and then set the parameter inside the REPL, instead of entering a prompt


@benbot commented on GitHub (Jan 30, 2024):

Wait, ollama has a REPL?!? 😅

This is news to me. How do I launch it?


@nonno-cicala commented on GitHub (Jan 30, 2024):

> Wait, ollama has a REPL?!? 😅
>
> This is news to me. How do I launch it?

Kind of: when you run, for example, `ollama run llama2`, it waits for a prompt, you write one, it Reads the prompt, Evaluates it, Prints the response, and the Loop repeats 😁
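
Putting the two answers together, the CPU-only flow from the interactive session looks roughly like this (llama2 is just an example model):

```sh
ollama run llama2
# inside the interactive session:
>>> /set parameter num_gpu 0
>>> Why is the sky blue?
```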

Reference: github-starred/ollama#62547