[GH-ISSUE #920] [Starcoder:7b] Not using CUDA #78125

Closed
opened 2026-05-08 22:01:01 -05:00 by GiteaMirror · 2 comments
Originally created by @ManuLinares on GitHub (Oct 26, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/920

Originally assigned to: @BruceMacD on GitHub.

`# ollama run starcoder:7b prompt`

...
llm_load_tensors: VRAM used: 3968.42 MB
..GGML_ASSERT: /build/ollama/src/ollama-cuda/llm/llama.cpp/gguf/ggml-cuda.cu:6115: false
2023/10/26 16:09:13 llama.go:378: signal: aborted (core dumped)
2023/10/26 16:09:13 llama.go:386: error starting llama runner: llama runner process has terminated
2023/10/26 16:09:13 llama.go:452: llama runner stopped successfully
2023/10/26 16:09:13 llama.go:363: starting llama runner
2023/10/26 16:09:13 llama.go:421: waiting for llama runner to start responding
{"timestamp":1698347353,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
...

Other models work fine:

Example with **ollama run mistral:instruct**:

...
2023/10/26 16:12:55 llama.go:252: 7456 MB VRAM available, loading up to 55 GPU layers
2023/10/26 16:12:55 llama.go:363: starting llama runner
2023/10/26 16:12:55 llama.go:421: waiting for llama runner to start responding
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3070, compute capability 8.6
...
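
One possible reading of the first log (an inference from the log order, not something confirmed in this thread) is that the CUDA runner aborted on the `GGML_ASSERT` and a second runner without GPU offload support was started afterwards. The following is a minimal Python sketch for spotting both signals in saved log output. The names `scan_runner_log` and `SAMPLE` are hypothetical, the matched strings are copied from the log lines in this report, and this is not an ollama tool.

```python
import json


def scan_runner_log(lines):
    """Scan log lines for the two signals seen in this report.

    Returns (assert_lines, cpu_fallback):
      assert_lines  list of lines that contain "GGML_ASSERT"
      cpu_fallback  True if a JSON log record carries the
                    "Not compiled with GPU offload support" warning
    """
    assert_lines = []
    cpu_fallback = False
    for line in lines:
        text = line.strip()
        if "GGML_ASSERT" in text:
            assert_lines.append(text)
        elif text.startswith("{"):
            # Some runner messages are emitted as one JSON object per line.
            try:
                record = json.loads(text)
            except json.JSONDecodeError:
                continue
            if not isinstance(record, dict):
                continue
            if "Not compiled with GPU offload support" in record.get("message", ""):
                cpu_fallback = True
    return assert_lines, cpu_fallback


# Log lines copied from the report above.
SAMPLE = [
    "llm_load_tensors: VRAM used: 3968.42 MB",
    "..GGML_ASSERT: /build/ollama/src/ollama-cuda/llm/llama.cpp/gguf/ggml-cuda.cu:6115: false",
    "2023/10/26 16:09:13 llama.go:378: signal: aborted (core dumped)",
    "2023/10/26 16:09:13 llama.go:363: starting llama runner",
    '{"timestamp":1698347353,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}',
]

if __name__ == "__main__":
    found_asserts, fell_back = scan_runner_log(SAMPLE)
    print("assert lines:", found_asserts)
    print("runner without GPU offload started:", fell_back)
```

Running the same function over the mistral:instruct log above should report neither signal, since that output contains no assert and no offload warning.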
GiteaMirror added the bug label 2026-05-08 22:01:01 -05:00

@rhokstar commented on GitHub (Oct 30, 2023):

@ManuLinares @jmorganca Thanks for reporting! This bug was driving me crazy! I have the same problem. I'm running 2 GPUs: a GTX 1080 and an RTX A6000. When I prompt StarCoder, my CPU is being used. But when I run Mistral, my A6000 is working (I specified this through nvidia-smi).


@mxyng commented on GitHub (Nov 21, 2023):

Should be fixed with #1224


Reference: github-starred/ollama#78125