[GH-ISSUE #1913] 0.1.19 no longer uses my nvidia cards #63139

Closed
opened 2026-05-03 12:16:52 -05:00 by GiteaMirror · 2 comments

Originally created by @skrew on GitHub (Jan 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1913

Originally assigned to: @jmorganca on GitHub.

This worked on 0.1.18.

Logs from 0.1.19:

```
➜  ~ ollama serve
2024/01/10 22:35:20 images.go:808: total blobs: 5
2024/01/10 22:35:20 images.go:815: total unused blobs removed: 0
2024/01/10 22:35:20 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.19)
2024/01/10 22:35:21 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/10 22:35:21 gpu.go:35: Detecting GPU type
2024/01/10 22:35:21 gpu.go:54: Nvidia GPU detected
2024/01/10 22:35:21 gpu.go:84: CUDA Compute Capability detected: 6.1
size 49625198848
filetype Q8_0
architecture llama
type 47B
name gguf
embd 4096
head 32
head_kv 8
gqa 4
2024/01/10 22:35:26 gpu.go:84: CUDA Compute Capability detected: 6.1
2024/01/10 22:35:26 llm.go:70: system memory bytes: 0
2024/01/10 22:35:26 llm.go:71: required model bytes: 49625198848
2024/01/10 22:35:26 llm.go:72: required kv bytes: 268435456
2024/01/10 22:35:26 llm.go:73: required alloc bytes: 178956970
2024/01/10 22:35:26 llm.go:74: required total bytes: 50072591274
2024/01/10 22:35:26 gpu.go:84: CUDA Compute Capability detected: 6.1
2024/01/10 22:35:26 llm.go:105: not enough vram available, falling back to CPU only
2024/01/10 22:35:26 ext_server_common.go:136: Initializing internal llama server
```

Logs from 0.1.18:

```
2024/01/10 22:39:02 images.go:834: total blobs: 5
2024/01/10 22:39:02 images.go:841: total unused blobs removed: 0
2024/01/10 22:39:02 routes.go:929: Listening on 127.0.0.1:11434 (version 0.1.18)
2024/01/10 22:39:02 shim_ext_server.go:142: Dynamic LLM variants [rocm cuda]
2024/01/10 22:39:02 gpu.go:34: Detecting GPU type
2024/01/10 22:39:02 gpu.go:53: Nvidia GPU detected
...
Lazy loading /tmp/ollama314200454/cuda/libext_server.so library
2024/01/10 22:39:06 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama314200454/cuda/libext_server.so
2024/01/10 22:39:06 gpu.go:146: 81110 MB VRAM available, loading up to 40 cuda GPU layers out of 32
2024/01/10 22:39:06 ext_server_common.go:143: Initializing internal llama server
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 10 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 1: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 2: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 3: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 4: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 5: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 6: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 7: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 8: NVIDIA GeForce GTX 1070, compute capability 6.1
  Device 9: NVIDIA GeForce GTX 1070, compute capability 6.1
llama_model_loader: loaded meta data with 26 key-value pairs and 995 tensors from (version GGUF V3 (latest))
...
llm_load_tensors: ggml ctx size =    0.38 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  133.19 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 47191.83 MiB
```
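The working 0.1.18 log shows the other half of the estimate: "81110 MB VRAM available, loading up to 40 cuda GPU layers out of 32", i.e. available VRAM is divided by a per-layer cost and then capped at the model's layer count. A rough sketch of that calculation, with a hypothetical `layersToOffload` helper (the real formula clearly differs, since the log reports 40 rather than 54):

```go
package main

import "fmt"

// layersToOffload is a hypothetical simplification of the layer-offload
// estimate: divide total free VRAM by an average per-layer cost derived
// from the model size, then cap at the model's actual layer count.
func layersToOffload(freeVRAMBytes, modelBytes uint64, nLayers int) int {
	perLayer := modelBytes / uint64(nLayers)
	fit := int(freeVRAMBytes / perLayer)
	if fit > nLayers {
		fit = nLayers
	}
	return fit
}

func main() {
	free := uint64(81110) * 1024 * 1024 // "81110 MB VRAM available"
	model := uint64(49625198848)        // "required model bytes"
	fmt.Println(layersToOffload(free, model, 32))
}
```

With the aggregate VRAM of all ten cards, every layer fits, which matches the log's "offloaded 33/33 layers to GPU"; the 0.1.19 regression effectively fed this calculation a free-VRAM figure of zero.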
GiteaMirror added the bug label 2026-05-03 12:16:52 -05:00

@jmorganca commented on GitHub (Jan 10, 2024):

Sorry this happened, and thanks for creating an issue. There's a bug in memory estimation with high GPU counts; it will be fixed in an upcoming release.

In the meantime, here's a script to easily install a previous version:

```
curl https://ollama.ai/install.sh | sed 's#https://ollama.ai/download#https://github.com/jmorganca/ollama/releases/download/v0.1.18#' | sh
```

@skrew commented on GitHub (Jan 12, 2024):

0.1.20 fixed the issue. Thanks

Reference: github-starred/ollama#63139