[GH-ISSUE #1954] Support GPU A500 #26888

Closed
opened 2026-04-22 03:36:03 -05:00 by GiteaMirror · 4 comments

Originally created by @aemonge on GitHub (Jan 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1954

Can't get the model to run on the GPU:

```
Fri Jan 12 16:22:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A500 Laptop GPU     Off | 00000000:03:00.0 Off |                  N/A |
| N/A   53C    P8               4W /  20W |      7MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1404      G   /usr/lib/Xorg                                 4MiB |
+---------------------------------------------------------------------------------------+
```

I'm on Arch and installed via `pacman -S ollama`.
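
On Arch, the comments below point at a separate CUDA build, `ollama-cuda`. A minimal sketch of switching over, assuming the package name from the comment further down and that the server runs as a systemd unit called `ollama`:

```
# Hedged sketch: swap in the CUDA-enabled Arch build and watch detection.
sudo pacman -S ollama-cuda     # package name taken from a comment below
sudo systemctl restart ollama  # assumes the service unit is named "ollama"
journalctl -u ollama -f        # look for "Nvidia GPU detected" in the logs
```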


@dcasota commented on GitHub (Jan 12, 2024):

Same result with default settings: gpu2 is not used and the workload goes to the CPU.
Setup:
gpu0: Intel Iris Xe graphics
gpu1 (offline): Nvidia RTX 4070
gpu2: Nvidia RTX A500

```
2024/01/12 16:51:55 shim_ext_server.go:92: Loading Dynamic Shim llm server: /tmp/ollama1132008292/cuda/libext_server.so
2024/01/12 16:51:55 ext_server_common.go:136: Initializing internal llama server
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA RTX A500 Embedded GPU, compute capability 8.6
```
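
The log shows only one CUDA device even though three GPUs are present: the Intel iGPU is not a CUDA device, and the offline 4070 is invisible to the driver. If the wrong NVIDIA card were being picked, the standard CUDA runtime variable `CUDA_VISIBLE_DEVICES` can pin the choice; a hedged sketch, assuming the server is run in the foreground rather than as a service:

```
# CUDA_VISIBLE_DEVICES indexes NVIDIA devices only, so the Intel iGPU
# does not count. "0" is the A500 as enumerated in the log above.
CUDA_VISIBLE_DEVICES=0 ollama serve
```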

@Crystal4276 commented on GitHub (Jan 14, 2024):

@xyproto

Same issue with an RTX 2080: no GPU utilization (no VRAM usage, no GPU load).
Driver Version: 545.29.06 CUDA Version: 12.3

```
2 extra/ollama-cuda 0.1.20-2 [0 B 586.42 MiB] [Installed]
    Create, run and share large language models (LLMs) with CUDA
```

```
2024/01/14 21:55:23 shim_ext_server.go:142: Dynamic LLM variants [cuda]
2024/01/14 21:55:23 gpu.go:88: Detecting GPU type
2024/01/14 21:55:23 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/14 21:55:23 gpu.go:248: Discovered GPU libraries: [/usr/lib/libnvidia-ml.so.545.29.06 /usr/lib32/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]
2024/01/14 21:55:23 gpu.go:94: Nvidia GPU detected
2024/01/14 21:55:23 gpu.go:135: CUDA Compute Capability detected: 7.5
```
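
A quick way to confirm whether a prompt is actually hitting the GPU is to poll `nvidia-smi` while the model answers; a minimal sketch:

```
# Terminal 1: refresh the GPU status once per second.
watch -n 1 nvidia-smi
# Terminal 2: run a one-shot prompt; if offload works, an ollama process
# with real VRAM usage shows up, as in the nvidia-smi output further down.
ollama run mistral "ping"
```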

@Crystal4276 commented on GitHub (Jan 14, 2024):

> Can't get the model to run on the GPU:
> I'm on Arch and installed via `pacman -S ollama`

Have you tried with `ollama-cuda`?


@aemonge commented on GitHub (Jan 15, 2024):

Okay...

Neither of the two "worked" for me, but I did finally manage to make it run on the GPU.

It's not a bug so much as a missing warning or message, so I hope this thread helps others :)

The model falls back to the CPU when the parameters or the model itself won't "fit" in GPU memory.
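
For a sense of scale (hedged, using only the numbers in this thread): the A500 reports 4096 MiB of VRAM, and with `num_gpu 25` the ollama process below sits at 3540 MiB, roughly 140 MiB per offloaded layer. Assuming Mistral 7B exposes about 33 offloadable layers, offloading them all would need around 4.7 GiB and overshoot the card, which is why the runtime silently falls back to the CPU unless the layer count is capped.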

Playing around with `/set parameter` and checking the GPU helped me find the sweet spot:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A500 Laptop GPU     Off | 00000000:03:00.0 Off |                  N/A |
| N/A   63C    P0              15W /  20W |   3551MiB /  4096MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1080      G   /usr/lib/Xorg                                 4MiB |
|    0   N/A  N/A     29084      C   /usr/bin/ollama                            3540MiB |
+---------------------------------------------------------------------------------------+
Mon Jan 15 09:03:58 2024
```

Parameters for this specific GPU:

```
❯ ollama run mistral
>>> /set parameter num_gpu 25
Set parameter 'num_gpu' to '25'

>>> ping
 It seems like you're asking for a command related to computer networking. The "ping" command is used to test the 
reachability and response time of a networked host or device. When you type `ping` followed by a specific IP address or 
domain name in your terminal or command prompt, it sends packets to that address and waits for responses. The results will 
display information about the packets sent and received, including any packet loss and round-trip times. If you need 
further assistance with using the ping command or troubleshooting network issues, please let me know!

>>> /show parameters
User defined parameters:
num_gpu                        25

Model defined parameters:
stop                           [INST]
stop                           [/INST]
>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM mistral:latest

FROM /var/lib/ollama/.ollama/models/blobs/sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730
TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER num_gpu 25
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```
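
To avoid re-setting the parameter every session, the working value can be baked into a derived model. A minimal sketch, where the tag `mistral-a500` is hypothetical and `num_gpu 25` is the value found above:

```
# Hypothetical derived model that pins the GPU layer count.
cat > Modelfile <<'EOF'
FROM mistral:latest
PARAMETER num_gpu 25
EOF
ollama create mistral-a500 -f Modelfile   # build the derived model
ollama run mistral-a500                   # runs with num_gpu 25 by default
```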