[GH-ISSUE #9765] Confirming if Ollama supports NVIDIA MPS #68440

Open
opened 2026-05-04 13:57:40 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Dream-Lantern on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9765

Hello,
I have enabled MPS mode on my NVIDIA A30 GPU and attempted to start the Ollama service. When I checked with the nvidia-smi command, the Ollama process type was not shown as "M+C". I know that MPS itself is working, because it functions correctly when I start a training task. I am not sure whether this is due to a mistake on my side or whether Ollama currently does not support MPS mode.

I would appreciate any guidance on whether I made a mistake in my setup or whether there are known limitations with Ollama and MPS mode.
Thank you very much for your time and assistance. I hope you have a wonderful day!
Best regards,

Here are the commands I ran and the relevant logs:
1. Enable MPS on the NVIDIA A30 GPU:
export CUDA_VISIBLE_DEVICES=0
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps/
mkdir -p $CUDA_MPS_PIPE_DIRECTORY
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
sudo nvidia-cuda-mps-control -d
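(Editorial aside, not part of the original report: the control daemon can be queried directly to confirm it is serving; get_server_list is a standard nvidia-cuda-mps-control command, and sudo is needed here because the daemon was started as root.)
# should print the PID of the MPS server (719636 in the output below)
echo get_server_list | sudo nvidia-cuda-mps-control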

2. Start the Ollama service and run a model:
sudo env "CUDA_VISIBLE_DEVICES=0" "CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps" ollama serve
sudo env "CUDA_VISIBLE_DEVICES=0" "CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps" ollama run deepseek-r1-qwen:1.5b

3. Check GPU status with nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     On  | 00000000:E1:00.0 Off |                  Off |
| N/A   78C    P0             153W / 165W |  16701MiB / 24576MiB |    100%   E. Process |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|    0   N/A  N/A    719636      C   nvidia-cuda-mps-server                        30MiB |
|    0   N/A  N/A    726930    M+C   python3.8                                  14622MiB |
|    0   N/A  N/A    728422      C   ...rs/cuda_v12_avx/ollama_llama_server      2040MiB |
+---------------------------------------------------------------------------------------+
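(Editorial aside, a diagnostic that may help, not in the original report: PID 728422 is the ollama_llama_server entry from the table above. A CUDA process that cannot reach the MPS pipe silently falls back to a plain context and is listed with type C, so it is worth checking whether the runner subprocess actually inherited CUDA_MPS_PIPE_DIRECTORY:)
sudo cat /proc/728422/environ | tr '\0' '\n' | grep CUDA_MPS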

Logs:
1. mps_control.log
[2025-03-14 18:20:20.981 Control 718283] Accepting connection...
[2025-03-14 18:20:20.981 Control 718283] User did not send valid credentials
[2025-03-14 18:20:20.981 Control 718283] Accepting connection...
[2025-03-14 18:20:20.981 Control 718283] NEW CLIENT 735273 from user 0: Server already exists
[2025-03-14 18:20:21.315 Control 718283] Accepting connection...
[2025-03-14 18:20:21.316 Control 718283] NEW CLIENT 735273 from user 0: Server already exists

2. mps_server.log
[2025-03-14 18:20:20.981 Server 719636] Received new client request
[2025-03-14 18:20:20.981 Server 719636] Worker created
[2025-03-14 18:20:20.981 Server 719636] Creating worker thread
[2025-03-14 18:20:21.316 Server 719636] Received new client request
[2025-03-14 18:20:21.316 Server 719636] Worker created
[2025-03-14 18:20:21.316 Server 719636] Creating worker thread
[2025-03-14 18:20:21.316 Server 719636] Device NVIDIA A30 (uuid 0x47209a07-0x6074dcac-0x7bcb66f2-0x15661a98) is associated
[2025-03-14 18:20:21.316 Server 719636] Status of client {735273, 1} is ACTIVE
[2025-03-14 18:20:21.575 Server 719636] Receive command failed, assuming client exit
[2025-03-14 18:20:21.575 Server 719636] Client {735273, 1} exit
[2025-03-14 18:20:21.575 Server 719636] Client disconnected. Number of active client contexts is 1.

3. ollama_serve.log
2025/03/14 18:06:19 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
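
(Editorial aside, a suggestion, not from the original report: the config line above shows OLLAMA_DEBUG:false. A debug run would log which GPU and runner library were selected, which may clarify whether the server ever attempted to attach to MPS:)
sudo env "OLLAMA_DEBUG=1" "CUDA_VISIBLE_DEVICES=0" "CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps" ollama serve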

GiteaMirror added the feature request label 2026-05-04 13:57:40 -05:00