[GH-ISSUE #10172] tensor-split problem #68731

Closed
opened 2026-05-04 15:01:27 -05:00 by GiteaMirror · 1 comment

Originally created by @taikai-zz on GitHub (Apr 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10172

cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-e765360bed4cbcc829a85dcb7e7cfff4fd0461a3210b555bec6bd0d2faf27b75 --ctx-size 2048 --batch-size 512 --n-gpu-layers 66 --verbose --threads 32 --parallel 1 --tensor-split 8,8,8,8,8,8,7,7 --port 44979"

This is my debugging information. I found that ollama automatically starts the runner with the `--tensor-split` parameter set. Can I set this parameter manually, or in a configuration file? I couldn't find any relevant documentation.
For example, I currently have a 4 GB model and I want it to run on two graphics cards.

GiteaMirror added the feature request label 2026-05-04 15:01:27 -05:00

@rick-github commented on GitHub (Apr 8, 2025):

Currently you can't control GPU assignment on a per-model basis. If you want to use a subset of the available cards to run a model, you can start another server and bind specific cards to it by setting the `CUDA_VISIBLE_DEVICES` environment variable.
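For example, a minimal sketch of that workaround, assuming two CUDA GPUs at indices 0 and 1 and a free port 11435 (both illustrative, not from the original issue):

```sh
# Start a second ollama server that can only see GPUs 0 and 1.
# CUDA_VISIBLE_DEVICES restricts which cards this process may use;
# OLLAMA_HOST moves it off the default port 11434.
CUDA_VISIBLE_DEVICES=0,1 OLLAMA_HOST=127.0.0.1:11435 ollama serve
```

Clients then target that server explicitly, e.g. `OLLAMA_HOST=127.0.0.1:11435 ollama run <model-name>`, and any model it loads is scheduled only across the two visible GPUs.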


Reference: github-starred/ollama#68731