[PR #3540] [CLOSED] Implement 'split_mode' and 'tensor_split' support in modelfiles #42439

opened 2026-04-24 22:12:38 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3540
Author: @jukofyork
Created: 4/8/2024
Status: Closed

Base: main ← Head: tensor_split


📝 Commits (4)

  • 367eb5d Update types.go
  • b1ab340 Update api.md
  • 673e1e4 Update modelfile.md
  • 6fced7b Update server.go

📊 Changes

4 files changed (+15 additions, -0 deletions)


📝 api/types.go (+2 -0)
📝 docs/api.md (+2 -0)
📝 docs/modelfile.md (+3 -0)
📝 llm/server.go (+8 -0)

📄 Description

This adds support for the tensor_split and split_mode options in llama.cpp::server.

The split_mode option has three possible values; from llama.cpp::server --help:

How to split the model across multiple GPUs, one of:

  • "layer": split layers and KV across GPUs (default).
  • "row": split rows across GPUs.
  • "none": use one GPU only.

It also changes the meaning of the main_gpu parameter:

The GPU to use for the model (with split_mode = "none") or for intermediate results and KV (with split_mode = "row").
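Since the PR wires these options through as Modelfile parameters, usage in a Modelfile might look like the following sketch. The parameter names come from this PR's title and changed files; the model name and values are illustrative, and the exact tensor_split syntax is an assumption:

```
FROM llama2
# Split the model across two GPUs in a roughly 3:1 VRAM ratio,
# splitting by rows rather than by layers.
# (Parameter names per this PR; values and syntax are an untested sketch.)
PARAMETER split_mode row
PARAMETER tensor_split 3,1
PARAMETER main_gpu 0
```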


To use:

git clone https://github.com/ollama/ollama
cd ollama
git pull origin pull/3540/head

Then compile as normal. (You may want to edit the "0.0.0" version number in version/version.go before compiling if you use it with OpenWebUI; otherwise OpenWebUI will think the version is below its required minimum.)
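Once built, the new options should also be settable per-request through the API's `options` object, since the PR touches api/types.go and docs/api.md. The sketch below just builds such a request body; the option names are the ones this PR adds, but the value types (in particular whether `tensor_split` is a string or an array) are an assumption, not verified against the branch:

```python
import json

# Hypothetical /api/generate request body for this PR's options.
# split_mode: "layer" (default), "row", or "none"
# tensor_split: relative VRAM split across GPUs (format assumed here)
# main_gpu: with split_mode="row", the GPU used for intermediate
#           results and KV; with "none", the single GPU to use.
body = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "options": {
        "split_mode": "row",
        "tensor_split": "3,1",
        "main_gpu": 0,
    },
}
print(json.dumps(body))
```

You could then POST this body to http://localhost:11434/api/generate with curl or any HTTP client.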


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#42439