[PR #5527] Add Environment Variable For Row Split #43066

Open
opened 2026-04-24 22:45:56 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5527
Author: @datacrystals
Created: 7/7/2024
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (1)

  • e25faca Add Environment Variable For Row Split

📊 Changes

2 files changed (+16 additions, -0 deletions)


📝 envconfig/config.go (+12 -0)
📝 llm/server.go (+4 -0)

📄 Description

As discussed in #5458, there does not appear to be a way to enable row splitting instead of layer splitting. On older multi-GPU setups, row splitting seems to yield a 40-70% performance improvement.

I tested this on 3x P40 24GB GPUs running Ubuntu Server 22.04 and observed about a 70% improvement in throughput, though the gain does seem to vary from model to model.

I've added it with a very simple change: an environment variable that, when set to 1, enables row splitting instead of layer splitting. Please let me know if any edits are needed; I'd be happy to make them.
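For readers who want a picture of the change, here is a minimal sketch of how an environment variable could toggle row splitting. It is not the code from this PR: the variable name `EXAMPLE_ROW_SPLIT` and the helpers `rowSplitEnabled` and `appendSplitMode` are hypothetical, and it assumes the underlying llama.cpp server accepts `--split-mode row`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// rowSplitEnvVar is a hypothetical name used only for this sketch;
// the PR description does not state the actual variable name.
const rowSplitEnvVar = "EXAMPLE_ROW_SPLIT"

// rowSplitEnabled reports whether the raw environment value is "1"
// (ignoring surrounding whitespace). Any other value leaves the
// default layer splitting in place.
func rowSplitEnabled(raw string) bool {
	return strings.TrimSpace(raw) == "1"
}

// appendSplitMode appends "--split-mode row" to the server arguments
// when row splitting is requested; otherwise it returns params as-is.
// Assumption: the runner understands the llama.cpp --split-mode flag.
func appendSplitMode(params []string, rowSplit bool) []string {
	if rowSplit {
		return append(params, "--split-mode", "row")
	}
	return params
}

func main() {
	// "model.gguf" is a placeholder path for illustration.
	params := []string{"--model", "model.gguf"}
	params = appendSplitMode(params, rowSplitEnabled(os.Getenv(rowSplitEnvVar)))
	fmt.Println(params)
}
```

Running this with `EXAMPLE_ROW_SPLIT=1` set adds the two extra arguments; with the variable unset or set to anything else, the argument list is unchanged, which matches the opt-in behavior described above.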


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-24 22:45:56 -05:00
Reference: github-starred/ollama#43066