[GH-ISSUE #5458] Enable Row Split Support #3414

Open
opened 2026-04-12 14:03:22 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @datacrystals on GitHub (Jul 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5458

For multi-GPU setups, enabling row split can yield substantial performance improvements. On my machine with 3x P40s, I saw a speedup from ~3 t/s to ~10 t/s.

Unfortunately, there doesn't appear to be any way to pass this flag (`-sm row`) down to llama.cpp, which otherwise supports it.

Would it be possible to either add a way to pass flags through to llama.cpp, or to add an option that enables this flag? Given the large performance improvement it brings, this seems feasible and not much work to implement.

I saw that passing along flags was discussed [here](https://github.com/ollama/ollama/pull/4120#issuecomment-2094747527), but it was unfortunately not implemented.
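For reference, row split is exposed by llama.cpp itself via the `--split-mode` (`-sm`) flag on its binaries. A minimal sketch of a direct invocation (the model path is a placeholder, and the exact binary name depends on the llama.cpp build):

```shell
# Sketch: running llama.cpp's server directly with row split enabled.
# "model.gguf" is a placeholder path; -ngl 99 offloads all layers to the GPUs,
# and -sm row splits each tensor's rows across GPUs (the default mode is "layer").
./llama-server -m model.gguf -ngl 99 -sm row
```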

GiteaMirror added the feature request label 2026-04-12 14:03:22 -05:00

Reference: github-starred/ollama#3414