[PR #13570] [CLOSED] Vendor sync to b7735 #19546

Closed
opened 2026-04-16 07:10:24 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13570
Author: @inforithmics
Created: 12/26/2025
Status: Closed

Base: main ← Head: UpdateParamsFit


📝 Commits (10+)

  • c7d1f25 update patches
  • 6ed31ab sync code
  • 8a18eda Update to b7548
  • ee190f1 Update to b7549
  • dfe3d70 Update to b7600
  • 25b43f8 Update to b7609
  • 47a8e00 Update to b7616
  • 7f25eb0 Update to b7618
  • 777b6ce Fix ggml: export GPU UUIDs
  • d3403e5 Fix 0018 ggml add batch size hint

📊 Changes

199 files changed (+13262 additions, -3007 deletions)

View changed files

📝 Makefile.sync (+1 -1)
📝 llama/build-info.cpp (+1 -1)
📝 llama/llama.cpp/common/common.cpp (+45 -25)
📝 llama/llama.cpp/common/common.h (+28 -9)
📝 llama/llama.cpp/common/sampling.cpp (+109 -51)
📝 llama/llama.cpp/common/sampling.h (+9 -4)
📝 llama/llama.cpp/include/llama.h (+105 -14)
📝 llama/llama.cpp/src/llama-adapter.cpp (+12 -3)
📝 llama/llama.cpp/src/llama-adapter.h (+7 -1)
📝 llama/llama.cpp/src/llama-arch.cpp (+114 -1)
📝 llama/llama.cpp/src/llama-arch.h (+9 -0)
📝 llama/llama.cpp/src/llama-chat.cpp (+31 -0)
📝 llama/llama.cpp/src/llama-chat.h (+2 -0)
📝 llama/llama.cpp/src/llama-context.cpp (+637 -48)
📝 llama/llama.cpp/src/llama-context.h (+43 -1)
📝 llama/llama.cpp/src/llama-grammar.cpp (+40 -13)
📝 llama/llama.cpp/src/llama-grammar.h (+2 -0)
📝 llama/llama.cpp/src/llama-graph.cpp (+204 -48)
📝 llama/llama.cpp/src/llama-graph.h (+71 -6)
📝 llama/llama.cpp/src/llama-hparams.cpp (+4 -0)

...and 80 more files

📄 Description

Cuda / HIP: Bugfixes and Performance improvements
Vulkan: Bugfixes and Performance Improvements

Related pull requests:

  • https://github.com/ollama/ollama/pull/13546
  • https://github.com/ollama/ollama/pull/13597

Possible patch pull requests:

  • [x] https://github.com/ggml-org/llama.cpp/pull/18467

Things to do:

  • [x] fix merge
  • [x] patches clean-up
  • [ ] investigate qwen3 next failures

Things done:

  • had to adjust the 0004-solar-pro.patch
  • had to adjust the 0009-remove-amx.patch
  • had to adjust the 0018-ggml-Add-batch-size-hint.patch
  • had to adjust the 0020-ggml-No-alloc-mode.patch
  • had to adjust the 0021-decode-disable-output_all.patch
  • had to adjust the 0024-GPU-discovery-enhancements.patch
  • had to adjust sampling_ext.cpp (additional parameter in llama_model_loader)
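
Since every one of the patches above needed adjusting, a dry run with `git apply --check` makes a handy pre-flight for the next sync. A minimal sketch, assuming the numbered patches live under `llama/patches` (inferred from the patch names in this PR, not from ollama's actual sync tooling):

```python
# Hypothetical pre-flight check: dry-run each vendored patch with
# `git apply --check` to see which ones need rebasing after a sync.
import subprocess
from pathlib import Path

PATCH_DIR = Path("llama/patches")  # assumed location of the numbered patches

def patch_applies(patch: Path) -> bool:
    # --check is a dry run: exit code 0 means the patch still applies cleanly
    result = subprocess.run(
        ["git", "apply", "--check", str(patch)],
        capture_output=True,
    )
    return result.returncode == 0

for patch in sorted(PATCH_DIR.glob("*.patch")):
    print(f"{patch.name}: {'ok' if patch_applies(patch) else 'NEEDS REBASE'}")
```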

Performance:
My not-so-scientific measurement method shows some major token generation (TG) improvements with Vulkan on an AMD iGPU.

For example:

qwen3:30B: 22.03 -> 24.78 TG/s
gpt-oss:20B: 13.74 -> 17.59 TG/s

Fun fact:
The Vulkan backend in this pull request is faster than the HIP backend on main.

HIP on main (7900 XTX):
gpt-oss:20B: 126 TG/s

This pull request (7900 XTX):
gpt-oss:20B: 143 TG/s
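
For context, the relative speedups implied by these numbers (a quick back-of-the-envelope check on the figures above, not part of the PR):

```python
# Relative speedup from the reported TG/s numbers: new / old - 1
results = {
    "qwen3:30B (Vulkan, AMD iGPU)":            (22.03, 24.78),
    "gpt-oss:20B (Vulkan, AMD iGPU)":          (13.74, 17.59),
    "gpt-oss:20B (7900 XTX, main HIP -> PR)":  (126.0, 143.0),
}
for name, (old, new) in results.items():
    print(f"{name}: {new / old - 1:+.1%}")
# qwen3:30B: +12.5%; gpt-oss:20B iGPU: +28.0%; 7900 XTX: +13.5%
```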

Improvements:
Support for the newer Qwen 3 Next GGUF format (the format was changed for better performance)


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
