[PR #14134] [CLOSED] Revert revert vendor update (Vendor Update to b8353) #14531

Closed
opened 2026-04-13 00:57:00 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14134
Author: @inforithmics
Created: 2/7/2026
Status: Closed

Base: main ← Head: RevertRevertVendorUpdate


📝 Commits (10+)

  • 71e9f89 Update Vulkan Sdk to 1.4.335.0
  • adf49c3 Update Dockerfile
  • f5ec9a2 Merge remote-tracking branch 'upstream/main' into VulkanUpdate
  • 3d00d68 Use top_k instead of argsort_top_k
  • 19ef0ed Revert Revert
  • 4c3c575 remove 0033 patch
  • 0d088e3 Update patches
  • d79962c Update b7929 add solve tri
  • 4c36519 Update to b7936
  • d3f5c80 Update to b7942

📊 Changes

422 files changed (+61222 additions, -14953 deletions)

View changed files

📝 .github/workflows/release.yaml (+3 -3)
📝 .github/workflows/test.yaml (+8 -4)
📝 CMakePresets.json (+16 -1)
📝 Dockerfile (+1 -1)
📝 Makefile.sync (+1 -1)
📝 app/ui/ui.go (+0 -1)
📝 integration/embed_test.go (+12 -2)
📝 integration/tools_test.go (+8 -3)
📝 llama/README.md (+17 -14)
📝 llama/build-info.cpp (+1 -1)
📝 llama/llama.cpp/.rsync-filter (+5 -0)
📝 llama/llama.cpp/LICENSE (+1 -1)
📝 llama/llama.cpp/common/common.cpp (+140 -163)
📝 llama/llama.cpp/common/common.h (+192 -86)
📝 llama/llama.cpp/common/json-schema-to-grammar.cpp (+87 -66)
➕ llama/llama.cpp/common/reasoning-budget.cpp (+219 -0)
➕ llama/llama.cpp/common/reasoning-budget.h (+41 -0)
📝 llama/llama.cpp/common/sampling.cpp (+172 -69)
📝 llama/llama.cpp/common/sampling.h (+9 -4)
➕ llama/llama.cpp/common/unicode.cpp (+124 -0)

...and 80 more files

📄 Description

Patches adjusted:

  • 0004-solar-pro
  • 0014-ggml-Export-GPU-UUIDs
  • 0025-report-LoadLibrary-failures
  • 0026-interleave-multi-

Patches created:

  • 0032-Improve-ggml_backend_vk_get_device_pci_id

Patches removed:

  • 0005-fix-deepseek-desert-regex (most likely removable because of https://github.com/ggml-org/llama.cpp/pull/19565)
  • 0032-ggml-enable-MLA-flash-attention-for-GLM-4.7-flash.patch (included in vendor sync)
  • 0033-ggml-metal-solve_tri.patch (included in vendor sync)

Additional files synced:
llama/llama.cpp/common/unicode.*

Based on Pull Request:
https://github.com/ollama/ollama/pull/13832 (Reverted Vendor Update)

Included Pull Requests:
https://github.com/ollama/ollama/pull/13546 (Vulkan Sdk Update)
https://github.com/ollama/ollama/pull/13597 (Use Top_k)
https://github.com/ollama/ollama/pull/14525 (Use MMap in Vulkan)

Major Things:

  • The GATED_DELTA_NET operator is now implemented for Vulkan, CUDA, ROCm, and Metal; the Qwen3 Next and Qwen3.5 model implementations need to be adapted to use it. A sketch of the recurrence is shown below.
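
For reference, here is a minimal sketch of the recurrence such an operator computes, assuming the standard Gated DeltaNet formulation S_t = a_t · S_{t-1}(I − b_t k_t k_tᵀ) + b_t v_t k_tᵀ. The function name, single-head scope, and row-major layout are illustrative assumptions, not the vendored ggml kernel:

```cpp
#include <cstddef>
#include <vector>

// One recurrent step of the (single-head) gated delta rule:
//   S_t = a_t * S_{t-1} * (I - b_t * k_t * k_t^T) + b_t * v_t * k_t^T
//   o_t = S_t * q_t
// S is a d_v x d_k state matrix, row-major; a (decay gate) and b (write
// strength) are per-token scalars in [0, 1]. Hypothetical reference code.
void gated_delta_step(std::vector<float> &S, size_t d_v, size_t d_k,
                      const float *q, const float *k, const float *v,
                      float a, float b, float *o) {
    // Sk = S_{t-1} * k: the value currently stored under key k.
    std::vector<float> Sk(d_v, 0.0f);
    for (size_t i = 0; i < d_v; ++i)
        for (size_t j = 0; j < d_k; ++j)
            Sk[i] += S[i * d_k + j] * k[j];

    // Decay the old state and write the delta-corrected value under key k.
    for (size_t i = 0; i < d_v; ++i)
        for (size_t j = 0; j < d_k; ++j)
            S[i * d_k + j] = a * (S[i * d_k + j] - b * Sk[i] * k[j])
                           + b * v[i] * k[j];

    // Read out with the query: o = S_t * q.
    for (size_t i = 0; i < d_v; ++i) {
        o[i] = 0.0f;
        for (size_t j = 0; j < d_k; ++j)
            o[i] += S[i * d_k + j] * q[j];
    }
}
```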

Known bugs:

  • Vulkan/ROCm device deduplication does not work.
    The AMD driver announces that it does not support PCI bus information, but the information can be retrieved anyway (see the Vulkan sketch after this list).

  • There seems to be some double allocation: Qwen3 Coder Next (Hugging Face Unsloth q4_k_m) shows 108 GB of memory usage instead of around 51 GB (the GGUF size).
    Workaround: set an explicit context length, for example OLLAMA_CONTEXT_LENGTH=24000; otherwise the full context length of 265000 is used, which consumes a lot of memory (a back-of-the-envelope estimate follows after this list).

  • There seems to be a problem with Metal and the Qwen3 Coder Unsloth GGUF:
    failed to compile pipeline: base = 'kernel_unary_f32_f32_4', name = 'kernel_unary_f32_f32_4_op=10_cnt=1'

  • Intel Vulkan crashes with large models.
    Issue: https://github.com/ggml-org/llama.cpp/issues/19420
    Pull request: https://github.com/ggml-org/llama.cpp/pull/20059

  • gofmt errors in the MLX runner (because more than 400 files are changed in this pull request).
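
On the deduplication bug above: the PCI identity can usually be obtained by chaining VkPhysicalDevicePCIBusInfoPropertiesEXT into vkGetPhysicalDeviceProperties2, even when a driver does not advertise VK_EXT_pci_bus_info (strictly, one should check for the extension first). A minimal sketch under those assumptions; this is not the code from patch 0032:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Query domain:bus:device.function for a physical device through
// VK_EXT_pci_bus_info. Illustrative sketch only; the real change lives in
// patch 0032-Improve-ggml_backend_vk_get_device_pci_id and may differ.
void print_pci_id(VkPhysicalDevice dev) {
    VkPhysicalDevicePCIBusInfoPropertiesEXT pci = {};
    pci.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PCI_BUS_INFO_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &pci;  // chain the PCI-bus-info struct

    // Drivers that do not fill the struct leave it zeroed, so an all-zero
    // result should be treated as "no PCI ID available".
    vkGetPhysicalDeviceProperties2(dev, &props);

    std::printf("%04x:%02x:%02x.%x\n",
                pci.pciDomain, pci.pciBus, pci.pciDevice, pci.pciFunction);
}
```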

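On the memory bullet: the KV cache scales linearly with context length, so leaving the context at the full 265000 tokens can by itself account for tens of gigabytes, which is why capping OLLAMA_CONTEXT_LENGTH helps. A back-of-the-envelope sketch with placeholder hyperparameters (not Qwen3 Coder Next's actual configuration):

```cpp
#include <cstdint>
#include <cstdio>

// Rough full-attention KV-cache size: two tensors (K and V) per layer,
// each n_ctx * n_kv_heads * head_dim elements. The hyperparameters are
// placeholders; read the real ones from the model's GGUF metadata.
// Qwen3 Coder Next mixes in linear-attention layers, so its true cache
// is smaller than this pure-attention estimate.
uint64_t kv_cache_bytes(uint64_t n_layers, uint64_t n_ctx,
                        uint64_t n_kv_heads, uint64_t head_dim,
                        uint64_t bytes_per_elem) {
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem;
}

int main() {
    // Hypothetical 48-layer model, 8 KV heads of dim 128, f16 cache:
    // ~52 GB at n_ctx = 265000 versus ~4.7 GB at n_ctx = 24000.
    std::printf("full ctx:   %.1f GB\n",
                kv_cache_bytes(48, 265000, 8, 128, 2) / 1e9);
    std::printf("capped ctx: %.1f GB\n",
                kv_cache_bytes(48, 24000, 8, 128, 2) / 1e9);
    return 0;
}
```
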
Fixes this:
https://github.com/ollama/ollama/issues/14045


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:57:00 -05:00

Reference: github-starred/ollama#14531