[PR #14172] Add reranking support #19821

opened 2026-04-16 07:17:38 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14172
Author: @manx98
Created: 2/9/2026
Status: 🔄 Open

Base: main ← Head: feat-rerank


📝 Commits (10+)

  • 8f7eea3 feat: add llama rerank support
  • 5211d41 Merge branch 'refs/heads/main' into feat-rerank
  • 0977d76 feat: add ollama runner rerank support
  • 0bff10d feat: update modelfile.mdx for rerank
  • 3c5340c remove debug code
  • 374f001 fix template execute failed
  • 62e0d73 Merge branch 'main' into feat-rerank
  • c544d49 add rerank interface into mlxrunner
  • e02bcdb Update llama.cpp to b7591 to fix rerank on CUDA
  • b0414f3 Merge branch 'main' into feat-rerank

📊 Changes

119 files changed (+5721 additions, -1068 deletions)

View changed files

📝 Makefile.sync (+1 -1)
📝 api/types.go (+25 -0)
📝 convert/convert.go (+10 -0)
📝 docs/api.md (+90 -0)
📝 docs/modelfile.mdx (+2 -0)
📝 docs/template.mdx (+4 -0)
📝 fs/config.go (+2 -0)
📝 fs/ggml/ggml.go (+10 -0)
📝 llama/build-info.cpp (+1 -1)
📝 llama/llama.cpp/common/common.cpp (+24 -24)
📝 llama/llama.cpp/common/common.h (+7 -3)
📝 llama/llama.cpp/common/sampling.cpp (+51 -37)
📝 llama/llama.cpp/common/sampling.h (+6 -3)
📝 llama/llama.cpp/include/llama.h (+13 -4)
📝 llama/llama.cpp/src/llama-adapter.cpp (+12 -3)
📝 llama/llama.cpp/src/llama-adapter.h (+7 -1)
📝 llama/llama.cpp/src/llama-arch.cpp (+59 -1)
📝 llama/llama.cpp/src/llama-arch.h (+6 -0)
📝 llama/llama.cpp/src/llama-context.cpp (+22 -21)
📝 llama/llama.cpp/src/llama-hparams.h (+4 -3)

...and 80 more files

📄 Description

Implements reranking: #3368

Continuation of the re-ranking feature implementation based on #11328.

Following the approach used in llama.cpp, I implemented the re-ranking functionality in both ollamarunner and llamarunner. This implementation supports re-ranking models based on Qwen3 and BERT.
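As a conceptual sketch (not this PR's actual code), re-ranking scores each (query, document) pair with the model and returns the documents sorted by descending relevance; the scoring function below is a toy stand-in for the model call:

```python
def rerank(query, documents, score_fn):
    """Score each (query, document) pair and return documents
    sorted by descending relevance score."""
    scored = [(score_fn(query, doc), i, doc) for i, doc in enumerate(documents)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [{"index": i, "document": doc, "relevance_score": s}
            for s, i, doc in scored]

# Toy stand-in for the model: count words shared by query and document.
def toy_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

results = rerank(
    "what is ollama",
    ["ollama runs models locally", "unrelated text", "ollama is a tool"],
    toy_score,
)
```

In the real implementation the scoring is done by a cross-encoder re-ranking model rather than word overlap, but the shape of the result (index, document, score, sorted by score) is the same idea.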

The output is stable when running on CPU. However, when running on GPU, the scores for some documents become NaN, causing the response to fail to return correctly. In llamarunner, this issue even leads to a crash.

The results are stable when using the latest version of llama-server. I also tested with the version of llama-server currently used in this project (ec98e2002), during which some scores were returned as null. Upon investigation, I found that this issue was fixed in llama.cpp release b7591.

Test models:

  • qllama/bce-reranker-base_v1:q8_0
  • dengcao/Qwen3-Reranker-8B:q8_0
  • bge-reranker-v2-m3:latest
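For reference, llama.cpp's server exposes re-ranking via a POST `/v1/rerank` endpoint that takes a query and a list of documents; the payload sketched below follows that shape. The field names and the model tag are assumptions based on llama.cpp, not this PR's final API:

```python
import json

# Hypothetical rerank request payload, modeled on llama.cpp's
# /v1/rerank API; field names are assumptions, not this PR's final API.
payload = {
    "model": "bge-reranker-v2-m3:latest",  # one of the models tested above
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
}
body = json.dumps(payload)
```

A server implementing this shape would respond with one relevance score per document, which the client can then use to reorder the documents.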

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:17:38 -05:00

Reference: github-starred/ollama#19821