[PR #7406] [CLOSED] Feature/reranker #17680

Closed
opened 2026-04-16 06:10:51 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7406
Author: @hughescr
Created: 10/28/2024
Status: Closed

Base: main ← Head: feature/reranker


📝 Commits (9)

  • 93688ef Update to latest llama.cpp and fix all vendor patches and sampling_ext
  • c9897b0 Implement /api/rerank in go server
  • 9d481c4 Add debug output to see what is being passed; fix prompt concatenation
  • 46df792 Add --reranking flag for runner and update server.loadModel() to use it when appropriate.
  • 9ad00e8 This is not ideal - if llama_get_embeddings_ith randomly fails to find the embeddings then we will drop this document. Would be better to retry, but that could go on failing. Better would be to figure out the underlying bug -- why is llama_get_embeddings_ith sometimes failing?
  • f6ad996 When reranking, only copy a single embedding element out, because there is only one.
  • de62ee9 Trim whitespace that Decode likes to stick in there.
  • e4f7236 Copy embeddings out of C memory space, otherwise they get corrupted.
  • f70833e Finally got it working. The problem was calling LlamaServer.Embedding in parallel from a go coroutine. When doing that, the embeddings randomly sometimes come back as [], the documents get swapped around and assigned random scores, it's just chaos. No idea why, but it just doesn't work. Removing the coroutine wrapper and just calling sequentially works 100% reliably.
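The final commit above describes the key fix: calling the runner's embedding path concurrently corrupted results, while sequential calls were reliable. The sketch below illustrates that sequential pattern; `embedScore` and `rerankSequential` are illustrative stand-ins, not the PR's actual code (in the PR the underlying call is `LlamaServer.Embedding`).

```go
package main

import "fmt"

// embedScore stands in for the runner's embedding call (LlamaServer.Embedding
// in the PR). The placeholder scoring here is purely for illustration.
func embedScore(query, doc string) float64 {
	return float64(len(query)+len(doc)) / 100
}

// rerankSequential scores each document against the query one at a time.
// Per commit f70833e, fanning these calls out across goroutines caused
// embeddings to come back empty and scores to be assigned to the wrong
// documents; a plain sequential loop keeps scores[i] aligned with docs[i].
func rerankSequential(query string, docs []string) []float64 {
	scores := make([]float64, len(docs))
	for i, d := range docs {
		scores[i] = embedScore(query, d)
	}
	return scores
}

func main() {
	fmt.Println(rerankSequential("what is a reranker", []string{"short", "a longer doc"}))
}
```

The design trade-off the commit describes is throughput for correctness: until the root cause of the concurrency failure is found, issuing one embedding call at a time is the only configuration the author observed to work reliably.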

📊 Changes

259 files changed (+6682 additions, -3279 deletions)


📝 api/types.go (+27 -0)
📝 llama/clip.cpp (+1 -1)
📝 llama/clip.h (+1 -1)
📝 llama/common.cpp (+113 -76)
📝 llama/common.h (+132 -87)
📝 llama/ggml-aarch64.c (+1 -1)
📝 llama/ggml-aarch64.h (+1 -1)
📝 llama/ggml-alloc.c (+18 -20)
📝 llama/ggml-alloc.h (+2 -2)
📝 llama/ggml-backend-impl.h (+152 -79)
📝 llama/ggml-backend.cpp (+622 -289)
📝 llama/ggml-backend.h (+148 -63)
📝 llama/ggml-blas.cpp (+219 -63)
📝 llama/ggml-blas.h (+6 -4)
📝 llama/ggml-common.h (+1 -1)
📝 llama/ggml-cpu-impl.h (+1 -1)
📝 llama/ggml-cuda.cu (+408 -190)
📝 llama/ggml-cuda.h (+17 -17)
📝 llama/ggml-cuda/acc.cu (+1 -1)
📝 llama/ggml-cuda/acc.cuh (+1 -1)

...and 80 more files

📄 Description

Implement re-ranking by calling into the runner's embedding implementation. Uses the latest HEAD of llama.cpp and updates all the vendor patches. Tested on macOS only; I don't have access to CUDA or other hardware to make sure that all the updated vendor patches are correct. It's a little bit painful to compare them, but I think I did them right. Haven't manually managed patch stacks within version control for over a decade 🤢


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:10:51 -05:00

Reference: github-starred/ollama#17680