[PR #6428] [MERGED] Runner.go Context Window Shifting #12109

Closed
opened 2026-04-12 23:49:49 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6428
Author: @jessegross
Created: 8/19/2024
Status: Merged
Merged: 8/22/2024
Merged by: @jessegross

Base: jmorganca/llama ← Head: jessegross/kvshift


📝 Commits (10+)

  • 0bb656d llm: Fix lint
  • aa47c6f server: Fix double free on runner subprocess error.
  • c2a3eba runner: Initialize numPredict
  • bc8427a llm: Fix array out-of-bounds memory access when tokenizing
  • 5ff30ed runner.go: Fix off by one in batch size check
  • 7d2c527 llama.go: Make batch memory allocation match configuration
  • f3a8a4c llama.go: Use dynamic buffer for TokenToPiece
  • ab5360d llama.go: Advance through tokens when processing multiple batches
  • 3404c78 runner.go: Don't decode if nothing has been added to the batch
  • 21d1ec7 runner.go: Shift context window when KV cache space is exceeded
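Several of the commits above concern how the runner fills and submits batches (5ff30ed, 7d2c527, ab5360d, 3404c78). Below is a minimal Go sketch of those batching rules on a plain `[]int` token slice. `processTokens` and its `decode` callback are hypothetical stand-ins, not the runner's actual API, and the exact conditions changed in each commit are not reproduced here.

```go
package main

import "fmt"

// processTokens is a hypothetical helper (not the runner's real API) that
// feeds tokens to decode in batches of at most size entries.
//
//   - Each batch is allocated with capacity size, so its memory matches
//     the configured batch size.
//   - The "full" check uses ">=" before appending, so a batch is flushed as
//     soon as it holds exactly size tokens and never grows past it.
//   - The loop keeps advancing through tokens across batch boundaries
//     instead of restarting from the first token.
//   - decode is never called with an empty batch.
func processTokens(tokens []int, size int, decode func(batch []int)) {
	if size < 1 {
		panic("batch size must be positive")
	}
	batch := make([]int, 0, size)
	for _, t := range tokens {
		if len(batch) >= size {
			decode(batch)
			// Start a fresh batch so decode may safely keep the old slice.
			batch = make([]int, 0, size)
		}
		batch = append(batch, t)
	}
	// Skip the final decode when nothing has been added.
	if len(batch) > 0 {
		decode(batch)
	}
}

func main() {
	tokens := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
	processTokens(tokens, 4, func(batch []int) {
		fmt.Println(batch)
	})
	// Output:
	// [1 2 3 4]
	// [5 6 7 8]
	// [9]
}
```

Flushing before appending keeps the size check in one place, and the trailing `len(batch) > 0` guard is what prevents a no-op decode on empty input.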

📊 Changes

6 files changed (+203 additions, -41 deletions)


📝 llama/llama.go (+48 -11)
📝 llama/runner/runner.go (+135 -24)
📝 llama/sampling_ext.cpp (+2 -0)
📝 llama/sampling_ext.h (+2 -0)
📝 llm/llm.go (+14 -6)
📝 llm/server.go (+2 -0)

📄 Description

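This series implements context window shifting for the new Go server runner, so that the context window is shifted when KV cache space is exceeded. It also fixes a number of issues in the related code, including a double free on runner subprocess error, an off-by-one in the batch size check, and an out-of-bounds memory access when tokenizing.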
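Context window shifting can be illustrated with a minimal Go sketch, assuming the llama.cpp-style policy of keeping the first `numKeep` tokens and discarding half of the remaining ones once the cache is full. `shiftContext`, `numCtx`, and `numKeep` are illustrative names, and whether this PR uses exactly this discard ratio is an assumption. A real runner applies the same bookkeeping to KV cache entries (in llama.cpp terms, removing a range of cache cells and shifting the positions of later ones) rather than to a Go slice.

```go
package main

import "fmt"

// shiftContext is a hypothetical helper that illustrates context window
// shifting on a plain token slice standing in for the KV cache.
// Precondition: 0 <= numKeep < numCtx.
//
// While the cache has room it is returned unchanged. Once it holds numCtx
// tokens, the first numKeep tokens are kept, the next numDiscard tokens are
// dropped, and the remaining tokens move down to close the gap, freeing
// numDiscard slots for new tokens.
func shiftContext(cache []int, numCtx, numKeep int) []int {
	if len(cache) < numCtx {
		return cache
	}
	// Discard half of the tokens after the kept prefix (at least one).
	numDiscard := (numCtx - numKeep) / 2
	if numDiscard < 1 {
		numDiscard = 1
	}
	// The three-index slice caps the prefix at numKeep, so append copies
	// into a new array and the caller's slice is left untouched.
	return append(cache[:numKeep:numKeep], cache[numKeep+numDiscard:]...)
}

func main() {
	cache := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	fmt.Println(shiftContext(cache, 10, 2)) // [0 1 6 7 8 9]
}
```

Discarding a large chunk at once, rather than one token per step, means the comparatively expensive shift happens rarely, at the cost of dropping more history each time.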

My intention is to start adding tests for some of the issues encountered here, but I wanted to start getting reviews on this code in the meantime.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:49:49 -05:00
Reference: github-starred/ollama#12109