[PR #3218] [MERGED] Switch back to subprocessing for llama.cpp #57796

Closed
opened 2026-04-29 12:31:28 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3218
Author: @dhiltgen
Created: 3/18/2024
Status: Merged
Merged: 4/2/2024
Merged by: @dhiltgen

Base: main ← Head: subprocess


📝 Commits (7)

  • 58d95cc Switch back to subprocessing for llama.cpp
  • 0a0e9f3 Apply 01-cache.diff
  • 4fec581 Integration test improvements
  • 10ed1b6 Detect too-old cuda driver
  • 0a74cb3 Safeguard for noexec
  • 526d4eb Release gpu discovery library after use
  • 1f11b52 Refined min memory from testing

📊 Changes

43 files changed (+1511 additions, -1946 deletions)

View changed files

📝 .github/workflows/test.yaml (+24 -18)
📝 .gitignore (+2 -1)
📝 Dockerfile (+16 -9)
📝 app/lifecycle/server.go (+24 -1)
📝 docs/troubleshooting.md (+7 -0)
📝 gpu/amd_linux.go (+17 -0)
📝 gpu/assets.go (+19 -4)
📝 gpu/gpu.go (+18 -12)
📝 gpu/gpu_info_cudart.c (+10 -0)
📝 gpu/gpu_info_cudart.h (+2 -0)
📝 gpu/gpu_info_nvml.c (+7 -0)
📝 gpu/gpu_info_nvml.h (+1 -0)
📝 integration/basic_test.go (+1 -1)
📝 integration/utils_test.go (+33 -18)
➖ llm/dyn_ext_server.c (+0 -142)
➖ llm/dyn_ext_server.go (+0 -388)
➖ llm/dyn_ext_server.h (+0 -74)
📝 llm/ext_server/CMakeLists.txt (+10 -17)
➖ llm/ext_server/README.md (+0 -18)
➖ llm/ext_server/ext_server.cpp (+0 -377)

...and 23 more files

📄 Description

This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward running multiple copies to support multiple models concurrently.
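For illustration, here is a minimal Go sketch of the lifecycle described above. The runner binary name (`ollama-runner`), its flags, and the idle-shutdown policy are assumptions for the sketch, not the PR's actual interface; the sketch continues after the next paragraph.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
	"time"
)

// runner tracks a single llama.cpp subprocess.
type runner struct {
	mu       sync.Mutex
	cmd      *exec.Cmd
	lastUsed time.Time
}

// startRunner launches the runner binary as a child process. The binary
// name and flags here are placeholders, not the PR's actual interface.
func startRunner(ctx context.Context, model string, port int) (*runner, error) {
	cmd := exec.CommandContext(ctx, "ollama-runner",
		"--model", model, "--port", fmt.Sprint(port))
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("failed to start runner: %w", err)
	}
	return &runner{cmd: cmd, lastUsed: time.Now()}, nil
}

// shutdownWhenIdle polls the last-used timestamp and kills the subprocess
// once it has been idle long enough. Because llama.cpp lives in its own
// process, any memory it leaked is reclaimed by the OS on exit.
func (r *runner) shutdownWhenIdle(idle time.Duration) {
	for {
		time.Sleep(idle / 2)
		r.mu.Lock()
		expired := time.Since(r.lastUsed) > idle
		r.mu.Unlock()
		if expired {
			r.cmd.Process.Kill()
			r.cmd.Wait() // reap the child
			return
		}
	}
}
```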

Tested on Windows, Linux, and Mac with Nvidia and AMD GPUs, and simulated a number of different failure modes to ensure the server detected a non-responsive runner and restarted it on the next request.
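Continuing the sketch above, the "restart on next request" behavior could look like the following (add `"net/http"` to the imports). The `/health` endpoint, the probe timeout, and `ensureRunner` itself are assumptions standing in for whatever liveness check and restart policy the PR actually implements.

```go
// healthy probes the subprocess over HTTP. A hypothetical /health
// endpoint stands in for the runner's real liveness check.
func (r *runner) healthy(baseURL string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// ensureRunner runs before each request: if the subprocess has exited or
// stopped responding, it is killed and relaunched, so a wedged runner
// costs one restart instead of taking the whole server down.
func ensureRunner(ctx context.Context, rp **runner, model string, port int) error {
	baseURL := fmt.Sprintf("http://127.0.0.1:%d", port)
	if *rp != nil && (*rp).healthy(baseURL) {
		(*rp).mu.Lock()
		(*rp).lastUsed = time.Now()
		(*rp).mu.Unlock()
		return nil
	}
	if *rp != nil {
		(*rp).cmd.Process.Kill() // kill and reap the unresponsive child
		(*rp).cmd.Wait()
	}
	nr, err := startRunner(ctx, model, port)
	if err != nil {
		return err
	}
	*rp = nr
	return nil
}
```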

Fixes #1691
Fixes #1848
Fixes #1871
Fixes #2767


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 12:31:29 -05:00

Reference: github-starred/ollama#57796