[GH-ISSUE #5278] Is it possible to start llama server through dynamic dependency library? #3308

Closed
opened 2026-04-12 13:52:41 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @leeyiding on GitHub (Jun 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5278

Hi, I'm trying to run Ollama in [Nanos Unikernel](https://github.com/nanovms/nanos), a single-process operating system.

I found that in commit 58d95cc9bd, llama server switched from being loaded as a dynamic dependency library to running as a subprocess. As a result, Ollama ran normally in Nanos before version 0.1.32, but later versions no longer run there.

I would like to ask whether it is possible to keep offering the option of loading llama server as a dynamic dependency library, without using a subprocess. Perhaps the two methods could coexist and users could choose between them.
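
For concreteness, here is a minimal Go sketch of what the two modes look like side by side: spawning the runner as a subprocess versus dlopen()ing it in-process. This is not Ollama's actual code; the package and function names are made up for illustration.

```go
// Hypothetical sketch of the two launch modes; not Ollama's real code.
package runner

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"os/exec"
	"unsafe"
)

// StartSubprocess spawns the llama runner as a child process (the model
// used since commit 58d95cc9bd). It needs fork/exec, which a
// single-process unikernel such as Nanos cannot provide.
func StartSubprocess(runnerPath string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(runnerPath, args...)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start runner subprocess: %w", err)
	}
	return cmd, nil
}

// LoadInProcess dlopen()s the runner as a shared library inside the main
// process (roughly the pre-0.1.32 model), which is the only workable option
// in a single-process environment.
func LoadInProcess(libPath string) (unsafe.Pointer, error) {
	cpath := C.CString(libPath)
	defer C.free(unsafe.Pointer(cpath))

	handle := C.dlopen(cpath, C.RTLD_NOW|C.RTLD_GLOBAL)
	if handle == nil {
		return nil, fmt.Errorf("dlopen %s: %s", libPath, C.GoString(C.dlerror()))
	}
	// Entry points would then be resolved with dlsym() and invoked via cgo.
	return handle, nil
}
```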

Looking forward to your reply, thank you very much.

GiteaMirror added the feature request label 2026-04-12 13:52:41 -05:00
Author
Owner

@dhiltgen commented on GitHub (Jul 2, 2024):

Process isolation has helped us improve reliability, and it is a cornerstone of the new concurrency support. We are looking at refining how we ingest llama.cpp via PR #5034; however, that will still use the subprocess model. It isn't implemented yet, but we've considered a lowest-common-denominator CPU version linked into the main application (without any AVX* extensions), so it is theoretically possible that this could be wired up to skip the subprocess isolation. Inference performance would be slow compared to the subprocess runners with GPU or vector-extension support, though.
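
As a rough sketch of that direction (purely illustrative; the build tag, package, and symbol names are invented, not Ollama's actual layout): a CPU-only llama.cpp build could be linked into the main binary via cgo and enabled with a build tag, while the default build keeps the subprocess runners.

```go
//go:build inprocess_cpu

// Illustrative only: a lowest-common-denominator CPU backend compiled into
// the main binary (no AVX, no GPU), enabled with
// `go build -tags inprocess_cpu`. Names here are invented for the sketch.
package llm

/*
// A CPU-only llama.cpp static library would be linked here, for example:
// #cgo LDFLAGS: ${SRCDIR}/libllama_cpu.a -lstdc++ -lm
#include <stdlib.h>
*/
import "C"

// Server is an in-process inference server; no subprocess is spawned, so it
// could run under a single-process unikernel, at the cost of slower
// inference than the AVX/GPU subprocess runners.
type Server struct {
	modelPath string
}

// NewServer would call into the statically linked llama.cpp through cgo
// instead of exec'ing a runner binary.
func NewServer(modelPath string) (*Server, error) {
	// ... llama.cpp initialization via cgo would go here ...
	return &Server{modelPath: modelPath}, nil
}
```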

Author
Owner

@leeyiding commented on GitHub (Jul 3, 2024):

Got it

Reference: github-starred/ollama#3308