[GH-ISSUE #2534] Packaging issues with vendored llama.cpp #47995

Closed
opened 2026-04-28 06:21:04 -05:00 by GiteaMirror · 2 comments

Originally created by @viraptor on GitHub (Feb 16, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2534

Originally assigned to: @dhiltgen on GitHub.

Hi,
I'm trying to package the new version (after llama.cpp has been vendored) for nixpkgs and I'm running into issues. Essentially, ollama tries to be very clever and generic with its build, but this runs counter to what systems that provide packaged ollama and llama.cpp are trying to achieve.

Since we already have llama.cpp packages ready with all the complicated cuda/rocm/apple dependencies and flags in order, it's extra unnecessary work to replicate all of that for ollama as well. While I'm trying to find a good way to un-vendor it and use the existing library (with your provided patches), it's proving problematic. Your custom distribution works for you, but I'd love to be able to just build one version with a specific config, referencing an existing llama.cpp.

Have you considered upstreaming your changes to llama.cpp? My happy path as a packager would be: ollama depends on llama.cpp, optionally requiring an environment variable to point at a specific shared library.
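
For illustration, here is a minimal sketch of what that packager-friendly path could look like: pick up an existing llama.cpp shared library from an environment variable and load it at runtime. The variable name `OLLAMA_LLAMA_LIBRARY` and the lookup are hypothetical examples on my part, not anything ollama supports today.

```cpp
// Hypothetical sketch: locate an already-packaged llama.cpp shared library
// via an environment variable instead of building a vendored copy.
// OLLAMA_LLAMA_LIBRARY is an illustrative name, not a real ollama setting.
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

int main() {
    const char *path = std::getenv("OLLAMA_LLAMA_LIBRARY");
    if (path == nullptr) {
        path = "libllama.so";  // fall back to whatever the dynamic linker finds
    }

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "failed to load %s: %s\n", path, dlerror());
        return 1;
    }

    // Resolve one upstream llama.cpp entry point to confirm the library is usable.
    void *sym = dlsym(handle, "llama_backend_init");
    if (sym == nullptr) {
        std::fprintf(stderr, "symbol not found in %s: %s\n", path, dlerror());
        dlclose(handle);
        return 1;
    }

    std::printf("using llama.cpp from %s\n", path);
    dlclose(handle);
    return 0;
}
```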

There are also minor issues in multiple places, like:

  • both cmake and the compiler being used directly instead of having a complete cmake build ([here](https://github.com/ollama/ollama/blob/a468ae045971d009b782b259d21869f2767269fa/llm/generate/gen_common.sh#L87))
  • g++ being used instead of `$CXX`, which breaks builds on some systems ([here](https://github.com/ollama/ollama/blob/a468ae045971d009b782b259d21869f2767269fa/llm/generate/gen_common.sh#L89))

Getting all the required functions back into llama.cpp, or at least providing everything as a drop-in folder that can be placed in llama.cpp/examples (so no complex build-time modifications/generation is done in ollama), would be a great improvement. It will probably also save you some headaches in the future when you update llama.cpp.

GiteaMirror added the feature request label 2026-04-28 06:21:04 -05:00

@viraptor commented on GitHub (Feb 18, 2024):

An alternative idea:

  • Make a proper fork of llama.cpp where you carry your patches on top and rebase for each release. This way the whole patching step can be avoided.
  • Ensure cmake builds all the custom targets directly - without the extra outside step.

This way you could build the ext_server extension directly from that repo and independently from ollama. This would likely be better for your development process as well.


@dhiltgen commented on GitHub (Feb 20, 2024):

As you pointed out, we carry patches, although in general we try to upstream those. The bigger challenge is that we wrap the example server with a thin facade `extern "C"` [interface](https://github.com/ollama/ollama/tree/main/llm/ext_server) so we can link to it as a library. Upstream, the server is only built as an executable, not a library, so we also modify the cmake build to accomplish that. Our patches and wrapper are lighter weight than a fork for now; this is due to the evolution of how we utilize llama.cpp, where we used to subprocess to the server as an executable and rely on its higher-level logic.

Longer term, we may shift to leverage the official upstream `extern "C"` interfaces in llama.cpp, or we might transition to alternate libraries entirely, like direct CUDA/ROCm/Metal access, or LLM-centric libraries like MLX, TensorRT-LLM, etc. This is a dynamic space, and we're watching how these various projects evolve and adapt to LLM use-cases.

Short term, I'm not sure it's feasible to leverage llama.cpp purely as a pre-compiled library. Longer term it might be possible, or might become moot.
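
To give a rough idea of the shape of such a facade, here is a minimal sketch: a small C-linkage layer over C++ server state so the result can be linked and called as a library (e.g. from Go via cgo). The function names and types below are illustrative only, not ollama's actual ext_server interface.

```cpp
// Illustrative sketch of an extern "C" facade over C++ server logic.
// Names and signatures are examples, not ollama's real ext_server API.
#include <string>

namespace {
// Stand-in for the state a real wrapper would keep around the example server.
struct ServerState {
    std::string model_path;
    bool running = false;
};

ServerState g_state;
}  // namespace

extern "C" {

// C-callable entry points; no C++ types cross this boundary.
int ext_server_init(const char *model_path) {
    g_state.model_path = model_path ? model_path : "";
    return g_state.model_path.empty() ? 1 : 0;
}

void ext_server_start() { g_state.running = true; }

void ext_server_stop() { g_state.running = false; }

}  // extern "C"
```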

Reference: github-starred/ollama#47995