[GH-ISSUE #7419] Integrating Into Desktop App #4718

Closed
opened 2026-04-12 15:39:39 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @brian-at-pieces on GitHub (Oct 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7419

I'd love to use Ollama for serving LLMs in my company's Mac/Linux/Windows desktop app, but I'm a little confused about some things.

I'd like to integrate it directly rather than requiring the user to manually install Ollama, because that UX isn't very good IMO. You [mention in the Windows docs](https://github.com/ollama/ollama/blob/main/docs/windows.md#standalone-cli) that this can be done by just using the Windows release zip, but at 3.25 GB that's way too large to bundle with our app. It looks like it's so big because (1) different runners are included to allow Ollama to choose the optimal one, and (2) AMD and NVIDIA GPU libs are included.

I understand including different runners, but I don't understand why GPU libs like cudart, cublas, rocblas, etc. are included - shouldn't these already be present on the system if the proper drivers are installed? If it's not possible to remove them, it would be nice if you could generate separate releases per runner / GPU library set to somewhat reduce build size.

That being said, thanks for making an awesome product!

GiteaMirror added the feature request label 2026-04-12 15:39:39 -05:00
Author
Owner

@dhiltgen commented on GitHub (Nov 1, 2024):

The reason we include the GPU libraries is to ensure the exact same version is used at runtime as the one we compiled against. The GPU compiler tools nvcc/hipcc produce optimized binaries that may bake in library-version-specific assumptions that change over time, so while some drift of library versions might work, it's risky and may lead to crashing. They do have a forwards/backwards compatibility matrix, but we've opted to keep things simple and always bundle the exact version to ensure success.

If you're setting up your own packaging model and are OK taking on the burden of calculating the compatibility matrix for the GPU libraries, you could omit them but you'll still need some mechanism to ensure the client system has a compatible library somehow.
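To make the "compatibility matrix" burden concrete, here is a minimal sketch of the kind of gate a custom packager would need before relying on system-provided GPU libraries. Everything here is illustrative: the function names are hypothetical, and the minimum-driver thresholds are placeholder values, not NVIDIA's official matrix (consult the CUDA release notes for real numbers).

```python
# Sketch: decide whether the host driver can be trusted to supply a given CUDA
# runtime, instead of shipping the bundled copy. All thresholds below are
# ILLUSTRATIVE placeholders, not an official compatibility matrix.

# Assumed minimum (major, minor) driver version for each CUDA runtime version.
MIN_DRIVER_FOR_CUDA = {
    (11, 8): (450, 80),   # placeholder value
    (12, 4): (550, 54),   # placeholder value
}

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string like '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.split("."))

def driver_supports(cuda_version: tuple[int, int], driver_version: str) -> bool:
    """True if the installed driver meets the assumed minimum for this runtime.

    Unknown runtimes return False, i.e. fall back to the bundled libraries --
    the conservative choice the Ollama releases make by default.
    """
    minimum = MIN_DRIVER_FOR_CUDA.get(cuda_version)
    if minimum is None:
        return False
    return parse_version(driver_version)[:2] >= minimum

# In a real packager, the driver version would come from querying the system
# (e.g. `nvidia-smi --query-gpu=driver_version --format=csv,noheader`), and a
# failed check would trigger shipping or downloading the bundled libraries.
print(driver_supports((12, 4), "550.54.14"))
print(driver_supports((12, 4), "535.104.05"))
```

The point of the sketch is the asymmetry the maintainer describes: bundling is one decision made once at release time, while omitting the libraries turns every end-user machine into a version check you have to get right.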

<!-- gh-comment-id:2452128681 -->
Author
Owner

@brian-at-pieces commented on GitHub (Nov 1, 2024):

Ah okay makes sense. Thanks!

<!-- gh-comment-id:2452422680 -->

Reference: github-starred/ollama#4718