[GH-ISSUE #9504] Request modular updates to reduce download size (decouple CUDA libraries from core updates) #31952

Open
opened 2026-04-22 12:46:55 -05:00 by GiteaMirror · 2 comments

Originally created by @ice6 on GitHub (Mar 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9504

First, thank you for the amazing work on Ollama! 🚀

I'd like to suggest an optimization for the update mechanism. The current package (~700MB) consumes significant bandwidth on every update, even though in many cases only the core binary (~30MB) actually needs updating. Most of the package size appears to come from CUDA libraries, which are re-downloaded with every update regardless of whether their version has changed.

Proposed Solution:

  1. Modular Components
    Separate CUDA-related libraries from the core binary in the distribution package.

  2. Incremental Updates
    • Check existing local CUDA library versions during updates
    • Only download new components when actually required (version mismatch/new dependencies)

  3. Optional CUDA Distribution
    Provide CUDA libraries as:
    • Optional downloadable components
    • On-demand installation via `ollama install-cuda [version]`
    • Separate checksum-verified packages
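
The incremental-update idea above could be sketched roughly as follows. This is only an illustration of the proposal, not how Ollama's updater works today: the `Component` layout, the component names (`core`, `cuda-libs`), the version strings, the placeholder checksums, and the `plan_update` and `verify` helpers are all hypothetical. The sizes are loosely based on the figures quoted in this issue.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One separately packaged piece of a release (hypothetical layout)."""
    name: str
    version: str
    sha256: str
    size_mb: int


def plan_update(local: dict[str, Component], remote: dict[str, Component]) -> list[Component]:
    """Return only the remote components that are missing locally or differ."""
    to_fetch = []
    for name, wanted in remote.items():
        have = local.get(name)
        if have is None or have.version != wanted.version or have.sha256 != wanted.sha256:
            to_fetch.append(wanted)
    return to_fetch


def verify(payload: bytes, expected_sha256: str) -> bool:
    """Check a downloaded package against its published checksum."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256


# Illustrative values only: placeholder checksums and assumed versions.
local = {
    "core": Component("core", "1.0.0", "aaa", 30),
    "cuda-libs": Component("cuda-libs", "12.4", "bbb", 670),
}
remote = {
    "core": Component("core", "1.0.1", "ccc", 30),
    "cuda-libs": Component("cuda-libs", "12.4", "bbb", 670),
}

plan = plan_update(local, remote)
download_mb = sum(c.size_mb for c in plan)
print([c.name for c in plan], download_mb)  # ['core'] 30
```

With a per-component manifest like this, an unchanged CUDA package is skipped and only the core binary is fetched. A machine without CUDA support could simply never list `cuda-libs` locally and treat it as an optional component.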

Benefits:

  • Reduce typical update size by ~95% (from 700MB → ~30MB)
  • Save bandwidth for users with limited connections
  • Allow enterprise users to maintain internal CUDA library mirrors
  • Faster update/rollback operations
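
As a quick check of the first figure, using only the approximate sizes quoted in this issue (the variable names are just for illustration):

```python
full_mb = 700  # approximate full package size quoted in the issue
core_mb = 30   # approximate core binary size quoted in the issue

# Fraction of the download avoided when only the core binary is fetched.
saving = (full_mb - core_mb) / full_mb
print(f"{saving:.1%}")  # 95.7%
```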
GiteaMirror added the feature request label 2026-04-22 12:46:55 -05:00

@L3P3 commented on GitHub (Mar 18, 2025):

This is important. It hurts big time to download almost 2 GB for a machine that does not support CUDA anyway. 😢


@ice6 commented on GitHub (Mar 18, 2025):

Maybe building the code on the local machine could reduce the pain of the current update mechanism. 😆

https://github.com/ollama/ollama/blob/main/docs/development.md


Reference: github-starred/ollama#31952