[GH-ISSUE #12624] Download failures on corporate networks with SSL inspection #8379

Closed
opened 2026-04-12 21:01:28 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @ben-sandham on GitHub (Oct 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12624

What is the issue?

Problem Statement

What's happening:
Large model downloads (>1GB) fail with "max retries exceeded: EOF" on corporate networks that use SSL inspection (Palo Alto GlobalProtect, Zscaler, Forcepoint, etc.). Small models (<500MB) download successfully.

Evidence:

  • Ollama downloads models in 16 parallel parts (hardcoded in server/download.go)
  • Corporate SSL inspection appliances intercept HTTPS, decrypt, and re-encrypt traffic
  • 16 simultaneous SSL handshakes + sustained 2GB transfer overwhelms inspection appliances
  • Connections terminate with EOF after ~10 seconds

Why this matters:

  • Affects every enterprise using SSL inspection
  • Blocks Ollama adoption in regulated industries
  • Not fixable by users (corporate security policies mandate SSL inspection)
  • Not fixable by IT (whitelisting external sites often violates security policy)

Current Workarounds (All Problematic)

  1. Disable VPN: Not feasible in many organizations
  2. Request IT whitelist: Does not scale in some orgs (whitelisting is typically granted per group, which then requires ongoing membership management)
  3. Use different tool (LM Studio): Forces team to diverge from Ollama
  4. Manual blob transfer: Complex, not team-friendly, breaks ollama pull UX

Related Issues

These issues share the same root cause (SSL inspection + parallel downloads):

  • #8167 Logs show all 16 parallel parts failing with EOF; user confirmed corporate proxy was the cause
  • #11587 Requests CA certificate support for TLS-inspecting proxies (doesn't yet mention the connection to parallel downloads)
  • #1859 Corporate network/proxy timeout issues (SSL inspection implied but not explicitly confirmed)

Unrelated issues that appear similar but have different causes:

  • #7393 (corrupted local manifest)
  • #10050 (Windows DNS settings)

Proposed Solution

Add optional control over download parallelism to handle networks where many simultaneous connections are problematic:

Option A: Environment Variable (Minimal Change)

git --no-pager diff -- server/download.go
diff --git a/server/download.go b/server/download.go
index 784ba2d5..77cb4e25 100644
--- a/server/download.go
+++ b/server/download.go
@@ -94,10 +94,18 @@ func (p *blobDownloadPart) UnmarshalJSON(b []byte) error {
 	return nil
 }
 
+func getNumDownloadParts() int {
+	if parts := os.Getenv("OLLAMA_DOWNLOAD_PARTS"); parts != "" {
+		if n, err := strconv.Atoi(parts); err == nil && n > 0 && n <= 32 {
+			return n
+		}
+	}
+	return 16 // default unchanged
+}
+
 const (
-	numDownloadParts          = 16
-	minDownloadPartSize int64 = 100 * format.MegaByte
-	maxDownloadPartSize int64 = 1000 * format.MegaByte
+	minDownloadPartSize int64 = 100 * 1000 * 1000
+	maxDownloadPartSize int64 = 1000 * 1000 * 1000
 )
 
 func (p *blobDownloadPart) Name() string {
@@ -151,7 +159,8 @@ func (b *blobDownload) Prepare(ctx context.Context, requestURL *url.URL, opts *r
 
 		b.Total, _ = strconv.ParseInt(resp.Header.Get("Content-Length"), 10, 64)
 
-		size := b.Total / numDownloadParts
+		numParts := getNumDownloadParts()
+		size := b.Total / int64(numParts)
 		switch {
 		case size < minDownloadPartSize:
 			size = minDownloadPartSize
@@ -271,7 +280,7 @@ func (b *blobDownload) run(ctx context.Context, requestURL *url.URL, opts *regis
 	}
 
 	g, inner := errgroup.WithContext(ctx)
-	g.SetLimit(numDownloadParts)
+	g.SetLimit(getNumDownloadParts())
 	for i := range b.Parts {
 		part := b.Parts[i]
 		if part.Completed.Load() == part.Size {

Usage:

# Corporate networks with SSL inspection
OLLAMA_DOWNLOAD_PARTS=1 ollama pull llama3.2:3b

# Docker Compose
environment:
  - OLLAMA_DOWNLOAD_PARTS=4

Trade-offs:

  • ✅ Backward compatible (defaults to current 16)
  • ✅ Allows corporate users to download successfully (slow but reliable)
  • ✅ Minimal code change (~10 lines)
  • ✅ Follows existing OLLAMA_* env var pattern
  • ⚠️ Adds new surface area (env var that can't be removed later)
  • ⚠️ Slower downloads when parallelism is reduced (an acceptable trade-off for users who currently can't download at all)

Option B: Auto-Detect SSL Inspection (More Complex)

// Detect corporate MITM by checking certificate chain
// Automatically reduce parallelism from 16 → 4 if corporate CA detected
// Requires more code but zero configuration
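A minimal sketch of that detection, assuming the issuer names of a few well-known inspection vendors are a sufficient signal. The vendor list, function name, and plain-string interface are all illustrative; real code would inspect `tls.ConnectionState.PeerCertificates` and compare the served chain against the registry's expected chain.

```go
package main

import (
	"fmt"
	"strings"
)

// knownInspectionVendors is an illustrative, non-exhaustive list of
// strings that commonly appear in corporate/middlebox CA issuer names.
var knownInspectionVendors = []string{"Zscaler", "Palo Alto", "Forcepoint", "Issuing-CA"}

// looksInspected reports whether any issuer in the served certificate
// chain matches a known TLS-inspection vendor. In real code the issuer
// strings would come from tls.ConnectionState.PeerCertificates.
func looksInspected(issuers []string) bool {
	for _, issuer := range issuers {
		for _, vendor := range knownInspectionVendors {
			if strings.Contains(issuer, vendor) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Issuer string shaped like the one in the log output below.
	chain := []string{"CN=Example-Corp-Issuing-CA,OU=Network Information Security"}
	if looksInspected(chain) {
		fmt.Println("corporate CA detected: reducing parallelism to 4")
	}
}
```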

Option C: Retry with Reduced Parallelism (Middle Ground)

// Start with 16 parts as today
// If all parts fail with EOF, retry with 4 parts
// If still failing, retry with 1 part
// Requires retry logic but no user configuration

How This Will Be Used

Corporate Developer Workflow:

# Dockerfile
FROM ollama/ollama
ENV OLLAMA_DOWNLOAD_PARTS=4

# Models download successfully on corporate network
RUN ollama pull llama3.2:3b

How This Will Be Tested

Test Cases:

  1. Default behavior unchanged: 16 parts for normal networks
  2. OLLAMA_DOWNLOAD_PARTS=1: Single-threaded download succeeds on SSL inspection network
  3. OLLAMA_DOWNLOAD_PARTS=4: Balanced performance/reliability
  4. OLLAMA_DOWNLOAD_PARTS=0 or invalid: Falls back to default 16
  5. Small models (<100MB): Continues to work with fewer parts

Performance Benchmarks:

  • Normal network: 16 parts ~2 min for 2GB model (no change)
  • Corporate network with PARTS=1: ~<TBD> min for 2GB model (slower but succeeds vs. failing completely)
  • Corporate network with PARTS=4: ~<TBD> min for 2GB model (good balance)

Documentation Draft

Environment Variables:

### OLLAMA_DOWNLOAD_PARTS

Number of parallel parts to use when downloading model blobs. Default: `16`

**When to adjust:**

- Corporate networks with SSL inspection: Set to `1` or `4`
- Slow/unstable connections: Reduce to limit concurrent transfers
- Fast dedicated connections: Increase up to `32` for better performance

**Examples:**

```bash
# Single-threaded download for corporate SSL inspection
OLLAMA_DOWNLOAD_PARTS=1 ollama pull llama3.2:3b

# Balanced for restricted networks
OLLAMA_DOWNLOAD_PARTS=4 ollama pull llama3.2:3b
```

Questions for Maintainers

  1. Is Option A (env var) acceptable despite adding surface area? Or would you prefer Option C (auto-retry with backoff)?
  2. Should this be limited to corporate networks, or exposed for all users who have slow/unstable connections?
  3. Alternative approaches I haven't considered that would solve this without adding configuration?
  4. Is there value in auto-detecting SSL inspection (Option B), or is explicit configuration better?

Impact

Quantified benefit:

  • Enables Ollama in enterprises that currently can't use it (millions of developers)
  • Improves download reliability on any network with connection limits

Relevant log output

# Certificate chain shows corporate MITM
$ openssl s_client -connect registry.ollama.ai:443 2>/dev/null | grep issuer
issuer=C=US, O=Example-Corp, OU=Network Information Security, CN=Example-Corp-Issuing-CA

# Ollama logs show all 16 parts failing simultaneously
time=2025-10-11T03:51:48.871Z level=INFO source=download.go:177
  msg="downloading dde5aa3fc5ff in 16 126 MB part(s)"
time=2025-10-11T03:51:59.058Z level=INFO source=download.go:295
  msg="dde5aa3fc5ff part 5 attempt 0 failed: EOF, retrying in 1s"
[... all 16 parts fail with EOF ...]

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.12.5

GiteaMirror added the bug label 2026-04-12 21:01:28 -05:00
Author
Owner

@rick-github commented on GitHub (Oct 15, 2025):

Or use the existing experimental client with OLLAMA_EXPERIMENT=client2 and OLLAMA_REGISTRY_MAXSTREAMS=1.

Author
Owner

@jannik-el commented on GitHub (Oct 17, 2025):

+1 on this. If you are using a cloud service (which most are also doing), this is super necessary. @ me if you open a pull request for this and I will absolutely contribute.

Reference: github-starred/ollama#8379